CN109740075B - Event correlation calculation method, device, equipment and storage medium - Google Patents


Info

Publication number
CN109740075B
Authority
CN
China
Prior art keywords
event
link
search keyword
user behavior
search
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811528235.2A
Other languages
Chinese (zh)
Other versions
CN109740075A (en)
Inventor
周辉
陈文浩
陈玉光
郑宇宏
陈伟娜
韩翠云
潘禄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201811528235.2A
Publication of CN109740075A
Application granted
Publication of CN109740075B
Active (current legal status)
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the invention provide a method, device, equipment and storage medium for calculating event correlation. A user behavior log is acquired, and the correlation degree between a first event and a second event in an event library is calculated either from the plurality of links corresponding to the first event, the plurality of links corresponding to the second event and the user behavior log, or from the plurality of search keywords corresponding to the first event, the plurality of search keywords corresponding to the second event and the user behavior log. Because the correlation between events is derived from the search and click behaviors of different users, the calculation precision of the event correlation degree is improved.

Description

Event correlation calculation method, device, equipment and storage medium
Technical Field
The embodiments of the invention relate to the field of computer technology, and in particular to a method, device, equipment and storage medium for calculating event correlation.
Background
An event is an objective fact in which specific people, objects and things interact with one another at a specific time and place; the occurrence of an event is characterized by objectivity, authenticity and the like. Event correlation refers to the strength of the association between events.
In the prior art, event correlation is calculated either from the textual correlation between events or from the key attributes of the events. However, the event correlation calculated by these methods has low accuracy.
Disclosure of Invention
The embodiments of the invention provide a method, device, equipment and storage medium for calculating the event correlation degree, so as to improve the calculation precision of the event correlation degree.
In a first aspect, an embodiment of the present invention provides an event correlation calculation method, including:
acquiring a user behavior log, wherein the user behavior log comprises a plurality of pieces of record information, each piece of record information corresponds to one search behavior of one user, and each piece of record information comprises at least one search keyword and at least one link clicked by the user;
determining a plurality of search keywords corresponding to a first event according to a plurality of links corresponding to the first event in an event library and the user behavior log, wherein the first event corresponds to a plurality of pieces of first information, the plurality of links corresponding to the first event correspond one-to-one to the plurality of pieces of first information, and each of the plurality of search keywords corresponding to the first event is a keyword through which at least one of the pieces of first information was searched for and clicked;
determining a plurality of search keywords corresponding to a second event according to a plurality of links corresponding to the second event in the event library and the user behavior log, wherein the second event corresponds to a plurality of pieces of second information, the plurality of links corresponding to the second event correspond one-to-one to the plurality of pieces of second information, and each of the plurality of search keywords corresponding to the second event is a keyword through which at least one of the pieces of second information was searched for and clicked;
calculating the correlation degree of the first event and the second event according to the plurality of links corresponding to the first event, the plurality of links corresponding to the second event and the user behavior log; or calculating the correlation degree of the first event and the second event according to the plurality of search keywords corresponding to the first event, the plurality of search keywords corresponding to the second event and the user behavior log.
In a second aspect, an embodiment of the present invention provides an event correlation calculation apparatus, including:
an acquisition module, configured to acquire a user behavior log, wherein the user behavior log comprises a plurality of pieces of record information, each piece of record information corresponds to one search behavior of one user, and each piece of record information comprises at least one search keyword and at least one link clicked by the user;
a first determining module, configured to determine a plurality of search keywords corresponding to a first event according to a plurality of links corresponding to the first event in an event library and the user behavior log, wherein the first event corresponds to a plurality of pieces of first information, the plurality of links corresponding to the first event correspond one-to-one to the plurality of pieces of first information, and each of the plurality of search keywords corresponding to the first event is a keyword through which at least one of the pieces of first information was searched for and clicked;
a second determining module, configured to determine a plurality of search keywords corresponding to a second event according to a plurality of links corresponding to the second event in the event library and the user behavior log, wherein the second event corresponds to a plurality of pieces of second information, the plurality of links corresponding to the second event correspond one-to-one to the plurality of pieces of second information, and each of the plurality of search keywords corresponding to the second event is a keyword through which at least one of the pieces of second information was searched for and clicked;
a first calculation module, configured to calculate the correlation degree of the first event and the second event according to the plurality of links corresponding to the first event, the plurality of links corresponding to the second event and the user behavior log; or
a second calculation module, configured to calculate the correlation degree of the first event and the second event according to the plurality of search keywords corresponding to the first event, the plurality of search keywords corresponding to the second event and the user behavior log.
In a third aspect, an embodiment of the present invention provides an apparatus, including:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of the first aspect.
With the event correlation calculation method, device, equipment and storage medium provided by the embodiments of the invention, a user behavior log is acquired, and the correlation degree of a first event and a second event in the event library is calculated either from the links corresponding to the two events and the user behavior log, or from the search keywords corresponding to the two events and the user behavior log. Because the correlation between events is calculated from the search and click behaviors of different users, the calculation accuracy of the event correlation degree is improved.
Drawings
FIG. 1 is a flowchart of an event correlation calculation method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of events according to another embodiment of the present invention;
FIG. 3 is a flowchart of an event correlation calculation method according to another embodiment of the present invention;
FIG. 4 is a flowchart of an event correlation calculation method according to another embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an event correlation calculation apparatus according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an event correlation calculation apparatus according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of an apparatus according to an embodiment of the present invention.
Specific embodiments of the disclosure are shown in the foregoing drawings and are described in more detail below. These drawings and the written description are not intended to limit the scope of the disclosed concepts in any way, but rather to illustrate those concepts to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
In the embodiments of the invention, an event is an objective fact in which specific people, objects and things interact with one another at a specific time and place, and the occurrence of an event is characterized by objectivity, authenticity and the like. Event correlation means that an association exists between two events, and the strength of that association can be measured by a numerical value. The embodiments of the invention calculate the correlation degree between events from user behaviors, which include search behavior and click behavior. In a search behavior, a user enters a search keyword into a search engine, and the search engine retrieves a plurality of search results related to that keyword. In a click behavior, the user clicks at least one of the search results returned by the search engine.
The technical solutions of the present invention, and how they solve the above technical problems, are described in detail below with specific embodiments. These embodiments may be combined with one another, and identical or similar concepts or processes may not be repeated in some of them. The embodiments are described below with reference to the accompanying drawings.
FIG. 1 is a flowchart of an event correlation calculation method according to an embodiment of the present invention. To address the technical problems in the prior art, this embodiment provides an event correlation calculation method comprising the following steps:
Step 101, acquiring a user behavior log, wherein the user behavior log comprises a plurality of pieces of record information, each piece of record information corresponds to one search behavior of one user, and each piece of record information comprises at least one search keyword and at least one link clicked by the user.
In this embodiment, a user enters a search keyword into a search engine, the search engine retrieves a plurality of search results related to the keyword, and the user clicks at least one of those results. The search engine generates the user behavior log from the search keywords entered by users and the users' clicks on the search results; the user behavior log can therefore be derived from the massive number of search keywords recorded by the search engine and the links of the search results that users clicked. The user behavior log may include a plurality of pieces of record information, each corresponding to one search behavior of one user: one piece of record information is generated per search behavior, and it includes at least one search keyword and at least one link clicked by the user. In this embodiment, a search keyword is denoted as a query, and a link is a Uniform Resource Locator (URL).
For example, at time t1, user 1 enters query1 into the search engine, the search engine returns a plurality of search results, and the user clicks one of them, whose URL is denoted as URL1. Entering query1 and clicking URL1 constitutes one search behavior of user 1, and the corresponding record information, which includes query1 and URL1, is denoted as (query1, URL1).
In other embodiments, if the user enters another query within a preset time after entering a query in the search engine, both queries and the corresponding click behaviors are recorded as one search behavior of the user. For example, at time t1, user 1 enters query1 in the search engine, the search engine returns a plurality of search results, and the user clicks the result whose URL is URL1. Within a preset time after t1, for example within one minute, user 1 enters query2, the search engine again returns a plurality of search results, and the user clicks the result whose URL is URL2. Entering query1 and clicking URL1, then entering query2 and clicking URL2, is recorded as one search behavior of user 1, and the corresponding record information is denoted as (query1, URL1, query2, URL2).
Similarly, suppose that at time t1 user 1 enters query1 in the search engine and the search engine returns a plurality of search results, but the user does not click any of them. Within a preset time after t1, for example within one minute, user 1 enters query2, the search engine again returns a plurality of search results, and the user clicks the result whose URL is URL2. Entering query1, then entering query2 and clicking URL2, is recorded as one search behavior of user 1, and the corresponding record information is denoted as (query1, query2, URL2).
In other embodiments, if the user enters several different queries within the preset time, for example within one minute, all of these queries and the corresponding click behaviors are recorded as one search behavior of the user. In addition, because the user behavior log includes a plurality of pieces of record information, each piece may also carry identification information. For example, if each piece of record information is denoted as a Session and its identification information as Session_id, a Session may be denoted as (Session_id, query1, URL1, query2, URL2, …).
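The grouping rule above can be sketched in Python. This is a minimal illustration, not the patent's implementation: the `Action` record, the `build_sessions` function, and the choice to measure the preset time as the gap between a user's consecutive queries are all assumptions made for this sketch, with the 60-second default taken from the "within one minute" example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Action:
    """One logged action (hypothetical layout for this sketch)."""
    user: str
    t: float     # timestamp in seconds
    kind: str    # "query" or "click"
    value: str   # the query text or the clicked URL


def build_sessions(actions: List[Action], window: float = 60.0) -> List[Tuple[str, ...]]:
    """Group actions into Sessions of the form (Session_id, query1, URL1, ...).

    A user's query that arrives more than `window` seconds after that user's
    previous query starts a new Session; clicks join the current Session.
    """
    by_user: Dict[str, List[Action]] = {}
    for act in sorted(actions, key=lambda x: (x.user, x.t)):
        by_user.setdefault(act.user, []).append(act)

    groups: List[List[str]] = []
    for acts in by_user.values():
        current: List[str] = []
        last_query_t: Optional[float] = None
        for act in acts:
            if act.kind == "query":
                # Close the current Session if this query falls outside the window.
                if current and last_query_t is not None and act.t - last_query_t > window:
                    groups.append(current)
                    current = []
                last_query_t = act.t
            current.append(act.value)
        if current:
            groups.append(current)

    return [(f"Session_{i}", *items) for i, items in enumerate(groups, start=1)]
```

Because grouping is done per user, actions of different users never end up in the same Session.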
Optionally, the user behavior log is compiled per day and may include the record information of at least one search behavior of each of a plurality of different users.
Further, the search keyword pairs and link pairs in each piece of record information in the user behavior log may be determined, where a search keyword pair comprises two different search keywords and a link pair comprises two different links. The number of times each search keyword pair appears in the user behavior log, i.e. the number of co-occurrences of the search keyword pair, is then counted, as is the number of times each link pair appears in the user behavior log, i.e. the number of co-occurrences of the link pair.
For example, the user behavior log includes the Session of user 1 and the Session of user 2, where the Session of user 1 is denoted as (Session_1, query1, URL1, query2, URL2) and the Session of user 2 is denoted as (Session_2, query1, URL2, query2, URL3). The search keyword pair in the Session of user 1 is (query1, query2) and its link pair is (URL1, URL2). The search keyword pair in the Session of user 2 is (query1, query2) and its link pair is (URL2, URL3). Hence the number of co-occurrences of the search keyword pair (query1, query2) is 2, that of the link pair (URL1, URL2) is 1, and that of the link pair (URL2, URL3) is 1.
As another example, the user behavior log includes the Session of user 1, denoted as (Session_1, query1, URL1, query2, URL2), and the Session of user 2, denoted as (Session_2, query1, URL2, query2, URL3, query3, URL1). The search keyword pair in the Session of user 1 is (query1, query2) and its link pair is (URL1, URL2). The search keyword pairs in the Session of user 2 are (query1, query2), (query1, query3) and (query2, query3), and its link pairs are (URL1, URL2), (URL1, URL3) and (URL2, URL3). Hence the search keyword pair (query1, query2) co-occurs 2 times, the link pair (URL1, URL2) 2 times, the search keyword pairs (query1, query3) and (query2, query3) once each, and the link pairs (URL1, URL3) and (URL2, URL3) once each. In this way, all distinct search keyword pairs and link pairs in a full day of user behavior logs can be identified and the number of co-occurrences of each counted. Each search keyword pair, each link pair, their numbers of co-occurrences and the date are then stored in a database.
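A minimal sketch of the pair counting, assuming each Session has already been split into its queries and its clicked links (the `count_cooccurrences` name and that layout are hypothetical). Each pair is counted at most once per Session, and pairs are stored with their two members in sorted order so that (query2, query1) and (query1, query2) share one counter; both are assumptions the text does not spell out.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical layout: Session_id -> (queries entered, links clicked)
Sessions = Dict[str, Tuple[List[str], List[str]]]


def count_cooccurrences(sessions: Sessions) -> Tuple[Counter, Counter]:
    """Count how many Sessions contain each search keyword pair and each link pair."""
    query_pairs: Counter = Counter()
    link_pairs: Counter = Counter()
    for queries, links in sessions.values():
        # sorted(set(...)) drops repeats, so each pair holds two different items
        # and appears in one canonical order.
        query_pairs.update(combinations(sorted(set(queries)), 2))
        link_pairs.update(combinations(sorted(set(links)), 2))
    return query_pairs, link_pairs


# The two Sessions of the second example above.
log = {
    "Session_1": (["query1", "query2"], ["URL1", "URL2"]),
    "Session_2": (["query1", "query2", "query3"], ["URL2", "URL3", "URL1"]),
}
query_pairs, link_pairs = count_cooccurrences(log)
print(query_pairs[("query1", "query2")], link_pairs[("URL1", "URL2")])  # 2 2
```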
Step 102, determining a plurality of search keywords corresponding to a first event in an event library according to a plurality of links corresponding to the first event and the user behavior log, wherein the first event corresponds to a plurality of first information, the plurality of links corresponding to the first event correspond to the plurality of first information one to one, and each search keyword in the plurality of search keywords corresponding to the first event is used for searching and clicking at least one first information in the plurality of first information.
In this embodiment, a large amount of information is aggregated, its attributes are extracted, and the result is optimized to form an event library containing a plurality of events. Each event can be described by an information cluster, i.e. by a plurality of pieces of information, which may be news information. When the correlation between any two events in the event library is calculated, one event is denoted as the first event and the other as the second event; the information describing the first event is denoted as first information, and the information describing the second event as second information.
For example, the first event may be described by information 1, information 2 and information 3, whose URLs are denoted as URL1, URL2 and URL3 respectively. The plurality of links corresponding to the first event are then URL1, URL2 and URL3; optionally, they form a first link list {URL1, URL2, URL3}, which is the link list corresponding to the first event. Further, the plurality of search keywords corresponding to the first event is determined according to this link list and the user behavior log, and each of these search keywords is one through which at least one piece of first information was searched for and clicked.
Optionally, the determining, according to the plurality of links corresponding to the first event in the event library and the user behavior log, a plurality of search keywords corresponding to the first event includes: according to each link in a plurality of links corresponding to the first event in the event library, acquiring at least one search keyword corresponding to the link from the user behavior log; and determining at least one search keyword corresponding to each link in the plurality of links as a plurality of search keywords corresponding to the first event.
For example, the user behavior log includes the Sessions of user 1, user 2 and user 3, where the Session of user 1 is denoted as (Session_1, query1, URL1, query2, URL2), the Session of user 2 as (Session_2, query1, URL2, query2, URL3, query3, URL1), and the Session of user 3 as (Session_3, query2, URL2, query3, URL3, query4, URL4). The search keyword pairs in the user behavior log and their numbers of co-occurrences are shown in Table 1 below, and the link pairs and their numbers of co-occurrences are shown in Table 2 below:
TABLE 1
Search keyword pair | Number of co-occurrences
(query1, query2) | 2
(query1, query3) | 1
(query2, query3) | 2
(query2, query4) | 1
(query3, query4) | 1
TABLE 2
Link pair | Number of co-occurrences
(URL1, URL2) | 2
(URL1, URL3) | 1
(URL2, URL3) | 2
(URL2, URL4) | 1
(URL3, URL4) | 1
For each of the plurality of links corresponding to the first event, i.e. URL1, URL2 and URL3, at least one search keyword corresponding to that link is acquired from the user behavior log. In the user behavior log above, the search keywords corresponding to URL1 are query1 and query3, those corresponding to URL2 are query1 and query2, and those corresponding to URL3 are query2 and query3. The search keywords corresponding to URL1, URL2 and URL3 together form the plurality of search keywords corresponding to the first event, i.e. the union of {query1, query3}, {query1, query2} and {query2, query3}. This plurality of search keywords is recorded as the first search keyword list {query1, query2, query3}.
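Steps 102 and 103 can be sketched as an inverted index from each clicked link to the queries that led to the click, followed by a union over an event's link list. For illustration only, this assumes each click is attributed to the query entered just before it, which reproduces the keyword sets in the example above; the names `index_link_queries` and `event_keywords` and the session layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical layout: each Session is a list of (query, links clicked after that query)
Session = List[Tuple[str, List[str]]]


def index_link_queries(sessions: Iterable[Session]) -> Dict[str, Set[str]]:
    """Map each clicked link to the set of queries from whose results it was clicked."""
    link_to_queries: Dict[str, Set[str]] = defaultdict(set)
    for session in sessions:
        for query, clicked in session:
            for url in clicked:
                link_to_queries[url].add(query)
    return link_to_queries


def event_keywords(event_links: Iterable[str], link_to_queries: Dict[str, Set[str]]) -> Set[str]:
    """Union of the queries attached to each link of an event."""
    keywords: Set[str] = set()
    for url in event_links:
        keywords |= link_to_queries.get(url, set())
    return keywords


# The three Sessions of the example, with each click paired to its preceding query.
sessions = [
    [("query1", ["URL1"]), ("query2", ["URL2"])],
    [("query1", ["URL2"]), ("query2", ["URL3"]), ("query3", ["URL1"])],
    [("query2", ["URL2"]), ("query3", ["URL3"]), ("query4", ["URL4"])],
]
index = index_link_queries(sessions)
print(sorted(event_keywords(["URL1", "URL2", "URL3"], index)))  # ['query1', 'query2', 'query3']
```

The same two functions give the second event's keywords from its link list {URL3, URL4}.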
Step 103, determining a plurality of search keywords corresponding to a second event according to a plurality of links corresponding to the second event in the event library and the user behavior log, where the second event corresponds to a plurality of second information, the plurality of links corresponding to the second event correspond to the plurality of second information one to one, and each search keyword in the plurality of search keywords corresponding to the second event is used for searching and clicking at least one second information in the plurality of second information.
For example, the second event in the event library may be described by information 3 and information 4, whose URLs are denoted as URL3 and URL4 respectively. The plurality of links corresponding to the second event are then URL3 and URL4; optionally, they form a second link list {URL3, URL4}, which is the link list corresponding to the second event. Further, the plurality of search keywords corresponding to the second event is determined according to this link list and the user behavior log, and each of these search keywords is one through which at least one piece of second information was searched for and clicked.
Optionally, the determining, according to the plurality of links corresponding to the second event in the event library and the user behavior log, a plurality of search keywords corresponding to the second event includes: according to each link in a plurality of links corresponding to the second event in the event library, acquiring at least one search keyword corresponding to the link from the user behavior log; and determining at least one search keyword corresponding to each link in the plurality of links as a plurality of search keywords corresponding to the second event.
For each of the plurality of links corresponding to the second event, i.e. URL3 and URL4, at least one search keyword corresponding to that link is acquired from the user behavior log. In the user behavior log above, the search keywords corresponding to URL3 are query2 and query3, and the search keyword corresponding to URL4 is query4. The search keywords corresponding to URL3 and URL4 together form the plurality of search keywords corresponding to the second event, i.e. the union of {query2, query3} and {query4}. This plurality of search keywords is recorded as the second search keyword list {query2, query3, query4}.
Step 104, calculating the correlation degree of the first event and the second event according to the plurality of links corresponding to the first event, the plurality of links corresponding to the second event and the user behavior log.
For example, the correlation degree of the first event and the second event is calculated according to the first link list {URL1, URL2, URL3} corresponding to the first event, the second link list {URL3, URL4} corresponding to the second event and the user behavior log. Specifically, each link in the first link list is traversed, and the currently traversed link is paired with each link in the second link list, giving the link pairs (URL1, URL3), (URL1, URL4), (URL2, URL3), (URL2, URL4), (URL3, URL3) and (URL3, URL4). Optionally, because each link pair must comprise two different links, (URL3, URL3) is removed. Further, the number of co-occurrences in the user behavior log of each remaining pair, (URL1, URL3), (URL1, URL4), (URL2, URL3), (URL2, URL4) and (URL3, URL4), is counted; specifically, these numbers may be looked up in the statistics shown in Table 2. The correlation degree of the first event and the second event is then calculated from these numbers of co-occurrences.
Step 105, calculating the correlation degree of the first event and the second event according to the plurality of search keywords corresponding to the first event, the plurality of search keywords corresponding to the second event and the user behavior log.
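The pairing and lookup for the link lists might look as follows. The sketch stops at the co-occurrence counts, because the formula that turns them into a correlation degree is not given at this point; the `link_pair_counts` name is hypothetical, Table 2 is hard-coded as input, and treating a pair absent from the table as having 0 co-occurrences is an assumption.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def link_pair_counts(first: List[str], second: List[str],
                     cooccur: Dict[Pair, int]) -> Dict[Pair, int]:
    """Pair each link of the first event with each link of the second event,
    skip pairs of identical links, and look up each pair's co-occurrence count."""
    counts: Dict[Pair, int] = {}
    for a in first:
        for b in second:
            if a == b:
                continue  # e.g. (URL3, URL3) is removed
            key = (a, b) if a < b else (b, a)  # same member order as the keys below
            counts[key] = cooccur.get(key, 0)  # unseen pair -> 0 (assumption)
    return counts


# Table 2 from the example above.
table2 = {("URL1", "URL2"): 2, ("URL1", "URL3"): 1, ("URL2", "URL3"): 2,
          ("URL2", "URL4"): 1, ("URL3", "URL4"): 1}
counts = link_pair_counts(["URL1", "URL2", "URL3"], ["URL3", "URL4"], table2)
print(len(counts))  # 5
```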
For example, the correlation degree of the first event and the second event is calculated according to the first search keyword list {query1, query2, query3} corresponding to the first event, the second search keyword list {query2, query3, query4} corresponding to the second event and the user behavior log. Specifically, each search keyword in the first search keyword list is traversed, and the currently traversed search keyword is paired with each search keyword in the second search keyword list. Optionally, each search keyword pair comprises two different search keywords, and pairs that differ only in order, such as (query3, query2) and (query2, query3), are treated as one pair, which yields the pairs (query1, query2), (query1, query3), (query1, query4), (query2, query3), (query2, query4) and (query3, query4). Further, the number of co-occurrences of each of these pairs in the user behavior log is counted; specifically, these numbers may be looked up in the statistics shown in Table 1. The correlation degree of the first event and the second event is then calculated from these numbers of co-occurrences.
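For the keyword lists, pairing can produce the same unordered pair twice (query2 from the first list with query3 from the second, and query3 with query2), so the sketch below collects pairs in a set before looking up Table 1. As with the links, the function name is hypothetical, a pair missing from the table is assumed to count 0, and the final correlation formula is left out because it is not specified here.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]


def keyword_pair_counts(first: List[str], second: List[str],
                        cooccur: Dict[Pair, int]) -> Dict[Pair, int]:
    """Co-occurrence counts for the distinct unordered keyword pairs across two lists."""
    pairs: Set[Pair] = {
        (min(a, b), max(a, b))  # one canonical order per unordered pair
        for a in first
        for b in second
        if a != b               # a pair must hold two different keywords
    }
    return {p: cooccur.get(p, 0) for p in sorted(pairs)}


# Table 1 from the example above.
table1 = {("query1", "query2"): 2, ("query1", "query3"): 1, ("query2", "query3"): 2,
          ("query2", "query4"): 1, ("query3", "query4"): 1}
counts = keyword_pair_counts(["query1", "query2", "query3"],
                             ["query2", "query3", "query4"], table1)
print(len(counts))  # 6
```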
In this embodiment, the method for calculating the correlation degree between the first event and the second event may select step 104 or step 105, that is, in this embodiment, the correlation degree between the first event and the second event may be calculated according to a plurality of links corresponding to the first event and a plurality of links corresponding to the second event, or the correlation degree between the first event and the second event may be calculated according to a plurality of search keywords corresponding to the first event and a plurality of search keywords corresponding to the second event.
As shown in fig. 2, it is assumed that there are 4 events in the event library, i.e., event 1, event 2, event 3, and event 4, and this is only an exemplary illustration and does not limit the number of events in the event library. According to the method of the embodiment, the correlation degree between any two events in the 4 events can be determined. When a target event, such as event 1, is specified, the correlation degree of other events in the event library with the event 1 respectively can be determined, or an event list in the event library with the correlation degree with the event 1 larger than a threshold value can be determined.
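The target-event query described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `related_events`, the `correlation` callback (standing in for either the link-based or the keyword-based calculation), and the strict "greater than threshold" comparison are all hypothetical choices made for the example.

```python
from typing import Callable, List, Tuple


def related_events(
    target: str,
    events: List[str],
    correlation: Callable[[str, str], float],
    threshold: float,
) -> List[Tuple[str, float]]:
    """Score every other event against `target`; keep those above `threshold`.

    `correlation` is a hypothetical callback for either calculation option
    (link-based or search-keyword-based).
    """
    # Correlation of the target with every other event in the library.
    scored = [(event, correlation(target, event)) for event in events if event != target]
    # Keep only events whose correlation exceeds the threshold, highest first.
    kept = [(event, score) for event, score in scored if score > threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

Passing the link-based or the keyword-based scoring function as `correlation` lets the same ranking step serve both calculation options.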
According to the embodiment of the invention, the calculation precision of the event correlation degree is improved by acquiring the user behavior log, calculating the correlation degree of the first event and the second event according to the plurality of links corresponding to the first event, the plurality of links corresponding to the second event and the user behavior log in the event library, or calculating the correlation degree of the first event and the second event according to the plurality of search keywords corresponding to the first event, the plurality of search keywords corresponding to the second event and the user behavior log, and calculating the correlation degree between the events through the search and click behaviors of different users.
Fig. 3 is a flowchart of an event correlation calculation method according to another embodiment of the present invention. On the basis of the above embodiment, calculating the correlation between the first event and the second event according to the plurality of links corresponding to the first event, the plurality of links corresponding to the second event, and the user behavior log specifically includes the following steps:
Step 301: traverse each link of the plurality of links corresponding to the first event, and pair the currently traversed link with each link of the plurality of links corresponding to the second event to form link pairs.
For example, the plurality of links corresponding to the first event constitute a first link list, represented as {URL1, URL2, URL3}, and the plurality of links corresponding to the second event constitute a second link list, represented as {URL3, URL4}. Each link in the first link list {URL1, URL2, URL3} is traversed, and the currently traversed link is paired with each link in the second link list {URL3, URL4}. Optionally, each link pair consists of two different links, so the following link pairs are obtained: (URL1, URL3), (URL1, URL4), (URL2, URL3), (URL2, URL4), (URL3, URL4).
Step 302: calculate the correlation between the first event and the second event according to the number of times each link pair appears in different pieces of record information in the user behavior log.
Next, the number of co-occurrences of (URL1, URL3), (URL1, URL4), (URL2, URL3), (URL2, URL4), and (URL3, URL4) in the user behavior log is counted; specifically, each of these counts can be looked up in the statistics shown in Table 2 above. The correlation between the first event and the second event is then calculated from the co-occurrence counts of these link pairs in the user behavior log.
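Step 301 can be sketched as a nested traversal. The function name `form_link_pairs` is hypothetical, and the sketch assumes, as in the example, that a link is never paired with itself and that (a, b) and (b, a) are treated as the same pair.

```python
from typing import FrozenSet, List, Set, Tuple


def form_link_pairs(first: List[str], second: List[str]) -> List[Tuple[str, str]]:
    """Pair every link of the first event with every link of the second event.

    Assumptions: identical links are not paired, and (a, b) and (b, a)
    are treated as one pair, kept in first-seen order.
    """
    seen: Set[FrozenSet[str]] = set()
    pairs: List[Tuple[str, str]] = []
    for a in first:  # traverse the first event's links
        for b in second:  # pair the current link with each link of the second event
            key = frozenset((a, b))
            if a == b or key in seen:
                continue
            seen.add(key)
            pairs.append((a, b))
    return pairs


print(form_link_pairs(["URL1", "URL2", "URL3"], ["URL3", "URL4"]))
# [('URL1', 'URL3'), ('URL1', 'URL4'), ('URL2', 'URL3'), ('URL2', 'URL4'), ('URL3', 'URL4')]
```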
Optionally, the number of times a link pair appears in different pieces of record information in the user behavior log is the number of pieces of record information in the user behavior log that contain the link pair. As shown in Table 2, the number of co-occurrences of the link pair (URL1, URL2) is 2. For example, suppose the user behavior log includes the sessions of user 1, user 2, and user 3, where the session of user 1 is represented as (Session_1, query1, URL1, query2, URL2), the session of user 2 as (Session_2, query1, URL2, query2, URL3, query3, URL1), and the session of user 3 as (Session_3, query2, URL2, query3, URL3, query4, URL4). The sessions of user 1 and user 2 both contain (URL1, URL2), while the session of user 3 does not; that is, two sessions in the user behavior log contain (URL1, URL2). The number of co-occurrences of the link pair (URL1, URL2) is therefore the number of sessions in the user behavior log that contain (URL1, URL2), namely 2. The co-occurrence counts of the other link pairs are obtained in the same way and are not described in detail.
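The counting rule (one count per record that contains both links) can be checked against the three example sessions. Only the clicked links of each session are kept here; the names `SESSION_LINKS` and `co_occurrence` are hypothetical.

```python
from typing import List, Set

# Clicked links of the three example sessions described above.
SESSION_LINKS: List[Set[str]] = [
    {"URL1", "URL2"},          # Session_1
    {"URL2", "URL3", "URL1"},  # Session_2
    {"URL2", "URL3", "URL4"},  # Session_3
]


def co_occurrence(a: str, b: str, records: List[Set[str]]) -> int:
    """Number of records that contain both a and b (each record counts at most once)."""
    return sum(1 for record in records if a in record and b in record)


print(co_occurrence("URL1", "URL2", SESSION_LINKS))  # 2
```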
Optionally, calculating the correlation between the first event and the second event according to the number of times each link pair appears in different pieces of record information in the user behavior log includes the following possible implementations:
One possible implementation is: sum the numbers of times each link pair appears in different pieces of record information in the user behavior log to obtain the correlation between the first event and the second event.
For example, the co-occurrence counts of (URL1, URL3), (URL1, URL4), (URL2, URL3), (URL2, URL4), and (URL3, URL4) in the user behavior log are added together to obtain the correlation between the first event and the second event.
Another possible implementation is: calculate, from the number of times each link pair appears in different pieces of record information in the user behavior log, the probability that the link pair appears in one piece of record information; calculate the pointwise mutual information (PMI) between the link corresponding to the first event and the link corresponding to the second event in each link pair, from the probability that the link pair appears in one piece of record information, the probability that the link corresponding to the first event appears in one piece of record information, and the probability that the link corresponding to the second event appears in one piece of record information; and sum the PMI values of all link pairs to obtain the correlation between the first event and the second event.
For example, take (URL1, URL3). As shown in Table 2, the number of co-occurrences of (URL1, URL3) in the user behavior log is 1, and the user behavior log includes 3 pieces of record information in total, so the probability that (URL1, URL3) appears in one piece of record information is 1/3. URL1 appears in the sessions of user 1 and user 2, that is, 2 times, so the probability that URL1 appears in one piece of record information is 2/3. URL3 appears in the sessions of user 2 and user 3, that is, 2 times, so the probability that URL3 appears in one piece of record information is 2/3. Denote the probability that (URL1, URL3) appears in one piece of record information by P(URL1&URL3), the probability that URL1 appears in one piece of record information by P(URL1), and the probability that URL3 appears in one piece of record information by P(URL3); optionally, the probability that each URL appears in one piece of record information is nonzero. From P(URL1&URL3), P(URL1), and P(URL3), the pointwise mutual information of URL1 and URL3, PMI(URL1, URL3), can be calculated according to the following formula:
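Applied to the example, the summation can be sketched as below. This sketch recomputes each pair's co-occurrence count from the three example sessions rather than reading Table 2; the names `SESSION_LINKS`, `LINK_PAIRS`, and `count_correlation` are hypothetical.

```python
from typing import List, Set, Tuple

# Clicked links of the three example sessions (Session_1 to Session_3).
SESSION_LINKS: List[Set[str]] = [
    {"URL1", "URL2"},
    {"URL1", "URL2", "URL3"},
    {"URL2", "URL3", "URL4"},
]
# Link pairs formed in step 301 for {URL1, URL2, URL3} and {URL3, URL4}.
LINK_PAIRS: List[Tuple[str, str]] = [
    ("URL1", "URL3"), ("URL1", "URL4"), ("URL2", "URL3"),
    ("URL2", "URL4"), ("URL3", "URL4"),
]


def count_correlation(pairs: List[Tuple[str, str]], records: List[Set[str]]) -> int:
    """Correlation = sum over pairs of the number of records containing both items."""
    return sum(
        sum(1 for record in records if a in record and b in record)
        for a, b in pairs
    )


print(count_correlation(LINK_PAIRS, SESSION_LINKS))  # 5
```

With these sessions the per-pair counts are 1, 0, 2, 1, and 1, which sum to 5.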
PMI(URL1,URL3)=log2(P(URL1&URL3)/(P(URL1)*P(URL3)))
Similarly, PMI(URL1, URL4), PMI(URL2, URL3), PMI(URL2, URL4), and PMI(URL3, URL4) can be calculated, and PMI(URL1, URL3), PMI(URL1, URL4), PMI(URL2, URL3), PMI(URL2, URL4), and PMI(URL3, URL4) are added together to obtain the correlation between the first event and the second event.
Without loss of generality, suppose the plurality of links corresponding to the first event constitute a first link list {A1, A2, …, An}, and the plurality of links corresponding to the second event constitute a second link list {B1, B2, …, Bm}. Each link in the first link list {A1, A2, …, An} is traversed, and the currently traversed link is paired with each link in the second link list {B1, B2, …, Bm}, yielding the link pairs (A1, B1), (A1, B2), …, (An, B1), …, (An, Bm). One method of calculating the correlation between the first event and the second event from these link pairs is to add together the co-occurrence counts of (A1, B1), (A1, B2), …, (An, B1), …, (An, Bm) in the user behavior log. The other method is to calculate PMI(A1, B1), PMI(A1, B2), …, PMI(An, B1), …, PMI(An, Bm) according to the above formula and add them together to obtain the correlation between the first event and the second event.
In this embodiment of the invention, each link of the plurality of links corresponding to the first event is traversed, the currently traversed link is paired with each link of the plurality of links corresponding to the second event, and the correlation between the first event and the second event is calculated according to the number of times each link pair appears in different pieces of record information in the user behavior log, which further improves the accuracy of the event correlation calculation.
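The PMI variant can be sketched on the same three example sessions. One caveat: (URL1, URL4) never co-occurs there, so P(URL1&URL4) = 0 and its PMI would be negative infinity. The text does not say how this case is handled; the sketch below simply skips such pairs, which is an assumption. Function and variable names are hypothetical.

```python
import math
from typing import List, Sequence, Set, Tuple

SESSION_LINKS: List[Set[str]] = [
    {"URL1", "URL2"},
    {"URL1", "URL2", "URL3"},
    {"URL2", "URL3", "URL4"},
]
LINK_PAIRS: List[Tuple[str, str]] = [
    ("URL1", "URL3"), ("URL1", "URL4"), ("URL2", "URL3"),
    ("URL2", "URL4"), ("URL3", "URL4"),
]


def probability(items: Sequence[str], records: List[Set[str]]) -> float:
    """Fraction of records that contain every item in `items`."""
    hits = sum(1 for record in records if all(item in record for item in items))
    return hits / len(records)


def pmi(a: str, b: str, records: List[Set[str]]) -> float:
    """PMI(a, b) = log2(P(a&b) / (P(a) * P(b)))."""
    return math.log2(
        probability((a, b), records)
        / (probability((a,), records) * probability((b,), records))
    )


def pmi_correlation(pairs: List[Tuple[str, str]], records: List[Set[str]]) -> float:
    """Sum of PMI over all pairs; pairs that never co-occur are skipped (assumption)."""
    return sum(
        pmi(a, b, records) for a, b in pairs if probability((a, b), records) > 0
    )


print(round(pmi("URL1", "URL3", SESSION_LINKS), 3))  # -0.415
print(round(pmi_correlation(LINK_PAIRS, SESSION_LINKS), 3))  # 0.17
```

For (URL1, URL3) this reproduces the worked example: log2((1/3) / ((2/3) * (2/3))) = log2(3/4).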
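In compact notation, the two methods can be written as follows. The symbols are introduced here only for illustration and do not appear in the original text: \(\mathcal{P}\) is the set of pairs formed from the two lists, \(c(a,b)\) is the number of records containing both \(a\) and \(b\), \(c(a)\) is the number of records containing \(a\), and \(N\) is the total number of records. The last form shows that the probability-based PMI reduces to counts because the factors of \(N\) cancel; for (URL1, URL3) with \(N = 3\), \(c = 1\), and \(c(\text{URL1}) = c(\text{URL3}) = 2\), it gives \(\log_2(3/4)\). A pair with \(c(a,b) = 0\) would contribute negative infinity, and the text does not specify how such pairs are treated.

```latex
\begin{aligned}
\mathrm{Corr}_{\mathrm{count}}(E_1, E_2) &= \sum_{(a,b) \in \mathcal{P}} c(a,b) \\
\mathrm{Corr}_{\mathrm{PMI}}(E_1, E_2) &= \sum_{(a,b) \in \mathcal{P}} \log_2 \frac{c(a,b)/N}{\bigl(c(a)/N\bigr)\bigl(c(b)/N\bigr)}
  = \sum_{(a,b) \in \mathcal{P}} \log_2 \frac{N \, c(a,b)}{c(a) \, c(b)}
\end{aligned}
```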
Fig. 4 is a flowchart of an event correlation calculation method according to another embodiment of the present invention. On the basis of the above embodiment, calculating the correlation between the first event and the second event according to the plurality of search keywords corresponding to the first event, the plurality of search keywords corresponding to the second event, and the user behavior log specifically includes the following steps:
Step 401: traverse each search keyword of the plurality of search keywords corresponding to the first event, and pair the currently traversed search keyword with each search keyword of the plurality of search keywords corresponding to the second event to form search keyword pairs.
For example, the plurality of search keywords corresponding to the first event constitute a first search keyword list, represented as {query1, query2, query3}, and the plurality of search keywords corresponding to the second event constitute a second search keyword list, represented as {query2, query3, query4}. Each search keyword in the first search keyword list {query1, query2, query3} is traversed, and the currently traversed search keyword is paired with each search keyword in the second search keyword list {query2, query3, query4}, yielding the following search keyword pairs: (query1, query2), (query1, query3), (query1, query4), (query2, query3), (query2, query4), (query3, query4).
Step 402: calculate the correlation between the first event and the second event according to the number of times each search keyword pair appears in different pieces of record information in the user behavior log.
Next, the number of co-occurrences of (query1, query2), (query1, query3), (query1, query4), (query2, query3), (query2, query4), and (query3, query4) in the user behavior log is counted; specifically, each of these counts can be looked up in the statistics shown in Table 1 above. The correlation between the first event and the second event is then calculated from the co-occurrence counts of these search keyword pairs in the user behavior log.
Optionally, the number of times a search keyword pair appears in different pieces of record information in the user behavior log is the number of pieces of record information in the user behavior log that contain the search keyword pair. As shown in Table 1, the number of co-occurrences of the search keyword pair (query1, query2) is 2. For example, suppose the user behavior log includes the sessions of user 1, user 2, and user 3, where the session of user 1 is represented as (Session_1, query1, URL1, query2, URL2), the session of user 2 as (Session_2, query1, URL2, query2, URL3, query3, URL1), and the session of user 3 as (Session_3, query2, URL2, query3, URL3, query4, URL4). The sessions of user 1 and user 2 both contain (query1, query2), while the session of user 3 does not; that is, two sessions in the user behavior log contain (query1, query2). The number of co-occurrences of the search keyword pair (query1, query2) is therefore the number of sessions in the user behavior log that contain (query1, query2), namely 2. The co-occurrence counts of the other search keyword pairs are obtained in the same way and are not described in detail.
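For the keyword lists, the traversal produces both (query2, query3) and (query3, query2), and the example keeps only one of them. A compact sketch using `itertools.product`, assuming unordered pairs of two different keywords; the function name `form_keyword_pairs` is hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Tuple


def form_keyword_pairs(first: List[str], second: List[str]) -> List[Tuple[str, str]]:
    """Unordered pairs of two different keywords, one taken from each list."""
    pairs: Dict[FrozenSet[str], Tuple[str, str]] = {}
    # product() walks the first list and pairs each item with every item of the second.
    for a, b in product(first, second):
        if a != b:
            # frozenset makes (query2, query3) and (query3, query2) the same key;
            # setdefault keeps the first-seen orientation.
            pairs.setdefault(frozenset((a, b)), (a, b))
    return list(pairs.values())  # dicts preserve insertion order (Python 3.7+)


print(form_keyword_pairs(["query1", "query2", "query3"], ["query2", "query3", "query4"]))
# [('query1', 'query2'), ('query1', 'query3'), ('query1', 'query4'),
#  ('query2', 'query3'), ('query2', 'query4'), ('query3', 'query4')]
```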
Optionally, calculating the correlation between the first event and the second event according to the number of times each search keyword pair appears in different pieces of record information in the user behavior log includes the following possible implementations:
One possible implementation is: sum the numbers of times each search keyword pair appears in different pieces of record information in the user behavior log to obtain the correlation between the first event and the second event.
For example, the co-occurrence counts of (query1, query2), (query1, query3), (query1, query4), (query2, query3), (query2, query4), and (query3, query4) in the user behavior log are added together to obtain the correlation between the first event and the second event.
Another possible implementation is: calculate, from the number of times each search keyword pair appears in different pieces of record information in the user behavior log, the probability that the search keyword pair appears in one piece of record information; calculate the pointwise mutual information (PMI) between the search keyword corresponding to the first event and the search keyword corresponding to the second event in each search keyword pair, from the probability that the search keyword pair appears in one piece of record information, the probability that the search keyword corresponding to the first event appears in one piece of record information, and the probability that the search keyword corresponding to the second event appears in one piece of record information; and sum the PMI values of all search keyword pairs to obtain the correlation between the first event and the second event.
For example, take (query1, query2). As shown in Table 1, the number of co-occurrences of (query1, query2) in the user behavior log is 2, and the user behavior log includes 3 pieces of record information in total, so the probability that (query1, query2) appears in one piece of record information is 2/3. query1 appears in the sessions of user 1 and user 2, that is, 2 times, so the probability that query1 appears in one piece of record information is 2/3. query2 appears in the sessions of user 1, user 2, and user 3, that is, 3 times, so the probability that query2 appears in one piece of record information is 3/3. Denote the probability that (query1, query2) appears in one piece of record information by P(query1&query2), the probability that query1 appears in one piece of record information by P(query1), and the probability that query2 appears in one piece of record information by P(query2); optionally, the probability that each query appears in one piece of record information is nonzero. From P(query1&query2), P(query1), and P(query2), the pointwise mutual information of query1 and query2, PMI(query1, query2), can be calculated according to the following formula:
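The summed count for the keyword example can be recomputed from the three example sessions. This sketch writes each session as (search keyword, clicked link) steps, which is an assumed representation of the (Session_1, query1, URL1, ...) notation, and reduces each session to its set of keywords so that a record is counted at most once per pair. All names are hypothetical.

```python
from typing import List, Set, Tuple

# Example sessions as (search keyword, clicked link) steps, in session order.
SESSIONS: List[List[Tuple[str, str]]] = [
    [("query1", "URL1"), ("query2", "URL2")],
    [("query1", "URL2"), ("query2", "URL3"), ("query3", "URL1")],
    [("query2", "URL2"), ("query3", "URL3"), ("query4", "URL4")],
]
KEYWORD_PAIRS: List[Tuple[str, str]] = [
    ("query1", "query2"), ("query1", "query3"), ("query1", "query4"),
    ("query2", "query3"), ("query2", "query4"), ("query3", "query4"),
]


def keyword_count_correlation(
    pairs: List[Tuple[str, str]], sessions: List[List[Tuple[str, str]]]
) -> int:
    """Sum, over all keyword pairs, of the number of sessions containing both keywords."""
    keyword_sets: List[Set[str]] = [{query for query, _ in session} for session in sessions]
    return sum(
        sum(1 for keywords in keyword_sets if a in keywords and b in keywords)
        for a, b in pairs
    )


print(keyword_count_correlation(KEYWORD_PAIRS, SESSIONS))  # 7
```

With these sessions the per-pair counts are 2, 1, 0, 2, 1, and 1; the count of 2 for (query1, query2) matches the value cited from Table 1.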
PMI(query1,query2)=log2(P(query1&query2)/(P(query1)*P(query2)))
Similarly, PMI(query1, query3), PMI(query1, query4), PMI(query2, query3), PMI(query2, query4), and PMI(query3, query4) can be calculated, and PMI(query1, query2), PMI(query1, query3), PMI(query1, query4), PMI(query2, query3), PMI(query2, query4), and PMI(query3, query4) are added together to obtain the correlation between the first event and the second event.
Without loss of generality, suppose the plurality of search keywords corresponding to the first event constitute a first search keyword list {A1, A2, …, An}, and the plurality of search keywords corresponding to the second event constitute a second search keyword list {B1, B2, …, Bm}. Each search keyword in the first search keyword list {A1, A2, …, An} is traversed, and the currently traversed search keyword is paired with each search keyword in the second search keyword list {B1, B2, …, Bm}, yielding the search keyword pairs (A1, B1), (A1, B2), …, (An, B1), …, (An, Bm). One method of calculating the correlation between the first event and the second event from these search keyword pairs is to add together the co-occurrence counts of (A1, B1), (A1, B2), …, (An, B1), …, (An, Bm) in the user behavior log. The other method is to calculate PMI(A1, B1), PMI(A1, B2), …, PMI(An, B1), …, PMI(An, Bm) according to the above formula and add them together to obtain the correlation between the first event and the second event.
In this embodiment of the invention, each search keyword of the plurality of search keywords corresponding to the first event is traversed, the currently traversed search keyword is paired with each search keyword of the plurality of search keywords corresponding to the second event, and the correlation between the first event and the second event is calculated according to the number of times each search keyword pair appears in different pieces of record information in the user behavior log, which further improves the accuracy of the event correlation calculation.
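The keyword PMI sum can be sketched with the count form log2(N * c(a, b) / (c(a) * c(b))), which is algebraically identical to the probability formula above because the factors of N cancel. As with links, (query1, query4) never co-occurs in the example sessions; skipping such pairs is an assumption, since the text does not cover this case. Names are hypothetical.

```python
import math
from typing import List, Sequence, Set, Tuple

# Search keywords of the three example sessions.
KEYWORD_SETS: List[Set[str]] = [
    {"query1", "query2"},            # Session_1
    {"query1", "query2", "query3"},  # Session_2
    {"query2", "query3", "query4"},  # Session_3
]
KEYWORD_PAIRS: List[Tuple[str, str]] = [
    ("query1", "query2"), ("query1", "query3"), ("query1", "query4"),
    ("query2", "query3"), ("query2", "query4"), ("query3", "query4"),
]


def record_count(items: Sequence[str], records: List[Set[str]]) -> int:
    """Number of records that contain every item in `items`."""
    return sum(1 for record in records if all(item in record for item in items))


def keyword_pmi(a: str, b: str, records: List[Set[str]]) -> float:
    """log2(N * c(a, b) / (c(a) * c(b))), equal to log2(P(a&b) / (P(a) * P(b)))."""
    n = len(records)
    return math.log2(
        n * record_count((a, b), records)
        / (record_count((a,), records) * record_count((b,), records))
    )


def keyword_pmi_correlation(pairs: List[Tuple[str, str]], records: List[Set[str]]) -> float:
    """Sum of PMI over pairs that co-occur at least once (skipping others is an assumption)."""
    return sum(
        keyword_pmi(a, b, records) for a, b in pairs if record_count((a, b), records) > 0
    )


print(keyword_pmi("query1", "query2", KEYWORD_SETS))  # 0.0
print(round(keyword_pmi_correlation(KEYWORD_PAIRS, KEYWORD_SETS), 3))  # 0.17
```

PMI(query1, query2) is 0 because query2 appears in every record, so knowing that query1 occurred adds no information about query2.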
Fig. 5 is a schematic structural diagram of an event correlation calculation apparatus according to an embodiment of the present invention, and Fig. 6 is a schematic structural diagram of an event correlation calculation apparatus according to an embodiment of the present invention. The event correlation calculation apparatus provided in the embodiments of the present invention can execute the processing procedure provided in the embodiments of the event correlation calculation method. As shown in Fig. 5, the event correlation calculation apparatus 50 includes an acquisition module 51, a first determining module 52, a second determining module 53, and a first calculating module 54; alternatively, as shown in Fig. 6, the event correlation calculation apparatus 50 includes an acquisition module 51, a first determining module 52, a second determining module 53, and a second calculating module 55. The acquisition module 51 is configured to acquire a user behavior log, where the user behavior log includes a plurality of pieces of record information, each piece of record information corresponds to one search behavior of a user, and each piece of record information includes at least one search keyword and at least one link clicked by the user. The first determining module 52 is configured to determine a plurality of search keywords corresponding to a first event according to a plurality of links corresponding to the first event in an event library and the user behavior log, where the first event corresponds to a plurality of pieces of first information, the plurality of links corresponding to the first event correspond one-to-one to the plurality of pieces of first information, and each of the plurality of search keywords corresponding to the first event is used to search for and click at least one of the pieces of first information. The second determining module 53 is configured to determine a plurality of search keywords corresponding to a second event according to a plurality of links corresponding to the second event in the event library and the user behavior log, where the second event corresponds to a plurality of pieces of second information, the plurality of links corresponding to the second event correspond one-to-one to the plurality of pieces of second information, and each of the plurality of search keywords corresponding to the second event is used to search for and click at least one of the pieces of second information. The first calculating module 54 is configured to calculate the correlation between the first event and the second event according to the plurality of links corresponding to the first event, the plurality of links corresponding to the second event, and the user behavior log. The second calculating module 55 is configured to calculate the correlation between the first event and the second event according to the plurality of search keywords corresponding to the first event, the plurality of search keywords corresponding to the second event, and the user behavior log.
Optionally, the first determining module 52 is specifically configured to: for each link of the plurality of links corresponding to the first event in the event library, acquire at least one search keyword corresponding to that link from the user behavior log; and determine the search keywords acquired for all of the plurality of links as the plurality of search keywords corresponding to the first event.
Optionally, the second determining module 53 is specifically configured to: for each link of the plurality of links corresponding to the second event in the event library, acquire at least one search keyword corresponding to that link from the user behavior log; and determine the search keywords acquired for all of the plurality of links as the plurality of search keywords corresponding to the second event.
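The lookup performed by the determining modules can be sketched as follows. This assumes that each link in a session was clicked from the search keyword immediately preceding it, as the (Session_1, query1, URL1, query2, URL2) notation suggests; the session representation, the function name `event_keywords`, and the event link list in the usage are hypothetical.

```python
from typing import List, Tuple

# Example sessions as (search keyword, clicked link) steps (assumed representation).
SESSIONS: List[List[Tuple[str, str]]] = [
    [("query1", "URL1"), ("query2", "URL2")],
    [("query1", "URL2"), ("query2", "URL3"), ("query3", "URL1")],
    [("query2", "URL2"), ("query3", "URL3"), ("query4", "URL4")],
]


def event_keywords(event_links: List[str], sessions: List[List[Tuple[str, str]]]) -> List[str]:
    """Collect, link by link, the search keywords whose results were clicked on that link."""
    keywords: List[str] = []
    for link in event_links:  # each link corresponding to the event
        for session in sessions:
            for query, clicked in session:
                # Keep each keyword once, in first-found order.
                if clicked == link and query not in keywords:
                    keywords.append(query)
    return keywords


# Hypothetical event whose information items are reached via URL1 and URL3.
print(event_keywords(["URL1", "URL3"], SESSIONS))  # ['query1', 'query3', 'query2']
```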
Optionally, the first calculating module 54 includes a first traversal unit 541 and a first calculation unit 542. The first traversal unit 541 is configured to traverse each link of the plurality of links corresponding to the first event and pair the currently traversed link with each link of the plurality of links corresponding to the second event to form link pairs; the first calculation unit 542 is configured to calculate the correlation between the first event and the second event according to the number of times each link pair appears in different pieces of record information in the user behavior log.
Optionally, the first calculating unit 542 is specifically configured to sum the numbers of times each link pair appears in different pieces of record information in the user behavior log to obtain the correlation between the first event and the second event.
Optionally, the first calculating unit 542 is specifically configured to: calculate, from the number of times each link pair appears in different pieces of record information in the user behavior log, the probability that the link pair appears in one piece of record information; calculate the pointwise mutual information (PMI) between the link corresponding to the first event and the link corresponding to the second event in each link pair, from the probability that the link pair appears in one piece of record information, the probability that the link corresponding to the first event appears in one piece of record information, and the probability that the link corresponding to the second event appears in one piece of record information; and sum the PMI values of all link pairs to obtain the correlation between the first event and the second event.
Optionally, the second calculating module 55 includes a second traversal unit 551 and a second calculation unit 552. The second traversal unit 551 is configured to traverse each search keyword of the plurality of search keywords corresponding to the first event and pair the currently traversed search keyword with each search keyword of the plurality of search keywords corresponding to the second event to form search keyword pairs; the second calculation unit 552 is configured to calculate the correlation between the first event and the second event according to the number of times each search keyword pair appears in different pieces of record information in the user behavior log.
Optionally, the second calculating unit 552 is specifically configured to sum the numbers of times each search keyword pair appears in different pieces of record information in the user behavior log to obtain the correlation between the first event and the second event.
Optionally, the second calculating unit 552 is specifically configured to: calculate, from the number of times each search keyword pair appears in different pieces of record information in the user behavior log, the probability that the search keyword pair appears in one piece of record information; calculate the pointwise mutual information (PMI) between the search keyword corresponding to the first event and the search keyword corresponding to the second event in each search keyword pair, from the probability that the search keyword pair appears in one piece of record information, the probability that the search keyword corresponding to the first event appears in one piece of record information, and the probability that the search keyword corresponding to the second event appears in one piece of record information; and sum the PMI values of all search keyword pairs to obtain the correlation between the first event and the second event.
The event correlation calculation apparatus in the embodiments shown in Fig. 5 and Fig. 6 can be used to implement the technical solutions of the above method embodiments; the implementation principles and technical effects are similar and are not described here again.
Fig. 7 is a schematic structural diagram of a device provided in an embodiment of the present invention. The device provided in this embodiment can execute the processing procedure provided in the embodiments of the event correlation calculation method. As shown in Fig. 7, the device 70 includes a memory 71, a processor 72, and a computer program, where the computer program is stored in the memory 71 and is configured to be executed by the processor 72 to implement the event correlation calculation method described above.
The device in the embodiment shown in Fig. 7 can be used to implement the technical solutions of the above method embodiments; the implementation principles and technical effects are similar and are not described here again.
In addition, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the event correlation calculation method described in the above embodiments.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a logical functional division, and other divisions are possible in an actual implementation; for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, or a network device) or a processor to execute some of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
It is clear to those skilled in the art that, for convenience and brevity of description, the above division into functional modules is merely an example; in practical applications, the above functions may be assigned to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to perform all or some of the functions described above. For the specific working process of the device described above, refer to the corresponding process in the foregoing method embodiments; details are not repeated here.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (22)

1. An event correlation calculation method, comprising:
acquiring a user behavior log, wherein the user behavior log comprises a plurality of pieces of record information, each piece of record information corresponds to one search behavior of a user, and the record information comprises at least one search keyword and at least one link clicked by the user;
determining a plurality of search keywords corresponding to a first event according to a plurality of links corresponding to the first event in an event library and the user behavior log, wherein the first event corresponds to a plurality of pieces of first information, the plurality of links corresponding to the first event correspond one to one to the plurality of pieces of first information, and each of the plurality of search keywords corresponding to the first event is used to search for and click at least one of the plurality of pieces of first information;
determining a plurality of search keywords corresponding to a second event according to a plurality of links corresponding to the second event in the event library and the user behavior log, wherein the second event corresponds to a plurality of pieces of second information, the plurality of links corresponding to the second event correspond one to one to the plurality of pieces of second information, and each of the plurality of search keywords corresponding to the second event is used to search for and click at least one of the plurality of pieces of second information;
calculating the correlation degree of the first event and the second event according to the plurality of links corresponding to the first event, the plurality of links corresponding to the second event and the user behavior log; or
calculating the correlation degree of the first event and the second event according to the plurality of search keywords corresponding to the first event, the plurality of search keywords corresponding to the second event and the user behavior log.
2. The method of claim 1, wherein determining a plurality of search keywords corresponding to a first event from a plurality of links corresponding to the first event in an event library and the user behavior log comprises:
for each link in the plurality of links corresponding to the first event in the event library, acquiring at least one search keyword corresponding to the link from the user behavior log;
and determining the search keywords acquired for all of the plurality of links as the plurality of search keywords corresponding to the first event.
3. The method of claim 1, wherein determining a plurality of search keywords corresponding to a second event from a plurality of links corresponding to the second event in the event library and the user behavior log comprises:
for each link in the plurality of links corresponding to the second event in the event library, acquiring at least one search keyword corresponding to the link from the user behavior log;
and determining the search keywords acquired for all of the plurality of links as the plurality of search keywords corresponding to the second event.
4. The method according to any one of claims 1-3, wherein calculating the correlation degree of the first event and the second event according to the plurality of links corresponding to the first event, the plurality of links corresponding to the second event, and the user behavior log comprises:
traversing each link in the plurality of links corresponding to the first event, and pairing the currently traversed link of the first event with each link in the plurality of links corresponding to the second event to form link pairs;
and calculating the correlation degree of the first event and the second event according to the number of occurrences of each link pair in different pieces of record information in the user behavior log.
5. The method of claim 4, wherein calculating the correlation between the first event and the second event according to the number of occurrences of each of the link pairs in different recorded information in the user behavior log comprises:
summing the numbers of occurrences of all the link pairs in different record information in the user behavior log to obtain the correlation degree of the first event and the second event.
6. The method of claim 4, wherein calculating the correlation between the first event and the second event according to the number of occurrences of each of the link pairs in different recorded information in the user behavior log comprises:
calculating the probability of each link pair appearing in one piece of record information according to the number of occurrences of the link pair in different record information in the user behavior log;
calculating the pointwise mutual information (PMI) of the link corresponding to the first event and the link corresponding to the second event in the link pair according to the probability of the link pair appearing in one piece of record information, the probability of the link corresponding to the first event appearing in one piece of record information, and the probability of the link corresponding to the second event appearing in one piece of record information;
and summing the PMI values of the link corresponding to the first event and the link corresponding to the second event over all the link pairs to obtain the correlation degree of the first event and the second event.
7. The method of claim 4, wherein the number of occurrences of a link pair in different record information in the user behavior log is the number of pieces of record information in the user behavior log that include the link pair.
8. The method according to any one of claims 1-3, wherein the calculating the correlation between the first event and the second event according to the plurality of search keywords corresponding to the first event, the plurality of search keywords corresponding to the second event, and the user behavior log comprises:
traversing each search keyword in the plurality of search keywords corresponding to the first event, and pairing the currently traversed search keyword of the first event with each search keyword in the plurality of search keywords corresponding to the second event to form search keyword pairs;
and calculating the correlation degree of the first event and the second event according to the number of occurrences of each search keyword pair in different pieces of record information in the user behavior log.
9. The method of claim 8, wherein calculating the correlation degree of the first event and the second event according to the number of occurrences of each search keyword pair in different recorded information in the user behavior log comprises:
summing the numbers of occurrences of all the search keyword pairs in different record information in the user behavior log to obtain the correlation degree of the first event and the second event.
10. The method of claim 8, wherein calculating the correlation degree of the first event and the second event according to the number of occurrences of each search keyword pair in different recorded information in the user behavior log comprises:
calculating the probability of each search keyword pair appearing in one piece of record information according to the number of occurrences of the search keyword pair in different record information in the user behavior log;
calculating the pointwise mutual information (PMI) of the search keyword corresponding to the first event and the search keyword corresponding to the second event in the search keyword pair according to the probability of the search keyword pair appearing in one piece of record information, the probability of the search keyword corresponding to the first event in the pair appearing in one piece of record information, and the probability of the search keyword corresponding to the second event in the pair appearing in one piece of record information;
and summing the PMI values of the search keyword corresponding to the first event and the search keyword corresponding to the second event over all the search keyword pairs to obtain the correlation degree of the first event and the second event.
11. The method of claim 8, wherein the number of occurrences of a search keyword pair in different record information in the user behavior log is the number of pieces of record information in the user behavior log that include the search keyword pair.
12. An event correlation calculation apparatus, comprising:
an acquisition module, configured to acquire a user behavior log, wherein the user behavior log comprises a plurality of pieces of recorded information, each piece of recorded information corresponds to one search behavior of one user, and the recorded information comprises at least one search keyword and at least one link clicked by the user;
a first determining module, configured to determine a plurality of search keywords corresponding to a first event according to a plurality of links corresponding to the first event in an event library and the user behavior log, wherein the first event corresponds to a plurality of pieces of first information, the plurality of links corresponding to the first event correspond one to one to the plurality of pieces of first information, and each of the plurality of search keywords corresponding to the first event is used to search for and click at least one of the plurality of pieces of first information;
a second determining module, configured to determine, according to a plurality of links corresponding to a second event in the event library and the user behavior log, a plurality of search keywords corresponding to the second event, wherein the second event corresponds to a plurality of pieces of second information, the plurality of links corresponding to the second event correspond one to one to the plurality of pieces of second information, and each of the plurality of search keywords corresponding to the second event is used to search for and click at least one of the plurality of pieces of second information;
a first calculation module, configured to calculate the correlation degree of the first event and the second event according to the plurality of links corresponding to the first event, the plurality of links corresponding to the second event and the user behavior log; or
a second calculation module, configured to calculate the correlation degree of the first event and the second event according to the plurality of search keywords corresponding to the first event, the plurality of search keywords corresponding to the second event and the user behavior log.
13. The event correlation calculation device according to claim 12, wherein the first determining module is specifically configured to:
for each link in the plurality of links corresponding to the first event in the event library, acquire at least one search keyword corresponding to the link from the user behavior log;
and determine the search keywords acquired for all of the plurality of links as the plurality of search keywords corresponding to the first event.
14. The event correlation calculation device according to claim 12, wherein the second determining module is specifically configured to:
for each link in the plurality of links corresponding to the second event in the event library, acquire at least one search keyword corresponding to the link from the user behavior log;
and determine the search keywords acquired for all of the plurality of links as the plurality of search keywords corresponding to the second event.
15. The event correlation calculation device according to any one of claims 12 to 14, wherein the first calculation module includes:
a first traversal unit, configured to traverse each link in the plurality of links corresponding to the first event, and pair the currently traversed link of the first event with each link in the plurality of links corresponding to the second event to form link pairs;
and a first calculation unit, configured to calculate the correlation degree of the first event and the second event according to the number of occurrences of each link pair in different pieces of record information in the user behavior log.
16. The event correlation calculation device according to claim 15, wherein the first calculation unit is specifically configured to:
sum the numbers of occurrences of all the link pairs in different record information in the user behavior log to obtain the correlation degree of the first event and the second event.
17. The event correlation calculation device according to claim 15, wherein the first calculation unit is specifically configured to:
calculate the probability of each link pair appearing in one piece of record information according to the number of occurrences of the link pair in different record information in the user behavior log;
calculate the pointwise mutual information (PMI) of the link corresponding to the first event and the link corresponding to the second event in the link pair according to the probability of the link pair appearing in one piece of record information, the probability of the link corresponding to the first event appearing in one piece of record information, and the probability of the link corresponding to the second event appearing in one piece of record information;
and sum the PMI values of the link corresponding to the first event and the link corresponding to the second event over all the link pairs to obtain the correlation degree of the first event and the second event.
18. The event correlation calculation device according to any one of claims 12 to 14, wherein the second calculation module includes:
a second traversal unit, configured to traverse each search keyword in the plurality of search keywords corresponding to the first event, and pair the currently traversed search keyword of the first event with each search keyword in the plurality of search keywords corresponding to the second event to form search keyword pairs;
and a second calculation unit, configured to calculate the correlation degree of the first event and the second event according to the number of occurrences of each search keyword pair in different pieces of record information in the user behavior log.
19. The event correlation calculation device according to claim 18, wherein the second calculation unit is specifically configured to:
sum the numbers of occurrences of all the search keyword pairs in different record information in the user behavior log to obtain the correlation degree of the first event and the second event.
20. The event correlation calculation device according to claim 18, wherein the second calculation unit is specifically configured to:
calculate the probability of each search keyword pair appearing in one piece of record information according to the number of occurrences of the search keyword pair in different record information in the user behavior log;
calculate the pointwise mutual information (PMI) of the search keyword corresponding to the first event and the search keyword corresponding to the second event in the search keyword pair according to the probability of the search keyword pair appearing in one piece of record information, the probability of the search keyword corresponding to the first event in the pair appearing in one piece of record information, and the probability of the search keyword corresponding to the second event in the pair appearing in one piece of record information;
and sum the PMI values of the search keyword corresponding to the first event and the search keyword corresponding to the second event over all the search keyword pairs to obtain the correlation degree of the first event and the second event.
21. An apparatus, comprising:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of any one of claims 1-11.
22. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-11.
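The claims state the computation only in prose. As a minimal sketch, and not the patentee's implementation, the Python below reads claims 1 to 11 under these assumptions: each piece of record information is a hypothetical `Record` holding a set of search keywords and a set of clicked links; the number of occurrences of a pair is the number of records containing both items (claims 7 and 11); each probability is estimated as a record count c divided by the total number N of records; the natural logarithm is used (the claims fix no base, and another base only rescales the score); and, because the claims do not say how to treat pairs that never co-occur (their PMI is undefined), such pairs are skipped. Under these assumptions the PMI-based score of claims 6 and 10 is

$$\mathrm{corr}(E_1, E_2) = \sum_{x \in S_1} \sum_{y \in S_2} \log \frac{P(x, y)}{P(x)\,P(y)}, \qquad P(x, y) = \frac{c(x, y)}{N}, \quad P(x) = \frac{c(x)}{N},$$

where S_1 and S_2 are the links (or search keywords) of the two events. All names (`Record`, `keywords_for_event`, `count_correlation`, `pmi_correlation`, and so on) are illustrative.

```python
from __future__ import annotations

import math
from collections import defaultdict
from dataclasses import dataclass
from itertools import product
from typing import Callable, Iterable


@dataclass(frozen=True)
class Record:
    """Hypothetical layout of one piece of record information: one user's single search."""
    keywords: frozenset[str]       # search keywords typed by the user
    clicked_links: frozenset[str]  # links the user clicked in that search


def keywords_for_event(event_links: Iterable[str], log: list[Record]) -> set[str]:
    """Claims 2-3: gather, link by link, the keywords of records in which the link was clicked."""
    link_to_keywords: dict[str, set[str]] = defaultdict(set)
    for record in log:
        for link in record.clicked_links:
            link_to_keywords[link].update(record.keywords)
    result: set[str] = set()
    for link in event_links:
        # .get() returns an empty set for links never clicked in the log
        result |= link_to_keywords.get(link, set())
    return result


def links_of(record: Record) -> frozenset[str]:
    return record.clicked_links


def keywords_of(record: Record) -> frozenset[str]:
    return record.keywords


def count_records(log: list[Record], items_of: Callable[[Record], frozenset[str]], *targets: str) -> int:
    """Claims 7 and 11: number of records whose items contain every target."""
    return sum(1 for r in log if all(t in items_of(r) for t in targets))


def count_correlation(items1, items2, log, items_of) -> int:
    """Claims 4-5 and 8-9: pair each first-event item with each second-event item, sum co-occurrence counts."""
    return sum(count_records(log, items_of, a, b) for a, b in product(items1, items2))


def pmi_correlation(items1, items2, log, items_of) -> float:
    """Claims 6 and 10: sum PMI (natural log) over all cross-event pairs."""
    n = len(log)
    total = 0.0
    for a, b in product(items1, items2):
        joint = count_records(log, items_of, a, b)
        if joint == 0:
            continue  # PMI would be log(0); skipping such pairs is an assumption
        p_ab = joint / n
        p_a = count_records(log, items_of, a) / n
        p_b = count_records(log, items_of, b) / n
        total += math.log(p_ab / (p_a * p_b))
    return total


# Hypothetical example log of four searches
log = [
    Record(frozenset({"quake", "relief"}), frozenset({"a1", "b1"})),
    Record(frozenset({"quake", "relief"}), frozenset({"a1", "b1", "b2"})),
    Record(frozenset({"election"}), frozenset({"a2"})),
    Record(frozenset({"sports"}), frozenset({"a2", "b2"})),
]
event1_links, event2_links = ["a1", "a2"], ["b1", "b2"]
print(count_correlation(event1_links, event2_links, log, links_of))  # 4
print(pmi_correlation(event1_links, event2_links, log, links_of))    # ln 2, about 0.693
```

Passing `links_of` or `keywords_of` as the `items_of` accessor switches between the link-based alternative (claims 4 to 7) and the keyword-based alternative (claims 8 to 11); the pair-forming and summing logic is shared. Each pair rescans the whole log, which keeps the sketch short; at real log scale an inverted index from item to record IDs would be the natural replacement.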
CN201811528235.2A 2018-12-13 2018-12-13 Event correlation calculation method, device, equipment and storage medium Active CN109740075B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811528235.2A CN109740075B (en) 2018-12-13 2018-12-13 Event correlation calculation method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811528235.2A CN109740075B (en) 2018-12-13 2018-12-13 Event correlation calculation method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN109740075A (en) 2019-05-10
CN109740075B (en) 2020-12-01

Family

ID=66359409

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811528235.2A Active CN109740075B (en) 2018-12-13 2018-12-13 Event correlation calculation method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN109740075B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114925692B (en) * 2022-07-21 2022-10-11 中科雨辰科技有限公司 Data processing system for acquiring target event

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102737045B (en) * 2011-04-08 2014-02-19 北京百度网讯科技有限公司 Method and device for relevancy computation
US9792311B2 (en) * 2011-06-03 2017-10-17 Apple Inc. System and method for managing a partitioned database of user relationship data
CN105808623B (en) * 2014-12-31 2019-06-07 北京奇虎科技有限公司 A kind of page access event correlation methodology and device based on search
US10437898B2 (en) * 2015-05-04 2019-10-08 Dac Group (Holdings) Limited Systems and methods for targeted content presentation based on search query analysis
CN108572971B (en) * 2017-03-09 2022-11-01 百度在线网络技术(北京)有限公司 Method and device for mining keywords related to search terms
CN107885793A (en) * 2017-10-20 2018-04-06 江苏大学 A kind of hot microblog topic analyzing and predicting method and system
CN108154395B (en) * 2017-12-26 2021-10-29 上海新炬网络技术有限公司 Big data-based customer network behavior portrait method

Also Published As

Publication number Publication date
CN109740075A (en) 2019-05-10

Similar Documents

Publication Publication Date Title
Gerber et al. Defacto—temporal and multilingual deep fact validation
US8612435B2 (en) Activity based users' interests modeling for determining content relevance
Kim et al. Text: Automatic template extraction from heterogeneous web pages
US8666990B2 (en) System and method for determining authority ranking for contemporaneous content
US8694374B1 (en) Detecting click spam
CN107704467B (en) Search quality evaluation method and device
IL234134A (en) Method of machine learning classes of search queries
CN104361115B (en) It is a kind of based on the entry Weight Determination clicked jointly and device
CN111190792B (en) Log storage method and device, electronic equipment and readable storage medium
US10592841B2 (en) Automatic clustering by topic and prioritizing online feed items
JP2013531289A (en) Use of model information group in search
Abedjan et al. Dataxformer: Leveraging the Web for Semantic Transformations.
US20150234883A1 (en) Method and system for retrieving real-time information
US20140289268A1 (en) Systems and methods of rationing data assembly resources
CN104615723B (en) The determination method and apparatus of query word weighted value
JP5367632B2 (en) Knowledge amount estimation apparatus and program
CN109740075B (en) Event correlation calculation method, device, equipment and storage medium
US20180341709A1 (en) Unstructured search query generation from a set of structured data terms
EP3304820A1 (en) Method and apparatus for analysing performance of a network by managing network data relating to operation of the network
CN106708880B (en) Topic associated word acquisition method and device
KR20150008635A (en) Device for selecting core kyword, method for selecting core kyword, and method for providing search service using the same
Daoud et al. Mining query-driven contexts for geographic and temporal search
JP2011170699A (en) Device, method and program for estimating knowledge amount in each field of retrieval system user
Khelghati Deep web content monitoring
Lin et al. Predicting next search actions with search engine query logs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant