CN111080349B - Method, device, server and medium for identifying multiple devices of same user - Google Patents
- Publication number
- CN111080349B (application CN201911227587.9A)
- Authority
- CN
- China
- Prior art keywords
- cross
- screen pair
- sample
- information
- identification information
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0241—Advertisements
- G06Q30/0251—Targeted advertisements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/30—Authentication, i.e. establishing the identity or authorisation of security principals
- G06F21/44—Program or device authentication
- G06F21/445—Program or device authentication by mutual authentication, e.g. between devices or programs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0241—Advertisements
- G06Q30/0277—Online advertisement
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/535—Tracking the activity of the user
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Finance (AREA)
- Theoretical Computer Science (AREA)
- Strategic Management (AREA)
- Accounting & Taxation (AREA)
- Development Economics (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Game Theory and Decision Science (AREA)
- Entrepreneurship & Innovation (AREA)
- Economics (AREA)
- Marketing (AREA)
- General Business, Economics & Management (AREA)
- General Engineering & Computer Science (AREA)
- Computer Hardware Design (AREA)
- Software Systems (AREA)
- Computer Security & Cryptography (AREA)
- Signal Processing (AREA)
- Computer Networks & Wireless Communication (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The embodiments of the invention provide a method, a device, a server and a storage medium for identifying multiple devices of the same user. The method comprises the following steps. Based on device identification information, source IP addresses and time information, the device identifiers that accessed webpages from the same source IP address are combined pairwise to form first candidate cross-screen pairs. The behavior correlation between the two device identifiers in each first candidate cross-screen pair is calculated. Each first candidate cross-screen pair whose behavior correlation is greater than a correlation threshold is taken as a second candidate cross-screen pair. The target webpage information corresponding to the two device identifiers in each second candidate cross-screen pair is input into a cross-screen pair prediction model to obtain the predicted probability that the pair is a real cross-screen pair. Finally, each second candidate cross-screen pair whose predicted probability is greater than a target cross-screen pair threshold is taken as multiple devices of the same user. By applying the embodiments of the invention, multiple devices of the same user can be identified without any user account information.
Description
Technical Field
The present invention relates to the field of internet application technologies, and in particular, to a method, an apparatus, a service server, and a storage medium for identifying multiple devices of the same user.
Background
In the PC internet era, users browsed and shopped online using computers. With technological progress, the mobile internet era has arrived, and users now browse and shop using a variety of devices such as computers, smartphones and tablets.
At present, many internet application systems push advertisements to users. Most of these advertisements are delivered by an advertisement delivery system after it receives advertisement requests sent on the users' behalf by the internet application systems.
Advertisements are targeted at individual people. However, advertisement requests arrive from many different internet application systems, so the advertisement delivery system has no account information for the user and cannot tell which electronic devices belong to the same user. As a result, it cannot deliver advertisements across the multiple electronic devices of one user.
Similarly, business service systems other than advertisement delivery systems that have no account system cannot identify the multiple electronic devices of a user, and therefore cannot provide targeted services to that user.
Accordingly, there is a need for a method of identifying multiple electronic devices of the same user without user account information in order to provide targeted services to the user.
Disclosure of Invention
The embodiment of the invention aims to provide a method, a device, a service server and a storage medium for identifying a plurality of devices of the same user, so that the plurality of electronic devices of the same user can be identified under the condition that no user account information exists. The specific technical scheme is as follows:
In a first aspect, the present invention provides a method for identifying multiple devices of the same user, applied to a service server communicatively connected to a third-party website server. The method comprises:
obtaining a plurality of pieces of user behavior data, wherein each piece of user behavior data comprises: device identification information, target webpage information of a third-party website accessed by a user, a source IP address used to access the target webpage, and time information of the access to the target webpage;
based on the device identification information, the source IP address and the time information in each piece of user behavior data, pairing the device identification information that accessed webpages from the same source IP address within a first preset time period, to form at least one first candidate cross-screen pair;
calculating the behavior correlation between the user behavior data corresponding to the two pieces of device identification information in each first candidate cross-screen pair;
taking each first candidate cross-screen pair whose behavior correlation is greater than a preset correlation threshold as a second candidate cross-screen pair;
for each second candidate cross-screen pair, inputting the target webpage information of the third-party website accessed within a second preset time period, taken from the user behavior data corresponding to its two pieces of device identification information, into a pre-trained cross-screen pair prediction model, to obtain the predicted probability, output by the model, that the pair is a real cross-screen pair; wherein the cross-screen pair prediction model is obtained by training an initial cross-screen pair prediction model in advance on the target webpage information of third-party websites accessed by users in sample user behavior data;
and taking each second candidate cross-screen pair whose predicted probability is greater than a preset target cross-screen pair threshold as multiple devices of the same user.
Optionally, the device identification information includes: Cookie information, which identifies a computer device and is generated when the user accesses the target webpage of the third-party website; and mobile terminal device identification information, which identifies a mobile terminal device and is obtained when the user accesses the target webpage of the third-party website.
The step of pairing the device identification information that accessed webpages from the same source IP address within the first preset time period, based on the device identification information, the source IP address and the time information in each piece of user behavior data, to form at least one first candidate cross-screen pair, comprises:
based on the Cookie information or the mobile terminal device identification information, the source IP address and the time information in each piece of user behavior data, pairing Cookie information with mobile terminal device identification information that accessed webpages from the same source IP address within the first preset time period, to form at least one first candidate cross-screen pair.
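The pairing step can be sketched as a group-by on (source IP, time window) followed by a Cartesian product of the Cookies and mobile device identifiers in each group. This is a minimal sketch under stated assumptions, not the patented implementation. It assumes the "first preset time period" is modeled as fixed, non-overlapping windows (for example one day), and it assumes records are plain tuples. The record layout, the `"cookie"`/`"mobile"` labels and the `first_candidate_pairs` name are hypothetical.

```python
from collections import defaultdict
from itertools import product


def first_candidate_pairs(records, window_seconds):
    """Form (cookie, mobile_id) candidate pairs from records that share a
    source IP inside the same fixed time window.

    records: iterable of (device_id, id_type, source_ip, timestamp_seconds),
    where id_type is the hypothetical label "cookie" or "mobile".
    """
    # (source_ip, window index) -> identifiers of each kind seen there
    buckets = defaultdict(lambda: {"cookie": set(), "mobile": set()})
    for device_id, id_type, source_ip, ts in records:
        window = int(ts // window_seconds)  # fixed, non-overlapping windows
        buckets[(source_ip, window)][id_type].add(device_id)

    pairs = set()
    for ids in buckets.values():
        # Every cookie seen on this IP/window is paired with every mobile ID
        pairs.update(product(ids["cookie"], ids["mobile"]))
    return pairs


records = [
    ("c1", "cookie", "10.0.0.1", 100),
    ("m1", "mobile", "10.0.0.1", 200),
    ("m2", "mobile", "10.0.0.2", 150),
    ("c2", "cookie", "10.0.0.1", 90_000),  # same IP, but the next day
]
print(first_candidate_pairs(records, window_seconds=86_400))  # {('c1', 'm1')}
```

Fixed windows miss pairs that straddle a window boundary. A sliding window avoids this but produces more candidates.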
Optionally, the step of calculating the behavior correlation between the user behavior data corresponding to the two pieces of device identification information in each first candidate cross-screen pair comprises:
calculating a source IP address Jaccard coefficient and a time Jaccard coefficient for each first candidate cross-screen pair, according to the source IP addresses used to access target webpages and the access time information in the pair's user behavior data;
calculating the following four parameters from the user behavior data of each first candidate cross-screen pair:
the first IP parameter, which is the number of source IP addresses in the intersection of the Cookie's source IP addresses and the mobile terminal device identifier's source IP addresses, divided by the number of source IP addresses of the Cookie;
the second IP parameter, which is the number of source IP addresses in that intersection, divided by the number of source IP addresses of the mobile terminal device identifier;
the first time parameter, which is the number of dates in the intersection of the Cookie's access dates and the mobile terminal device identifier's access dates, divided by the number of access dates of the Cookie;
and the second time parameter, which is the number of dates in that intersection, divided by the number of access dates of the mobile terminal device identifier;
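The six features reduce to set arithmetic over each side's IP addresses and access dates. The following is a minimal sketch, assuming each side's history has already been collected into Python sets and that dates are the time granularity, as the text describes. The function names are hypothetical, and returning 0.0 for empty sets is an assumption.

```python
def jaccard(a, b):
    """|a ∩ b| / |a ∪ b|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def ratio(part, whole):
    """|part| / |whole|, defined as 0.0 when whole is empty."""
    return len(part) / len(whole) if whole else 0.0


def pair_features(cookie_ips, mobile_ips, cookie_dates, mobile_dates):
    ip_common = cookie_ips & mobile_ips
    date_common = cookie_dates & mobile_dates
    return [
        jaccard(cookie_ips, mobile_ips),      # source IP Jaccard coefficient
        jaccard(cookie_dates, mobile_dates),  # time Jaccard coefficient
        ratio(ip_common, cookie_ips),         # first IP parameter
        ratio(ip_common, mobile_ips),         # second IP parameter
        ratio(date_common, cookie_dates),     # first time parameter
        ratio(date_common, mobile_dates),     # second time parameter
    ]


print(pair_features({"a", "b"}, {"b", "c", "d"}, {"d1", "d2"}, {"d1", "d2"}))
# [0.25, 1.0, 0.5, 0.3333333333333333, 1.0, 1.0]
```

The two ratios for each signal are asymmetric. This matters when one device (for example a phone on mobile data) uses many more IP addresses than the other.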
inputting the source IP address Jaccard coefficient, the time Jaccard coefficient, the first IP parameter, the second IP parameter, the first time parameter and the second time parameter into a pre-trained linear prediction model, to obtain the behavior correlation between the user behavior data corresponding to the two pieces of device identification information in the first candidate cross-screen pair. The linear prediction model is obtained by training an initial linear model in advance with the following values for each sample cross-screen pair: the sample source IP address Jaccard coefficient, the sample time Jaccard coefficient, the sample first IP parameter, the sample second IP parameter, the sample first time parameter and the sample second time parameter.
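A linear prediction model over these six features is a weighted sum plus a bias. The sketch below assumes a plain linear score with no output link function, because the text does not specify one. The weights shown are made-up placeholders, not trained values, and `behavior_correlation` is a hypothetical name.

```python
def behavior_correlation(features, weights, bias=0.0):
    """Linear score w . x + b over the six pair features, in the order:
    IP Jaccard, time Jaccard, first IP, second IP, first time, second time."""
    if len(features) != len(weights):
        raise ValueError("expected one weight per feature")
    return sum(w * x for w, x in zip(weights, features)) + bias


# Placeholder weights for illustration only; real values come from training.
weights = [0.3, 0.2, 0.1, 0.1, 0.15, 0.15]
score = behavior_correlation([0.25, 1.0, 0.5, 1 / 3, 1.0, 1.0], weights)
is_second_candidate = score > 0.5  # 0.5 stands in for the preset threshold
```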
Optionally, the initial cross-screen pair prediction model is an FM (factorization machine) model or an FFM (field-aware factorization machine) model.
The step of inputting the webpage information into the pre-trained cross-screen pair prediction model comprises the sub-steps below. Here the webpage information is, for each second candidate cross-screen pair, the target webpage information of the third-party website accessed within the second preset time period, taken from the user behavior data corresponding to the pair's two pieces of device identification information. The step yields the predicted probability, output by the model, that the pair is a real cross-screen pair. The sub-steps are:
converting, according to a preset format conversion mode, the target webpage information of the third-party website accessed within the second preset time period, taken from the user behavior data corresponding to the two pieces of device identification information in each second candidate cross-screen pair, into target data in a format recognizable by the FM or FFM model;
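The text does not specify the "preset format conversion mode". One common choice is the text format used by the LIBFFM tool, with lines of the form `label field:index:value`. The LIBFM format is the same without the field part. The sketch below assumes two hypothetical fields (Cookie-side pages and mobile-side pages) and hashes each page string into a fixed feature space. The bucket count and the function names are hypothetical.

```python
import zlib

# Hypothetical size of the hashed feature space.
NUM_BUCKETS = 2 ** 20


def page_index(page, num_buckets=NUM_BUCKETS):
    """Map a page URL/category string to a stable feature index.

    zlib.crc32 is used instead of hash(), whose value for str changes
    between Python processes.
    """
    return zlib.crc32(page.encode("utf-8")) % num_buckets


def to_libffm_line(label, cookie_pages, mobile_pages):
    """Build one 'label field:index:value' line for a cross-screen pair.

    Field 0 holds pages visited from the Cookie side, field 1 pages
    visited from the mobile side (a hypothetical field layout).
    """
    tokens = [str(label)]
    for field, pages in ((0, cookie_pages), (1, mobile_pages)):
        for idx in sorted({page_index(p) for p in pages}):
            tokens.append(f"{field}:{idx}:1")  # binary "page visited" feature
    return " ".join(tokens)


print(to_libffm_line(1, ["news.example/sports"], ["video.example/live"]))
```

At prediction time, the label column is only a placeholder. Hash collisions merge unrelated pages, so the bucket count trades memory for accuracy.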
inputting the target data of each second candidate cross-screen pair into the pre-trained cross-screen pair prediction model;
and obtaining the predicted probability, output by the cross-screen pair prediction model, that each pair is a real cross-screen pair.
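For reference, a second-order factorization machine scores a sparse input with a global bias, linear weights and pairwise interactions. Each pairwise interaction is the dot product of two latent vectors. The sketch below shows FM prediction with a sigmoid to produce a probability. It assumes a plain-Python model representation, and `fm_probability` is a hypothetical name. The FFM variant, which keeps a separate latent vector per (feature, field) combination, is not shown.

```python
import math


def fm_probability(x, w0, w, V):
    """Second-order factorization machine with a sigmoid output.

    x  : sparse input {feature_index: value}
    w0 : global bias; w: linear weights; V: one k-dim latent vector per feature
    The pairwise term uses the identity
    sum_{i<j} <v_i, v_j> x_i x_j
      = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2].
    """
    z = w0 + sum(w[i] * xi for i, xi in x.items())
    for f in range(len(V[0])):
        s = sum(V[i][f] * xi for i, xi in x.items())
        s2 = sum((V[i][f] * xi) ** 2 for i, xi in x.items())
        z += 0.5 * (s * s - s2)
    return 1.0 / (1.0 + math.exp(-z))


# Two active features whose latent vectors have dot product 1.0 * 2.0 = 2.0
p = fm_probability({0: 1.0, 1: 1.0}, 0.0, [0.0, 0.0], [[1.0], [2.0]])
```

The identity brings the pairwise term from quadratic to linear cost in the number of active features. This is why FM models stay cheap on sparse page-visit data.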
Optionally, the linear prediction model is obtained by training in advance by adopting the following steps:
acquiring correct (positive) sample cross-screen pairs and wrong (negative) sample cross-screen pairs;
obtaining a plurality of pieces of sample user behavior data for each correct and each wrong sample cross-screen pair, where each piece of sample user behavior data includes: device identification information, target webpage information of a third-party website accessed by a user, a source IP address used to access the target webpage, and time information of the access to the target webpage;
calculating the sample source IP address Jaccard coefficient and the sample time Jaccard coefficient of each sample cross-screen pair, according to the source IP addresses used to access target webpages and the access time information in the pair's user behavior data;
calculating the following four parameters from the user behavior data of each sample cross-screen pair:
the sample first IP parameter, which is the number of source IP addresses in the intersection of the Cookie's source IP addresses and the mobile terminal device identifier's source IP addresses, divided by the number of source IP addresses of the Cookie;
the sample second IP parameter, which is the number of source IP addresses in that intersection, divided by the number of source IP addresses of the mobile terminal device identifier;
the sample first time parameter, which is the number of dates in the intersection of the Cookie's access dates and the mobile terminal device identifier's access dates, divided by the number of access dates of the Cookie;
and the sample second time parameter, which is the number of dates in that intersection, divided by the number of access dates of the mobile terminal device identifier;
inputting six sample parameters of each correct and each wrong sample cross-screen pair into the current initial linear model, to obtain the sample behavior correlation between the user behavior data corresponding to the two pieces of device identification information in each sample cross-screen pair. The six parameters are the sample source IP address Jaccard coefficient, the sample time Jaccard coefficient, the sample first IP parameter, the sample second IP parameter, the sample first time parameter and the sample second time parameter;
calculating a loss value from the sample behavior correlation, the ground-truth label indicating whether each sample pair is a real cross-screen pair, and a preset loss function;
judging, according to the loss value of the preset loss function, whether the current initial linear model has converged;
if so, determining the current initial linear model as the trained linear prediction model; if not, adjusting the model parameters of the current initial linear model and returning to the step of inputting the six sample parameters of each correct and each wrong sample cross-screen pair into the current initial linear model to obtain the sample behavior correlation of each sample cross-screen pair.
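The training loop above (predict, compute loss, check convergence, adjust parameters, repeat) can be sketched with batch gradient descent. The text only says "a preset loss function". This sketch assumes mean squared error against 0/1 labels, and it treats convergence as the loss changing by less than a tolerance between iterations. The learning rate, tolerance and function names are hypothetical.

```python
def linear_score(x, w, b):
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear(samples, labels, lr=0.1, tol=1e-8, max_iter=10_000):
    """Batch gradient descent on mean squared error (an assumed loss).

    samples: list of feature vectors (e.g. the six pair features)
    labels : 1 for a correct sample pair, 0 for a wrong one
    """
    n, n_feat = len(samples), len(samples[0])
    w, b = [0.0] * n_feat, 0.0
    prev_loss = float("inf")
    for _ in range(max_iter):
        errors = [linear_score(x, w, b) - y for x, y in zip(samples, labels)]
        loss = sum(e * e for e in errors) / n
        if abs(prev_loss - loss) < tol:  # converged: stop adjusting
            break
        prev_loss = loss
        # Gradient of the MSE with respect to each weight and the bias
        for j in range(n_feat):
            w[j] -= lr * 2 * sum(e * x[j] for e, x in zip(errors, samples)) / n
        b -= lr * 2 * sum(errors) / n
    return w, b


samples = [[1.0, 1.0], [0.9, 0.8], [0.1, 0.0], [0.0, 0.2]]
labels = [1, 1, 0, 0]
w, b = train_linear(samples, labels)
```

A logistic loss would be an equally reasonable reading of "preset loss function" if the correlation is meant to behave like a probability.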
Optionally, the cross-screen pair prediction model is obtained by training in advance by adopting the following steps:
converting, using the preset format conversion mode, the target webpage information of third-party websites accessed by users into sample target data in a format recognizable by the FM or FFM model. This webpage information is taken from the pieces of sample user behavior data of each correct and each wrong sample cross-screen pair;
inputting the sample target data of each sample cross-screen pair into the current initial cross-screen pair prediction model;
obtaining the predicted probability, output by the current initial cross-screen pair prediction model, that each sample pair is a cross-screen pair;
calculating a loss value from the predicted probability, the ground-truth label indicating whether each sample pair is a real cross-screen pair, and a preset loss function;
judging, according to the loss value of the preset loss function, whether the current initial cross-screen pair prediction model has converged;
if so, determining the current initial cross-screen pair prediction model as the trained cross-screen pair prediction model; if not, adjusting the model parameters of the current initial cross-screen pair prediction model and returning to the step of inputting the sample target data of each sample cross-screen pair into the current initial cross-screen pair prediction model.
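The same predict, loss, converge and adjust loop applies to the FM model. The sketch below trains a second-order FM with stochastic gradient descent on log loss. It checks convergence on the change in per-epoch average loss. The loss choice, hyperparameters and function names are assumptions, and production systems would normally use a dedicated FM/FFM library instead.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fm_logit(x, w0, w, V):
    """FM score before the sigmoid; x is a sparse dict {feature_index: value}."""
    z = w0 + sum(w[i] * xi for i, xi in x.items())
    for f in range(len(V[0])):
        s = sum(V[i][f] * xi for i, xi in x.items())
        s2 = sum((V[i][f] * xi) ** 2 for i, xi in x.items())
        z += 0.5 * (s * s - s2)
    return z


def train_fm(data, n_features, k=4, lr=0.05, max_epochs=500, tol=1e-6, seed=0):
    """SGD on log loss. data: list of (x, y) with x sparse dict and y in {0, 1}."""
    rng = random.Random(seed)
    w0, w = 0.0, [0.0] * n_features
    V = [[rng.gauss(0.0, 0.01) for _ in range(k)] for _ in range(n_features)]
    prev_loss = float("inf")
    for _ in range(max_epochs):
        total = 0.0
        for x, y in data:
            p = sigmoid(fm_logit(x, w0, w, V))
            total -= y * math.log(p + 1e-12) + (1 - y) * math.log(1 - p + 1e-12)
            g = p - y  # derivative of log loss with respect to the logit
            sums = [sum(V[i][f] * xi for i, xi in x.items()) for f in range(k)]
            w0 -= lr * g
            for i, xi in x.items():
                w[i] -= lr * g * xi
                for f in range(k):
                    # d(logit)/d(v_if) = x_i * (sum_j v_jf x_j - v_if x_i)
                    V[i][f] -= lr * g * xi * (sums[f] - V[i][f] * xi)
        loss = total / len(data)
        if abs(prev_loss - loss) < tol:  # convergence check on the loss value
            break
        prev_loss = loss
    return w0, w, V


# Toy data: feature 0 marks real pairs, feature 1 wrong pairs, 2 and 3 are noise
data = [({0: 1.0, 2: 1.0}, 1), ({0: 1.0, 3: 1.0}, 1),
        ({1: 1.0, 2: 1.0}, 0), ({1: 1.0, 3: 1.0}, 0)]
w0, w, V = train_fm(data, n_features=4)
```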
In a second aspect, the present invention provides an apparatus for identifying multiple devices of the same user, applied to a service server communicatively connected to a third-party website server. The apparatus comprises:
a user behavior data obtaining unit, configured to obtain a plurality of pieces of user behavior data, where each piece of user behavior data includes: device identification information, target webpage information of a third-party website accessed by a user, a source IP address used to access the target webpage, and time information of the access to the target webpage;
a first candidate cross-screen pair forming unit, configured to pair the device identification information that accessed webpages from the same source IP address within a first preset time period, based on the device identification information, the source IP address and the time information in each piece of user behavior data, to form at least one first candidate cross-screen pair;
a behavior correlation calculation unit, configured to calculate the behavior correlation between the user behavior data corresponding to the two pieces of device identification information in each first candidate cross-screen pair;
a second candidate cross-screen pair acquisition unit, configured to take each first candidate cross-screen pair whose behavior correlation is greater than a preset correlation threshold as a second candidate cross-screen pair;
a prediction probability obtaining unit, configured to input, for each second candidate cross-screen pair, the target webpage information of the third-party website accessed within a second preset time period, taken from the user behavior data corresponding to its two pieces of device identification information, into a pre-trained cross-screen pair prediction model, to obtain the predicted probability, output by the model, that the pair is a real cross-screen pair; wherein the cross-screen pair prediction model is obtained by training an initial cross-screen pair prediction model in advance on the target webpage information of third-party websites accessed by users in sample user behavior data;
and a same-user multiple-device acquisition unit, configured to take each second candidate cross-screen pair whose predicted probability is greater than a preset target cross-screen pair threshold as multiple devices of the same user.
Optionally, the device identification information includes: Cookie information, which identifies a computer device and is generated when the user accesses the target webpage of the third-party website; and mobile terminal device identification information, which identifies a mobile terminal device and is obtained when the user accesses the target webpage of the third-party website.
The first candidate cross-screen pair forming unit is specifically configured to:
based on the Cookie information or the mobile terminal device identification information, the source IP address and the time information in each piece of user behavior data, pair Cookie information with mobile terminal device identification information that accessed webpages from the same source IP address within the first preset time period, to form at least one first candidate cross-screen pair.
In a third aspect, the present invention provides a service server, including a processor, a communication interface, a memory and a communication bus. The processor, the communication interface and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
and the processor is configured to implement the steps of any of the above methods for identifying multiple devices of the same user when executing the program stored in the memory.
In a fourth aspect, the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the above methods for identifying multiple devices of the same user.
In a fifth aspect, embodiments of the present invention also provide a computer program product comprising instructions which, when run on a computer, cause the computer to perform any of the above-described methods of identifying a plurality of devices of the same user.
It can be seen that, in the embodiments of the present invention, multiple devices of the same user are ultimately identified from user behavior data. Each piece of this data comprises device identification information, target webpage information of the third-party website accessed by the user, the source IP address used to access the target webpage, and the time of the access. Therefore, by applying the embodiments of the invention, multiple devices of the same user can be identified without any user account information.
Of course, it is not necessary for any one product or method of practicing the invention to achieve all of the advantages set forth above at the same time.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a method for identifying multiple devices of the same user according to an embodiment of the present invention, applied to a service server communicatively connected to a third-party website server;
Fig. 2 is another flowchart of the method for identifying multiple devices of the same user according to an embodiment of the present invention, applied to a service server communicatively connected to a third-party website server;
Fig. 3 is a training flowchart of the linear prediction model according to an embodiment of the present invention;
Fig. 4 is a training flowchart of the cross-screen pair prediction model according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of an apparatus for identifying multiple devices of the same user according to an embodiment of the present invention, applied to a service server communicatively connected to a third-party website server;
Fig. 6 is a schematic structural diagram of a service server according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In order to identify a plurality of electronic devices of the same user without user account information, the embodiment of the invention provides a method, a device, a service server and a storage medium for identifying a plurality of devices of the same user.
Referring to fig. 1, a method for identifying multiple devices of the same user provided by an embodiment of the present invention is applied to a service server, where the service server is communicatively connected to a third party website server, and as shown in fig. 1, a specific process flow of the method may include:
Step S101, obtaining a plurality of pieces of user behavior data, where each piece of user behavior data includes: device identification information, target webpage information of the third-party website accessed by the user, the source IP address used to access the target webpage, and the time of the access to the target webpage.
In practice, the device identification information includes Cookie information, which identifies a computer device, and mobile terminal device identification information, which identifies a mobile terminal device. Both are obtained when the user accesses the target webpage of the third-party website.
For example, if the service server is an advertisement delivery server in an advertisement delivery system, the third-party website may be any website that directly serves users and contains advertisement slots, such as a video, news, live-streaming or shopping website.
In practice, when a user accesses a target webpage of the third-party website and the page contains unfilled advertisement slots, the third-party website server sends an advertisement request to each advertisement delivery server. The advertisement request may include the device identification information, the target webpage information of the third-party website accessed by the user, the source IP address used to access the target webpage, and the time of the access. After receiving the advertisement request, the advertisement delivery server stores the data contained in the request.
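Each stored advertisement request yields one piece of user behavior data with four parts. The sketch below models that record as a dataclass and parses it from a request dictionary. The request field names (`device_id`, `id_type`, `page_url`, `ip`, `time`) are hypothetical, and real ad-request protocols (for example OpenRTB) define their own field names and structure.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class UserBehaviorRecord:
    device_id: str         # Cookie or mobile terminal device identifier
    id_type: str           # hypothetical label: "cookie" or "mobile"
    page_url: str          # target webpage of the third-party website
    source_ip: str         # source IP used to access the page
    access_time: datetime  # time of the access


def record_from_ad_request(req: dict) -> UserBehaviorRecord:
    """Extract one behavior record from an ad request (hypothetical keys)."""
    return UserBehaviorRecord(
        device_id=req["device_id"],
        id_type=req["id_type"],
        page_url=req["page_url"],
        source_ip=req["ip"],
        access_time=datetime.fromisoformat(req["time"]),
    )


rec = record_from_ad_request({
    "device_id": "c1", "id_type": "cookie", "page_url": "news.example/a",
    "ip": "10.0.0.1", "time": "2019-12-04T10:30:00",
})
```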
Step S102, based on the device identification information, the source IP address used to access the target webpage, and the time of the access in each piece of user behavior data, pairing the device identification information that accessed webpages from the same source IP address within a first preset time period, to form at least one first candidate cross-screen pair.
Step S103, calculating the behavior correlation between the user behavior data corresponding to the two pieces of device identification information in each first candidate cross-screen pair.
Step S104, taking each first candidate cross-screen pair whose behavior correlation is greater than a preset correlation threshold as a second candidate cross-screen pair.
Step S105, for each second candidate cross-screen pair, inputting the target webpage information of the third-party website accessed within a second preset time period, taken from the user behavior data corresponding to its two pieces of device identification information, into a pre-trained cross-screen pair prediction model, to obtain the predicted probability, output by the model, that the pair is a real cross-screen pair.
The cross-screen pair prediction model is obtained by training an initial cross-screen pair prediction model in advance on the target webpage information of third-party websites accessed by users in sample user behavior data.
Step S106, taking each second candidate cross-screen pair whose predicted probability is greater than a preset target cross-screen pair threshold as multiple devices of the same user.
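Steps S104 to S106 amount to two successive threshold filters. The prediction model is consulted only for pairs that pass the cheaper correlation filter. The sketch below assumes that the correlation and probability functions are supplied as callables. Their names and the threshold values are hypothetical.

```python
def select_same_user_pairs(first_candidates, correlation, predict_prob,
                           corr_threshold, pair_threshold):
    """correlation and predict_prob are assumed callables taking a pair."""
    # Step S104: behavior-correlation filter -> second candidate pairs
    second_candidates = [p for p in first_candidates
                         if correlation(p) > corr_threshold]
    # Steps S105-S106: keep pairs whose predicted probability clears the threshold
    return [p for p in second_candidates if predict_prob(p) > pair_threshold]


pairs = [("c1", "m1"), ("c2", "m2"), ("c3", "m3")]
corr = {("c1", "m1"): 0.9, ("c2", "m2"): 0.8, ("c3", "m3"): 0.1}
prob = {("c1", "m1"): 0.95, ("c2", "m2"): 0.3, ("c3", "m3"): 0.99}
print(select_same_user_pairs(pairs, corr.get, prob.get, 0.5, 0.5))  # [('c1', 'm1')]
```

The pair `("c3", "m3")` is dropped at the first stage despite its high model probability, because it is never scored by the model.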
It can be seen that, in this embodiment, each piece of user behavior data includes: and finally identifying a plurality of devices of the same user by the device identification information, the target webpage information of the third-party website accessed by the user, the source IP address used for accessing the target webpage and the time information for accessing the target webpage. Therefore, by applying the embodiment of the invention, a plurality of devices of the same user can be identified under the condition that no user account information exists.
Fig. 2 is another flowchart of the method for identifying multiple devices of the same user, applied to a service server that is communicatively connected to a third-party website server. As shown in Fig. 2, the method includes:
Step S201, obtaining a plurality of pieces of user behavior data, where each piece of user behavior data includes device identification information, the target webpage information of the third-party website accessed by the user, the source IP address used to access the target webpage, and the time at which the target webpage was accessed. The device identification information includes Cookie information, which identifies a computer device and is generated when the user accesses the target webpage of the third-party website, and mobile terminal device identification information, which identifies a mobile terminal device and is obtained when the user accesses the target webpage of the third-party website.
For example, when the service server is an ad delivery server in an advertising system, it saves the user behavior data carried in each ad request received from the third-party website server into a database table on the ad delivery server. The database table may be stored in chronological order.
An example of such a database table is shown in Table 1:
Table 1
In this embodiment, a plurality of pieces of user behavior data may be obtained from the database.
In practice, Cookie information identifies a computer device, and the mobile terminal device identification identifies a mobile terminal device. The mobile terminal device identification may be implemented as a Device ID. In some cases, however, for example on certain phone models, the user can deny the permission to read the Device ID through the permission settings; in that case, the mobile terminal device is identified by the Cookie information in the user behavior data instead.
Step S202, based on the Cookie information or mobile terminal device identification information, the source IP address used to access the target webpage, and the access time in each piece of user behavior data, pairing Cookie information with mobile terminal device identification information that accessed webpages from the same source IP address within a first preset time period, to form at least one first candidate cross-screen pair.
The first preset time period may be 7 days. For each day in this period, the device identification information that accessed webpages from the same source IP address can be combined pairwise, and each such pair is marked as a candidate cross-screen pair for that day. Two pieces of device identification information that are marked as a candidate cross-screen pair on multiple days within the first preset time period are taken as a first candidate cross-screen pair; for example, a pair marked on 5 of the 7 days is taken as a first candidate cross-screen pair.
For example, in Table 1, on date 1, if Cookie1 and mobile terminal device identifier 1 both use IP2, then Cookie1 and mobile terminal device identifier 1 form a candidate cross-screen pair for date 1; if Cookie1 and mobile terminal device identifier 2 both use IP3, then Cookie1 and mobile terminal device identifier 2 form a candidate cross-screen pair for date 1.
On date 2, if Cookie1 and mobile terminal device identifier 1 both use IP2, then Cookie1 and mobile terminal device identifier 1 form a candidate cross-screen pair for date 2.
Proceeding in the same way, the candidate cross-screen pairs of every day are obtained, and the first candidate cross-screen pairs are then derived from them as described above.
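The daily marking and multi-day selection described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the record tuple `(device_id, is_mobile, source_ip, day)`, the function names and the `min_days` parameter are hypothetical, and pairing every Cookie with every mobile ID seen on the same source IP on the same day is one straightforward reading of "combined in pairs".

```python
from collections import defaultdict
from itertools import product

# A record is a hypothetical tuple (device_id, is_mobile, source_ip, day).

def daily_candidate_pairs(records):
    """Map each day to the set of (cookie_id, mobile_id) pairs seen on a shared source IP."""
    ids_by_day_ip = defaultdict(lambda: (set(), set()))
    for device_id, is_mobile, source_ip, day in records:
        cookies, mobiles = ids_by_day_ip[(day, source_ip)]
        (mobiles if is_mobile else cookies).add(device_id)

    pairs_by_day = defaultdict(set)
    for (day, _source_ip), (cookies, mobiles) in ids_by_day_ip.items():
        # Every Cookie on this IP that day is paired with every mobile ID on it.
        pairs_by_day[day].update(product(cookies, mobiles))
    return pairs_by_day


def first_candidate_pairs(pairs_by_day, min_days):
    """Keep pairs that were daily candidate pairs on at least min_days days."""
    day_counts = defaultdict(int)
    for pairs in pairs_by_day.values():
        for pair in pairs:
            day_counts[pair] += 1
    return {pair for pair, n in day_counts.items() if n >= min_days}


# The date 1 / date 2 example above.
records = [
    ("Cookie1", False, "IP2", 1), ("Mobile1", True, "IP2", 1),
    ("Cookie1", False, "IP3", 1), ("Mobile2", True, "IP3", 1),
    ("Cookie1", False, "IP2", 2), ("Mobile1", True, "IP2", 2),
]
by_day = daily_candidate_pairs(records)
print(first_candidate_pairs(by_day, min_days=2))  # {('Cookie1', 'Mobile1')}
```

With a 7-day window and the "5 of 7 days" rule from the text, `min_days` would be 5.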
In some cases, many employees of a company use their own devices behind the same source IP address, yet they are not the same user. Therefore, before this step, the number of distinct pieces of Cookie information and mobile terminal device identification information that access the Internet through the same source IP address within one day can be counted. If this number exceeds a preset threshold, the user behavior data for that source IP address is removed, and the first candidate cross-screen pairs are formed as described above from the remaining user behavior data. For example, if 100 distinct pieces of Cookie information and mobile terminal device identification information use IP1 on one day, that part of the user behavior data is discarded. Removing such data makes the remaining user behavior data more reliable and further improves the accuracy of identifying multiple devices of the same user.
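A minimal sketch of this shared-IP filter, assuming the same hypothetical record tuple `(device_id, is_mobile, source_ip, day)` and a hypothetical threshold parameter `max_ids_per_ip_day`. It counts distinct IDs per (day, source IP) and drops every record belonging to an over-crowded combination.

```python
from collections import defaultdict

# A record is a hypothetical tuple (device_id, is_mobile, source_ip, day).

def drop_shared_ips(records, max_ids_per_ip_day):
    """Remove records whose (day, source IP) is used by more than max_ids_per_ip_day distinct IDs."""
    ids_on = defaultdict(set)
    for device_id, _is_mobile, source_ip, day in records:
        ids_on[(day, source_ip)].add(device_id)
    return [
        rec for rec in records
        if len(ids_on[(rec[3], rec[2])]) <= max_ids_per_ip_day
    ]


# 100 distinct Cookies on IP1 in one day, as in the example above, plus one household IP.
office = [(f"Cookie{i}", False, "IP1", 1) for i in range(100)]
home = [("Cookie1", False, "IP2", 1), ("Mobile1", True, "IP2", 1)]
kept = drop_shared_ips(office + home, max_ids_per_ip_day=20)
print(len(kept))  # 2
```

The threshold of 20 is only an illustrative value; the text leaves the preset threshold unspecified.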
Further, for each piece of Cookie information and mobile terminal device identification information in each first candidate cross-screen pair, the difference between the last and the first access time on each day in its user behavior data can be calculated as its online time for that day. The ratio of the total online time of the Cookie information over a preset number of days to the number of days on which the Cookie information appears within those days is taken as a first ratio, and the corresponding ratio for the mobile terminal device identification information is taken as a second ratio. When the first ratio or the second ratio of a first candidate cross-screen pair is greater than a preset online time threshold, that first candidate cross-screen pair is treated as a data-error cross-screen pair.
That is, the daily online time of the Cookie information and of the mobile terminal device identification information is calculated separately. For example, if the Cookie information appears on 5 of 7 days and its online time over those 5 days sums to 125 hours, the first ratio is 125 / 5 = 25; if the mobile terminal device identification information appears on 6 of 7 days and its online time over those 6 days sums to 96 hours, the second ratio is 96 / 6 = 16. With an online time threshold of 20 hours, the first ratio 25 exceeds 20, so this data-error cross-screen pair is removed from the first candidate cross-screen pairs. Removing such erroneous data makes it more likely that the remaining first candidate cross-screen pairs really are devices of the same user, which improves the accuracy of cross-screen pair identification.
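The online-time check can be sketched as below, using the numbers from the example. The function names are hypothetical, and access times are assumed to be given as hours (floats) within each day.

```python
from collections import defaultdict

def daily_online_hours(access_times):
    """access_times: iterable of (day, hour). Online time per day = last access - first access."""
    times_by_day = defaultdict(list)
    for day, hour in access_times:
        times_by_day[day].append(hour)
    return {day: max(ts) - min(ts) for day, ts in times_by_day.items()}


def mean_online_hours(daily_hours):
    """Total online time divided by the number of days on which the ID appears."""
    return sum(daily_hours.values()) / len(daily_hours)


def is_data_error_pair(cookie_daily, mobile_daily, threshold_hours=20.0):
    """A pair is a data-error pair if either ratio exceeds the online time threshold."""
    first_ratio = mean_online_hours(cookie_daily)
    second_ratio = mean_online_hours(mobile_daily)
    return first_ratio > threshold_hours or second_ratio > threshold_hours


# Example from the text: 125 hours over 5 days, and 96 hours over 6 days.
cookie_daily = {day: 25.0 for day in range(1, 6)}
mobile_daily = {day: 16.0 for day in range(1, 7)}
print(mean_online_hours(cookie_daily), mean_online_hours(mobile_daily))  # 25.0 16.0
print(is_data_error_pair(cookie_daily, mobile_daily))  # True
```

Because a real device cannot be online more than 24 hours a day, a ratio such as 25 is a strong sign of merged or corrupted identifiers, which is why such pairs are dropped.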
Step S203, calculating the source IP address Jaccard coefficient and the time Jaccard coefficient of each first candidate cross-screen pair from the source IP addresses used to access the target webpage and the access times in the pair's user behavior data.
The source IP address Jaccard coefficient may be implemented as the ratio of the number of source IP addresses in the intersection of the Cookie information's source IP addresses and the mobile terminal device identification information's source IP addresses to the number of source IP addresses in their union:
$$J_{ip} = \frac{|IP_{cookie} \cap IP_{mobile}|}{|IP_{cookie} \cup IP_{mobile}|}$$
where J_ip is the source IP address Jaccard coefficient, IP_cookie is the set of source IP addresses of the Cookie information, and IP_mobile is the set of source IP addresses of the mobile terminal device identification information.
For example, Cookie1 and mobile terminal device identifier 1 form a first candidate cross-screen pair. Over 28 days, Cookie1 used IP1, IP2, IP3, IP4 and IP8, and mobile terminal device identifier 1 used IP1, IP2 and IP9. The intersection is {IP1, IP2}, with 2 elements; the union is {IP1, IP2, IP3, IP4, IP8, IP9}, with 6 elements. The source IP address Jaccard coefficient is therefore 2 / 6 = 1/3.
The time Jaccard coefficient is the ratio of the number of dates in the intersection of the dates on which the Cookie information appears and the dates on which the mobile terminal device identification information appears to the number of dates in their union:
$$J_{time} = \frac{|D_{cookie} \cap D_{mobile}|}{|D_{cookie} \cup D_{mobile}|}$$
where J_time is the time Jaccard coefficient, D_cookie is the set of dates on which the Cookie information appears, and D_mobile is the set of dates on which the mobile terminal device identification information appears.
For example, Cookie1 and mobile terminal device identifier 1 form a first candidate cross-screen pair. Over 28 days, Cookie1 appears on days 1, 2, 3, 4, 5, 6, 7 and 11, and mobile terminal device identifier 1 appears on days 1, 2, 3, 5, 6, 8 and 11. The intersection of the dates is days 1, 2, 3, 5, 6 and 11, i.e. 6 dates; the union is days 1, 2, 3, 4, 5, 6, 7, 8 and 11, i.e. 9 dates. The time Jaccard coefficient is 6 / 9 = 2/3.
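Both coefficients are the same set operation applied to different sets, so a single helper covers them. This is a minimal sketch with a hypothetical function name; it returns exact fractions so the worked examples above can be checked directly.

```python
from fractions import Fraction

def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| as an exact fraction (0 when both sets are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return Fraction(len(a & b), len(union)) if union else Fraction(0)


cookie_ips = {"IP1", "IP2", "IP3", "IP4", "IP8"}
mobile_ips = {"IP1", "IP2", "IP9"}
cookie_days = {1, 2, 3, 4, 5, 6, 7, 11}
mobile_days = {1, 2, 3, 5, 6, 8, 11}

print(jaccard(cookie_ips, mobile_ips))    # 1/3  (source IP address Jaccard coefficient)
print(jaccard(cookie_days, mobile_days))  # 2/3  (time Jaccard coefficient)
```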
Step S204, for each first candidate cross-screen pair, calculating from its user behavior data: the ratio of the number of source IP addresses in the intersection of the Cookie information's source IP addresses and the mobile terminal device identification information's source IP addresses to the number of the Cookie information's source IP addresses, as a first IP parameter; the ratio of the number of source IP addresses in that intersection to the number of the mobile terminal device identification information's source IP addresses, as a second IP parameter; the ratio of the number of dates in the intersection of the dates on which the Cookie information appears and the dates on which the mobile terminal device identification information appears to the number of dates on which the Cookie information appears, as a first time parameter; and the ratio of the number of dates in that intersection to the number of dates on which the mobile terminal device identification information appears, as a second time parameter.
For example, Cookie1 and mobile terminal device identifier 1 form a first candidate cross-screen pair. Over 28 days, Cookie1 used IP1, IP2, IP3, IP4 and IP8, i.e. 5 IP addresses; mobile terminal device identifier 1 used IP1, IP2 and IP9, i.e. 3 IP addresses. The intersection is {IP1, IP2}, with 2 elements. The first IP parameter is 2 / 5 = 2/5, and the second IP parameter is 2 / 3 = 2/3.
Over the same 28 days, Cookie1 appears on days 1, 2, 3, 4, 5, 6, 7 and 11, i.e. on 8 dates; mobile terminal device identifier 1 appears on days 1, 2, 3, 5, 6, 8 and 11, i.e. on 7 dates. The intersection of the dates is days 1, 2, 3, 5, 6 and 11, i.e. 6 dates. The first time parameter is 6 / 8 = 3/4, and the second time parameter is 6 / 7 = 6/7.
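The four overlap parameters divide the same intersection size by each side's own set size. A minimal sketch with a hypothetical function name, reproducing the example values:

```python
from fractions import Fraction

def overlap_parameters(cookie_items, mobile_items):
    """Return (|C ∩ M| / |C|, |C ∩ M| / |M|) for two non-empty sets."""
    c, m = set(cookie_items), set(mobile_items)
    shared = len(c & m)
    return Fraction(shared, len(c)), Fraction(shared, len(m))


first_ip, second_ip = overlap_parameters(
    {"IP1", "IP2", "IP3", "IP4", "IP8"}, {"IP1", "IP2", "IP9"})
first_time, second_time = overlap_parameters(
    {1, 2, 3, 4, 5, 6, 7, 11}, {1, 2, 3, 5, 6, 8, 11})
print(first_ip, second_ip)      # 2/5 2/3
print(first_time, second_time)  # 3/4 6/7
```

Together with the two Jaccard coefficients, these values make up the six inputs of the linear prediction model in Step S205. Unlike the Jaccard coefficient, the two one-sided ratios stay high when one device's activity is a subset of the other's, which is typical when a phone is used only part of the time a computer is.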
Step S205, inputting the source IP address Jaccard coefficient, the time Jaccard coefficient, the first IP parameter, the second IP parameter, the first time parameter and the second time parameter into a pre-trained linear prediction model to obtain the behavior correlation between the user behavior data corresponding to the two pieces of device identification information in the first candidate cross-screen pair.
The linear prediction model is obtained in advance by training an initial linear model on the sample source IP address Jaccard coefficient, sample time Jaccard coefficient, sample first IP parameter, sample second IP parameter, sample first time parameter and sample second time parameter of sample cross-screen pairs.
Step S206, taking each first candidate cross-screen pair whose behavior correlation is greater than a preset correlation threshold as a second candidate cross-screen pair.
Step S207, converting, according to a preset format conversion method, the target webpage information of the third-party websites accessed by the user, taken from the user behavior data of the two pieces of device identification information in each second candidate cross-screen pair within a second preset time period, into target data in a format that an FM (factorization machine) model or an FFM (field-aware factorization machine) model can read.
The second preset time period may be, for example, 7, 14 or 28 days.
In this implementation, the target webpage information may first be converted into feature mapping values according to a feature mapping table, and the feature mapping values are then converted into target data in a format that the FM or FFM model can read. An example of a feature mapping table is shown in Table 2:
Table 2
Target webpage | Feature mapping value |
Target webpage 1 | 5 |
Target webpage 2 | 3 |
As shown in Table 2, target webpage 1 is converted into the feature mapping value "5", and target webpage 2 into "3". The table is merely illustrative; the specific form of the feature mapping table is not limited.
The feature mapping values can then be converted into LIBSVM-format data, which the FM or FFM model can read.
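One possible conversion into LIBSVM lines (`<label> <index>:<value> ...`) is sketched below. The names `FEATURE_MAP`, `pair_to_libsvm` and `mobile_offset` are hypothetical. The text does not say how the two devices' pages are combined; this sketch assumes pages seen via the mobile ID are shifted by a fixed offset so that the model can tell Cookie-side and mobile-side visits apart, and that each visited page gets the binary value 1.

```python
# Hypothetical feature mapping table mirroring Table 2.
FEATURE_MAP = {"target_webpage_1": 5, "target_webpage_2": 3}

def pair_to_libsvm(label, cookie_pages, mobile_pages, feature_map, mobile_offset):
    """Encode one candidate pair as a LIBSVM line: '<label> <index>:<value> ...'.

    Pages visited via the Cookie keep their mapped index; pages visited via the
    mobile ID are shifted by mobile_offset (an assumption of this sketch).
    Unknown pages are skipped.
    """
    indices = {feature_map[p] for p in cookie_pages if p in feature_map}
    indices |= {feature_map[p] + mobile_offset for p in mobile_pages if p in feature_map}
    # LIBSVM expects feature indices in ascending order.
    return " ".join([str(label)] + [f"{i}:1" for i in sorted(indices)])


line = pair_to_libsvm(1, ["target_webpage_1", "target_webpage_2"],
                      ["target_webpage_2"], FEATURE_MAP, mobile_offset=100)
print(line)  # 1 3:1 5:1 103:1
```

At prediction time (Step S208) the label is unknown, so a placeholder such as 0 would be written instead. Tools such as LIBFFM use a different field-aware format (`field:index:value`), so the exact target format depends on the FM/FFM implementation chosen.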
Step S208, inputting target data of each second candidate cross-screen pair into a pre-trained cross-screen pair prediction model.
The cross-screen pair prediction model is obtained in advance by training an initial cross-screen pair prediction model on the target webpage information of third-party websites accessed by users in sample user behavior data.
In practice, the initial cross-screen pair prediction model is an FM model or an FFM model.
Step S209, obtaining the probability, output by the cross-screen pair prediction model, that the pair is a real cross-screen pair.
Step S210, taking each second candidate cross-screen pair whose prediction probability is greater than a preset target cross-screen pair threshold as multiple devices of the same user.
In practice, the target cross-screen pair threshold may be 0.8, 0.85, 0.9 or a similar value.
In practice, if the above steps determine that Cookie1 and mobile terminal device identifier 1 are devices of the same user, and that Cookie1 and mobile terminal device identifier 2 are also devices of the same user, then Cookie1, mobile terminal device identifier 1 and mobile terminal device identifier 2 are all devices of the same user.
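Merging pairwise decisions into per-user device groups, as in the Cookie1 example, is a connected-components problem. The sketch below uses a union-find (disjoint set) structure; this data structure is a choice made for illustration and is not named in the text, and `group_devices` is a hypothetical name.

```python
def group_devices(same_user_pairs):
    """Merge pairwise same-user decisions into groups of device IDs (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in same_user_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for device in list(parent):
        groups.setdefault(find(device), set()).add(device)
    return list(groups.values())


groups = group_devices([("Cookie1", "Mobile1"), ("Cookie1", "Mobile2"), ("Cookie2", "Mobile3")])
# Two groups: {Cookie1, Mobile1, Mobile2} and {Cookie2, Mobile3}.
```

Note that this grouping is fully transitive: a single wrong pair can merge two users' device groups, which is one reason the target cross-screen pair threshold is set fairly high.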
It can be seen that by applying the embodiment of the invention, a plurality of devices of the same user can be identified under the condition that no user account information exists.
Moreover, in the embodiment of the invention, the first candidate cross-screen pairs are obtained from the source IP addresses; the behavior correlation between the user behavior data of the two pieces of device identification information in each first candidate cross-screen pair is obtained from the linear prediction model, and the first candidate cross-screen pairs are screened by this correlation to obtain the second candidate cross-screen pairs; finally, the target webpage information of the third-party websites accessed by the user in the user behavior data of each second candidate cross-screen pair is input into the cross-screen pair prediction model, and the second candidate cross-screen pairs are screened by the model's output probability to obtain the multiple devices of the same user. Each stage filters the output of the previous one, so the accuracy of cross-screen pair identification improves stage by stage and the multiple devices identified for the same user are more accurate.
In practice, the linear prediction model mentioned in the above embodiment may be trained according to the flow shown in Fig. 3.
As shown in Fig. 3, the training flow of the linear prediction model provided by an embodiment of the present invention may include:
step S301, obtaining each correct sample cross-screen pair and each wrong sample cross-screen pair.
In practice, the correct sample cross-screen pairs may be obtained in advance. Their devices are then recombined incorrectly, i.e. paired with devices from other correct pairs, to obtain samples that are not cross-screen pairs, which serve as the wrong sample cross-screen pairs.
Step S302, obtaining a plurality of pieces of sample user behavior data of each correct sample cross-screen pair and each wrong sample cross-screen pair, where each piece of sample user behavior data includes device identification information, the target webpage information of the third-party website accessed by the user, the source IP address used to access the target webpage, and the time at which the target webpage was accessed.
Step S303, calculating the sample source IP address Jaccard coefficient and the sample time Jaccard coefficient of each sample cross-screen pair from the source IP addresses used to access the target webpage and the access times in the pair's sample user behavior data.
Step S304, for each sample cross-screen pair, calculating: the ratio of the number of source IP addresses in the intersection of the Cookie information's source IP addresses and the mobile terminal device identification information's source IP addresses to the number of the Cookie information's source IP addresses, as a sample first IP parameter; the ratio of the number of source IP addresses in that intersection to the number of the mobile terminal device identification information's source IP addresses, as a sample second IP parameter; the ratio of the number of dates in the intersection of the dates on which the Cookie information appears and the dates on which the mobile terminal device identification information appears to the number of dates on which the Cookie information appears, as a sample first time parameter; and the ratio of the number of dates in that intersection to the number of dates on which the mobile terminal device identification information appears, as a sample second time parameter.
Step S305, inputting the sample source IP address Jaccard coefficient, sample time Jaccard coefficient, sample first IP parameter, sample second IP parameter, sample first time parameter and sample second time parameter of each correct and each wrong sample cross-screen pair into the current initial linear model, to obtain the sample behavior correlation between the user behavior data of the two pieces of device identification information in each sample cross-screen pair.
Step S306, calculating a loss value from the sample behavior correlations, the true labels indicating whether each sample pair is a real cross-screen pair, and a preset loss function.
Step S307, determining from the loss value of the preset loss function whether the current initial linear model has converged.
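The "wrong combination" of correct pairs can be sketched as crossing every Cookie with every mobile ID that occurs in the correct pairs and discarding the correct combinations. This exhaustive crossing is an assumption of the sketch (the function name is hypothetical); because the number of combinations grows quadratically, a real training set would usually subsample them.

```python
def make_wrong_pairs(correct_pairs):
    """Cross every Cookie with every mobile ID from the correct pairs, dropping the correct combinations."""
    correct = set(correct_pairs)
    cookies = sorted({c for c, _ in correct})
    mobiles = sorted({m for _, m in correct})
    return [(c, m) for c in cookies for m in mobiles if (c, m) not in correct]


print(make_wrong_pairs([("Cookie1", "Mobile1"), ("Cookie2", "Mobile2")]))
# [('Cookie1', 'Mobile2'), ('Cookie2', 'Mobile1')]
```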
The predetermined loss function may be:
where y_i is the true result of whether the i-th sample pair is a cross-screen pair, y'_i is the predicted sample behavior correlation, n is the number of sample cross-screen pairs, i is the index of a sample cross-screen pair, and Loss is the value of the loss function.
If not, i.e. the current initial linear model has not converged, step S308 is executed; if so, i.e. the current initial linear model has converged, step S309 is executed.
Step S308, adjusting the model parameters of the current initial linear model and returning to step S305.
In practice, the model parameters may be adjusted using a gradient descent method.
In practice, the model function of the current initial linear model may be:
y=ωx+b;
where x is the vector formed by the sample source IP address Jaccard coefficient, sample time Jaccard coefficient, sample first IP parameter, sample second IP parameter, sample first time parameter and sample second time parameter input into the model; ω and b are the model parameters to be trained, ω being the vector of coefficients applied to these six inputs; and y is the behavior correlation.
Step S309, determining the current initial linear model as the trained linear prediction model.
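Steps S305 to S309 amount to fitting y = ωx + b by gradient descent. The sketch below makes several assumptions not fixed by the text: the loss is mean squared error (the original loss formula is not reproduced here), convergence means the loss changed by less than `tol` between iterations, and the toy training data are synthetic. All names are hypothetical.

```python
def train_linear_model(X, y, lr=0.1, tol=1e-9, max_iter=10_000):
    """Fit y ≈ w·x + b by batch gradient descent on mean squared error (assumed loss).

    X: list of 6-element feature vectors (Jaccard coefficients and IP/time parameters).
    y: 1 for correct sample cross-screen pairs, 0 for wrong ones.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    prev_loss = float("inf")
    for _ in range(max_iter):
        # S305: predicted behavior correlation for every sample pair.
        preds = [sum(wj * xj for wj, xj in zip(w, x)) + b for x in X]
        errors = [p - t for p, t in zip(preds, y)]
        # S306: loss value.
        loss = sum(e * e for e in errors) / n
        # S307: treat a negligible change in loss as convergence.
        if abs(prev_loss - loss) < tol:
            break
        prev_loss = loss
        # S308: gradient descent step on w and b.
        for j in range(d):
            w[j] -= lr * 2 / n * sum(e * x[j] for e, x in zip(errors, X))
        b -= lr * 2 / n * sum(errors)
    # S309: the current parameters form the trained linear prediction model.
    return w, b


def behavior_correlation(w, b, x):
    return sum(wj * xj for wj, xj in zip(w, x)) + b


# Synthetic toy data: high overlap features for correct pairs, low for wrong pairs.
X = [[0.9] * 6, [0.8] * 6, [0.1] * 6, [0.2] * 6]
y = [1, 1, 0, 0]
w, b = train_linear_model(X, y)
```

The trained `behavior_correlation` is then compared with the preset correlation threshold in Step S206. Since the labels are 0/1, a logistic model trained with cross-entropy would be a common alternative to plain least squares.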
Therefore, by applying the embodiment of the invention, the initial linear model can be trained into an effective linear prediction model that predicts the behavior correlation between the user behavior data of the two pieces of device identification information in each first candidate cross-screen pair, from which the second candidate cross-screen pairs are obtained.
In the embodiment shown in Fig. 3, the pre-stored sample user behavior data of each correct and each wrong sample cross-screen pair has already been obtained; on that basis, the cross-screen pair prediction model mentioned in the above embodiment may be trained according to the flow shown in Fig. 4.
As shown in Fig. 4, the training flow of the cross-screen pair prediction model provided by an embodiment of the present invention may include:
Step S401, converting, according to a preset format conversion method, the target webpage information of the third-party websites accessed by the user in the sample user behavior data of each correct and each wrong sample cross-screen pair into sample target data in a format that an FM or FFM model can read.
Step S402, sample target data of each sample cross-screen pair is input into a current initial cross-screen pair prediction model.
In practice, the current initial cross-screen pair prediction model is an FM model or an FFM model.
Step S403, obtaining the probability, output by the current initial cross-screen pair prediction model, that each sample pair is a cross-screen pair.
Step S404, calculating a loss value from the prediction probabilities, the true labels indicating whether each sample pair is a real cross-screen pair, and a preset loss function.
In practice, the predetermined loss function may be:
where y_i is the true result of whether the i-th sample pair is a cross-screen pair, y'_i is the predicted probability, n is the number of sample cross-screen pairs, i is the index of a sample cross-screen pair, and Loss is the value of the loss function. Other loss functions may be used as appropriate; no specific limitation is imposed here.
Step S405, determining from the loss value of the preset loss function whether the current initial cross-screen pair prediction model has converged.
If not, i.e. the current initial cross-screen pair prediction model has not converged, step S406 is executed; if so, i.e. the current initial cross-screen pair prediction model has converged, step S407 is executed.
Step S406, adjusting the model parameters of the current initial cross-screen pair prediction model and returning to step S402.
In practice, the model parameters may be adjusted using a gradient descent method.
In practice, the model function of the current initial cross-screen pair prediction model may be:
$$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j$$
where w_0, w_i and the latent vectors v_i, v_j are the model parameters to be trained; x_i and x_j are features of the sample target data of a sample cross-screen pair, with i and j indexing those features (n features in total).
⟨v_i, v_j⟩ denotes the dot product of the two multidimensional latent (hidden) vectors v_i and v_j.
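The FM score can be computed either directly from the double sum or with the standard O(k·n) rewrite of the pairwise term; the sketch below implements both and checks that they agree. This covers plain FM only, not FFM. Mapping the score to a probability with a logistic sigmoid is an assumption about the link function, and the parameter values are made up for illustration.

```python
import math

def fm_score(x, w0, w, V):
    """FM score using the identity
    sum_{i<j} <v_i, v_j> x_i x_j = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2].
    """
    n, k = len(x), len(V[0])
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        score += 0.5 * (s * s - s_sq)
    return score


def fm_score_naive(x, w0, w, V):
    """Direct evaluation of the double sum, used to check fm_score."""
    n = len(x)
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * c for a, c in zip(V[i], V[j]))
            score += dot * x[i] * x[j]
    return score


def fm_probability(x, w0, w, V):
    """Map the score to (0, 1) with a logistic sigmoid (assumed link function)."""
    return 1.0 / (1.0 + math.exp(-fm_score(x, w0, w, V)))


x = [1.0, 0.0, 1.0]            # binary page-visit features
w0, w = 0.1, [0.2, 0.3, -0.1]
V = [[0.1, 0.2], [0.3, 0.4], [0.5, -0.6]]  # k = 2 latent dimensions
print(round(fm_score(x, w0, w, V), 6))  # 0.13
```

The factorized pairwise term is what lets FM score interactions between pages that rarely co-occur in training data, since each page's latent vector is learned from all pairs it appears in.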
Step S407, determining the current initial cross-screen pair prediction model as a trained cross-screen pair prediction model.
Therefore, by applying the embodiment of the invention, an FM or FFM model can be trained into an effective cross-screen pair prediction model, so that multiple devices of the same user can be identified even when no user account information is available.
An embodiment of the invention further provides an apparatus for identifying multiple devices of the same user, applied to a service server that is communicatively connected to a third-party website server. As shown in Fig. 5, the apparatus includes:
a user behavior data obtaining unit 501, configured to obtain a plurality of pieces of user behavior data, where each piece of user behavior data includes device identification information, the target webpage information of the third-party website accessed by the user, the source IP address used to access the target webpage, and the time at which the target webpage was accessed;
a first candidate cross-screen pair forming unit 502, configured to combine, based on the device identification information, the source IP address used to access the target webpage and the access time in each piece of user behavior data, device identification information that accessed webpages from the same source IP address within a first preset time period, to form at least one first candidate cross-screen pair;
a behavior correlation calculating unit 503, configured to calculate the behavior correlation between the user behavior data corresponding to the two pieces of device identification information in each first candidate cross-screen pair;
a second candidate cross-screen pair obtaining unit 504, configured to take each first candidate cross-screen pair whose behavior correlation is greater than a preset correlation threshold as a second candidate cross-screen pair;
a prediction probability obtaining unit 505, configured to input, for each second candidate cross-screen pair, the target webpage information of the third-party websites accessed by the user in the user behavior data of the pair's two pieces of device identification information within a second preset time period into a pre-trained cross-screen pair prediction model, and obtain the model's output probability that the pair is a real cross-screen pair, where the cross-screen pair prediction model is obtained in advance by training an initial cross-screen pair prediction model on the target webpage information of third-party websites accessed by users in sample user behavior data; and
a same-user multiple device obtaining unit 506, configured to take each second candidate cross-screen pair whose prediction probability is greater than a preset target cross-screen pair threshold as multiple devices of the same user.
Optionally, the device identification information includes Cookie information, which identifies a computer device and is generated when the user accesses the target webpage of the third-party website, and mobile terminal device identification information, which identifies a mobile terminal device and is obtained when the user accesses the target webpage of the third-party website;
the first candidate cross-screen pair forming unit is specifically configured to:
combine, based on the Cookie information or mobile terminal device identification information, the source IP address used to access the target webpage and the access time in each piece of user behavior data, Cookie information and mobile terminal device identification information that accessed webpages from the same source IP address within a first preset time period, to form at least one first candidate cross-screen pair.
It can be seen that by applying the embodiment of the invention, a plurality of devices of the same user can be identified under the condition that no user account information exists.
An embodiment of the present invention also provides a service server, as shown in Fig. 6, including a processor 601, a communication interface 602, a memory 603 and a communication bus 604, where the processor 601, the communication interface 602 and the memory 603 communicate with one another through the communication bus 604;
a memory 603 for storing a computer program;
the processor 601 is configured to execute the program stored in the memory 603, and implement the following steps:
obtaining a plurality of pieces of user behavior data, where each piece of user behavior data includes device identification information, the target webpage information of the third-party website accessed by the user, the source IP address used to access the target webpage, and the time at which the target webpage was accessed;
combining, based on the device identification information, the source IP address used to access the target webpage and the access time in each piece of user behavior data, device identification information that accessed webpages from the same source IP address within a first preset time period, to form at least one first candidate cross-screen pair;
calculating the behavior correlation between the user behavior data corresponding to the two pieces of device identification information in each first candidate cross-screen pair;
taking each first candidate cross-screen pair whose behavior correlation is greater than a preset correlation threshold as a second candidate cross-screen pair;
for each second candidate cross-screen pair, inputting the target webpage information of the third-party websites accessed by the user in the user behavior data of the pair's two pieces of device identification information within a second preset time period into a pre-trained cross-screen pair prediction model, and obtaining the model's output probability that the pair is a real cross-screen pair, where the cross-screen pair prediction model is obtained in advance by training an initial cross-screen pair prediction model on the target webpage information of third-party websites accessed by users in sample user behavior data; and
taking each second candidate cross-screen pair whose prediction probability is greater than a preset target cross-screen pair threshold as multiple devices of the same user.
Thus, by applying this embodiment of the invention, multiple devices belonging to the same user can be identified even when no user account information is available.
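The last three steps form a two-stage filtering cascade. The following is a minimal Python sketch of that cascade only, under stated assumptions: a pair is represented as a tuple of two device ID strings, and the behavior correlation scorer and the cross-screen pair prediction model are passed in as plain callables. The function name `select_same_user_pairs` and its parameters are hypothetical, not taken from the patent. Both comparisons are strict, matching "greater than" in the text.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical representation: a cross-screen pair is a tuple of two device IDs.
Pair = Tuple[str, str]


def select_same_user_pairs(
    first_candidates: Iterable[Pair],
    behavior_correlation: Callable[[Pair], float],
    cross_screen_probability: Callable[[Pair], float],
    correlation_threshold: float,
    target_threshold: float,
) -> List[Pair]:
    """Filter first candidate pairs down to pairs treated as one user's devices."""
    # Stage 1: keep pairs whose behavior correlation exceeds the correlation
    # threshold; these become the second candidate cross-screen pairs.
    second_candidates = [
        pair for pair in first_candidates
        if behavior_correlation(pair) > correlation_threshold
    ]
    # Stage 2: keep second candidates whose predicted probability of being a
    # real cross-screen pair exceeds the target cross-screen pair threshold.
    return [
        pair for pair in second_candidates
        if cross_screen_probability(pair) > target_threshold
    ]


if __name__ == "__main__":
    # Toy scores standing in for the linear model and the FM/FFM model.
    correlation = {("c1", "m1"): 0.9, ("c2", "m2"): 0.2, ("c3", "m3"): 0.8}
    probability = {("c1", "m1"): 0.95, ("c3", "m3"): 0.4}
    print(select_same_user_pairs(
        correlation, correlation.get, lambda p: probability.get(p, 0.0), 0.5, 0.5
    ))  # [('c1', 'm1')]
```

Running the cheap correlation filter first means the more expensive prediction model only scores pairs that already share enough IP and time behavior.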
The service server may be an electronic device.
The communication bus of the electronic device mentioned above may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is used in the figure, but this does not mean that there is only one bus or only one type of bus.
The communication interface is used for communication between the electronic device and other devices.
The memory may include random access memory (RAM) or non-volatile memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In yet another embodiment of the present invention, a computer readable storage medium is provided, in which a computer program is stored, which when executed by a processor implements the steps of any of the above methods of identifying multiple devices of the same user.
In yet another embodiment of the present invention, there is also provided a computer program product containing instructions that, when run on a computer, cause the computer to perform any of the methods of identifying multiple devices of the same user described in the above embodiments.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid-state disk (SSD)), etc.
It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
Each embodiment in this specification is described in a related manner; identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the apparatus, electronic device, computer-readable storage medium, and computer program product embodiments are substantially similar to the method embodiments, so they are described relatively briefly; for relevant details, refer to the corresponding parts of the method embodiments.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit its scope. Any modification, equivalent replacement, improvement, or the like made within the spirit and principle of the present invention is included in the protection scope of the present invention.
Claims (10)
1. A method of identifying multiple devices of the same user, applied to a service server communicatively connected to a third-party website server, the method comprising:
obtaining a plurality of pieces of user behavior data, wherein each piece of user behavior data comprises: device identification information, target webpage information of a third-party website accessed by the user, the source IP address used to access the target webpage, and time information of the access to the target webpage;
combining, based on the device identification information, the source IP address used to access the target webpage, and the access time information in each piece of user behavior data, the device identification information that was used to access webpages from the same source IP address within a first preset time period, to form at least one first candidate cross-screen pair;
calculating the behavior correlation between the user behavior data corresponding to the two pieces of device identification information in each first candidate cross-screen pair;
taking each first candidate cross-screen pair whose behavior correlation is greater than a preset correlation threshold as a second candidate cross-screen pair;
for each second candidate cross-screen pair, inputting the target webpage information of the third-party website accessed by the user within a second preset time period, taken from the user behavior data corresponding to the pair's two pieces of device identification information, into a pre-trained cross-screen pair prediction model, to obtain the probability, output by the model, that the pair is a real cross-screen pair; wherein the cross-screen pair prediction model is obtained in advance by training an initial cross-screen pair prediction model on the target webpage information of third-party websites accessed by users in sample user behavior data;
and taking each second candidate cross-screen pair whose prediction probability is greater than a preset target cross-screen pair threshold as multiple devices of the same user.
2. The method of claim 1, wherein the device identification information comprises: Cookie information, which identifies a computer device and is generated when the user accesses the target webpage of the third-party website; and mobile terminal device identification information, which identifies a mobile terminal device and is obtained when the user accesses the target webpage of the third-party website;
the step of combining, based on the device identification information, the source IP address used to access the target webpage, and the access time information in each piece of user behavior data, the device identification information that was used to access webpages from the same source IP address within the first preset time period, to form at least one first candidate cross-screen pair, comprises:
combining, based on the Cookie information or the mobile terminal device identification information, the source IP address used to access the target webpage, and the access time information in each piece of user behavior data, the Cookie information and the mobile terminal device identification information that were used to access webpages from the same source IP address within the first preset time period, to form at least one first candidate cross-screen pair.
3. The method of claim 2, wherein the step of calculating the behavior correlation between the user behavior data corresponding to the two pieces of device identification information in each first candidate cross-screen pair comprises:
calculating, according to the source IP addresses used to access the target webpages and the access time information in the user behavior data of each first candidate cross-screen pair, a source IP address Jaccard coefficient and a time Jaccard coefficient for each first candidate cross-screen pair;
calculating, from the user behavior data of each first candidate cross-screen pair: as a first IP parameter, the ratio of the number of source IP addresses in the intersection of the source IP addresses of the Cookie information and those of the mobile terminal device identification information to the number of source IP addresses of the Cookie information; as a second IP parameter, the ratio of the number of source IP addresses in that intersection to the number of source IP addresses of the mobile terminal device identification information; as a first time parameter, the ratio of the number of dates in the intersection of the access dates of the Cookie information and those of the mobile terminal device identification information to the number of access dates of the Cookie information; and as a second time parameter, the ratio of the number of dates in that intersection to the number of access dates of the mobile terminal device identification information;
inputting the source IP address Jaccard coefficient, the time Jaccard coefficient, the first IP parameter, the second IP parameter, the first time parameter, and the second time parameter into a pre-trained linear prediction model, to obtain the behavior correlation between the user behavior data corresponding to the two pieces of device identification information in the first candidate cross-screen pair; wherein the linear prediction model is obtained in advance by training an initial linear model with the sample source IP address Jaccard coefficient, sample time Jaccard coefficient, sample first IP parameter, sample second IP parameter, sample first time parameter, and sample second time parameter of each sample.
4. The method according to claim 3, wherein the initial cross-screen pair prediction model is a factorization machine (FM) model or a field-aware factorization machine (FFM) model;
the step of inputting, for each second candidate cross-screen pair, the target webpage information of the third-party website accessed by the user within the second preset time period, taken from the user behavior data corresponding to the pair's two pieces of device identification information, into the pre-trained cross-screen pair prediction model, to obtain the probability, output by the model, that the pair is a real cross-screen pair, comprises:
converting, according to a preset format conversion mode, the target webpage information of the third-party website accessed by the user within the second preset time period, in the user behavior data corresponding to the two pieces of device identification information in each second candidate cross-screen pair, into target data in a format recognizable by the FM model or the FFM model;
inputting the target data of each second candidate cross-screen pair into the pre-trained cross-screen pair prediction model;
and obtaining the probability, output by the cross-screen pair prediction model, that the pair is a real cross-screen pair.
5. The method according to claim 4, wherein the linear prediction model is obtained by pre-training through the following steps:
acquiring correct sample cross-screen pairs and wrong sample cross-screen pairs;
obtaining a plurality of pieces of sample user behavior data for each correct sample cross-screen pair and each wrong sample cross-screen pair, wherein each piece of sample user behavior data comprises: device identification information, target webpage information of a third-party website accessed by the user, the source IP address used to access the target webpage, and time information of the access to the target webpage;
calculating, according to the source IP addresses used to access the target webpages and the access time information in the user behavior data of each sample cross-screen pair, a sample source IP address Jaccard coefficient and a sample time Jaccard coefficient for each sample cross-screen pair;
calculating, from the user behavior data of each sample cross-screen pair: as a sample first IP parameter, the ratio of the number of source IP addresses in the intersection of the source IP addresses of the Cookie information and those of the mobile terminal device identification information to the number of source IP addresses of the Cookie information; as a sample second IP parameter, the ratio of the number of source IP addresses in that intersection to the number of source IP addresses of the mobile terminal device identification information; as a sample first time parameter, the ratio of the number of dates in the intersection of the access dates of the Cookie information and those of the mobile terminal device identification information to the number of access dates of the Cookie information; and as a sample second time parameter, the ratio of the number of dates in that intersection to the number of access dates of the mobile terminal device identification information;
inputting the sample source IP address Jaccard coefficient, sample time Jaccard coefficient, sample first IP parameter, sample second IP parameter, sample first time parameter, and sample second time parameter of each correct sample cross-screen pair and each wrong sample cross-screen pair into the current initial linear model, to obtain the sample behavior correlation between the user behavior data corresponding to the two pieces of device identification information in each sample cross-screen pair;
calculating a loss value according to the sample behavior correlation, whether each sample cross-screen pair is actually a real cross-screen pair, and a preset loss function;
judging whether the current initial linear model has converged according to the loss value of the preset loss function;
if so, determining the current initial linear model as the trained linear prediction model; if not, adjusting the model parameters of the current initial linear model and returning to the step of inputting the sample source IP address Jaccard coefficient, sample time Jaccard coefficient, sample first IP parameter, sample second IP parameter, sample first time parameter, and sample second time parameter of each correct sample cross-screen pair and each wrong sample cross-screen pair into the current initial linear model to obtain the sample behavior correlation between the user behavior data corresponding to the two pieces of device identification information in each sample cross-screen pair.
6. The method of claim 5, wherein
the cross-screen pair prediction model is obtained by pre-training through the following steps:
converting, according to a preset format conversion mode, the target webpage information of the third-party websites accessed by users in the plurality of pieces of sample user behavior data of each correct sample cross-screen pair and each wrong sample cross-screen pair into sample target data in a format recognizable by the FM model or the FFM model;
inputting the sample target data of each sample cross-screen pair into the current initial cross-screen pair prediction model;
obtaining the probability, output by the current initial cross-screen pair prediction model, that each sample pair is a cross-screen pair;
calculating a loss value according to the prediction probability, whether each sample cross-screen pair is actually a real cross-screen pair, and a preset loss function;
judging whether the current initial cross-screen pair prediction model has converged according to the loss value of the preset loss function;
if so, determining the current initial cross-screen pair prediction model as the trained cross-screen pair prediction model; if not, adjusting the model parameters of the current initial cross-screen pair prediction model and returning to the step of inputting the sample target data of each sample cross-screen pair into the current initial cross-screen pair prediction model.
7. An apparatus for identifying multiple devices of the same user, applied to a service server communicatively connected to a third-party website server, the apparatus comprising:
a user behavior data obtaining unit, configured to obtain a plurality of pieces of user behavior data, wherein each piece of user behavior data comprises: device identification information, target webpage information of a third-party website accessed by the user, the source IP address used to access the target webpage, and time information of the access to the target webpage;
a first candidate cross-screen pair forming unit, configured to combine, based on the device identification information, the source IP address used to access the target webpage, and the access time information in each piece of user behavior data, the device identification information that was used to access webpages from the same source IP address within a first preset time period, to form at least one first candidate cross-screen pair;
a behavior correlation calculation unit, configured to calculate the behavior correlation between the user behavior data corresponding to the two pieces of device identification information in each first candidate cross-screen pair;
a second candidate cross-screen pair acquisition unit, configured to take each first candidate cross-screen pair whose behavior correlation is greater than a preset correlation threshold as a second candidate cross-screen pair;
a prediction probability obtaining unit, configured to input, for each second candidate cross-screen pair, the target webpage information of the third-party website accessed by the user within a second preset time period, taken from the user behavior data corresponding to the pair's two pieces of device identification information, into a pre-trained cross-screen pair prediction model, to obtain the probability, output by the model, that the pair is a real cross-screen pair; wherein the cross-screen pair prediction model is obtained in advance by training an initial cross-screen pair prediction model on the target webpage information of third-party websites accessed by users in sample user behavior data;
and a same-user multiple-device acquisition unit, configured to take each second candidate cross-screen pair whose prediction probability is greater than a preset target cross-screen pair threshold as multiple devices of the same user.
8. The apparatus of claim 7, wherein the device identification information comprises: Cookie information, which identifies a computer device and is generated when the user accesses the target webpage of the third-party website; and mobile terminal device identification information, which identifies a mobile terminal device and is obtained when the user accesses the target webpage of the third-party website;
the first candidate cross-screen pair forming unit is specifically configured to:
combine, based on the Cookie information or the mobile terminal device identification information, the source IP address used to access the target webpage, and the access time information in each piece of user behavior data, the Cookie information and the mobile terminal device identification information that were used to access webpages from the same source IP address within the first preset time period, to form at least one first candidate cross-screen pair.
9. A service server, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
the processor is configured to implement the method steps of any one of claims 1-6 when executing the program stored in the memory.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored therein a computer program which, when executed by a processor, implements the method steps of any of claims 1-6.
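The pairing step of claim 2 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes (the patent does not say) that the "first preset time period" is realized as fixed, non-overlapping windows of `window_seconds`, and that every Cookie is paired with every mobile device ID seen on the same source IP in the same window. `BehaviorRecord` and `first_candidate_pairs` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple


@dataclass(frozen=True)
class BehaviorRecord:
    """One piece of user behavior data (hypothetical field layout)."""
    device_id: str   # Cookie ID or mobile terminal device ID
    is_mobile: bool  # True for a mobile device ID, False for a Cookie
    source_ip: str
    timestamp: int   # access time, seconds since the epoch
    url: str         # target webpage of the third-party website


def first_candidate_pairs(
    records: Iterable[BehaviorRecord], window_seconds: int
) -> Set[Tuple[str, str]]:
    """Return (cookie_id, mobile_id) pairs that shared a source IP in a window."""
    # (source IP, window index) -> (Cookie IDs seen, mobile IDs seen)
    buckets: Dict[Tuple[str, int], Tuple[Set[str], Set[str]]] = defaultdict(
        lambda: (set(), set())
    )
    for r in records:
        cookies, mobiles = buckets[(r.source_ip, r.timestamp // window_seconds)]
        (mobiles if r.is_mobile else cookies).add(r.device_id)
    # Cross every Cookie with every mobile ID in the same bucket.
    pairs: Set[Tuple[str, str]] = set()
    for cookies, mobiles in buckets.values():
        for cookie_id in cookies:
            for mobile_id in mobiles:
                pairs.add((cookie_id, mobile_id))
    return pairs
```

Pairing only Cookie with mobile IDs (never Cookie with Cookie) reflects the cross-screen framing of claim 2; a sliding window instead of fixed buckets would catch accesses that straddle a window boundary, at the cost of more bookkeeping.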
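The six behavior features of claim 3 follow directly from set operations. The sketch below assumes each device's source IPs and access dates are already collected as sets of strings, returns 0.0 for a ratio whose denominator set is empty (the patent does not cover that case), and scores the linear prediction model as a plain weighted sum plus bias. The feature names and the `weights`/`bias` values are hypothetical.

```python
from typing import Dict, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """size(A intersect B) / size(A union B); 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def ratio(part: int, whole: Set[str]) -> float:
    """Intersection size as a share of `whole`; 0.0 when `whole` is empty."""
    return part / len(whole) if whole else 0.0


def pair_features(
    cookie_ips: Set[str], mobile_ips: Set[str],
    cookie_dates: Set[str], mobile_dates: Set[str],
) -> Dict[str, float]:
    """The six behavior features of one (Cookie, mobile device ID) pair."""
    common_ips = len(cookie_ips & mobile_ips)
    common_dates = len(cookie_dates & mobile_dates)
    return {
        "ip_jaccard": jaccard(cookie_ips, mobile_ips),
        "time_jaccard": jaccard(cookie_dates, mobile_dates),
        "ip_param_1": ratio(common_ips, cookie_ips),        # share of Cookie IPs
        "ip_param_2": ratio(common_ips, mobile_ips),        # share of mobile IPs
        "time_param_1": ratio(common_dates, cookie_dates),  # share of Cookie dates
        "time_param_2": ratio(common_dates, mobile_dates),  # share of mobile dates
    }


def behavior_correlation(
    features: Dict[str, float], weights: Dict[str, float], bias: float
) -> float:
    """Linear model output: bias + sum of weight * feature."""
    return bias + sum(weights[name] * value for name, value in features.items())
```

The two asymmetric ratios matter because a desktop Cookie and a phone rarely have the same activity volume: a phone seen on only one home IP that the desktop also uses gets a second IP parameter of 1.0 even if the Jaccard coefficient is low.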
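Claims 4 and 6 leave the "preset format conversion mode" unspecified. One common choice, assumed here, is the libffm text format, in which each line reads `label field:index:value ...`. The sketch places pages visited through the Cookie device in field 0 and pages visited through the mobile device in field 1, and maps each URL to a feature index with CRC32 hashing (the hashing trick, so distinct URLs can collide). The field IDs, bucket count, and function names are hypothetical.

```python
import zlib
from typing import Iterable

COOKIE_FIELD = 0  # hypothetical field ID for pages visited via the Cookie device
MOBILE_FIELD = 1  # hypothetical field ID for pages visited via the mobile device


def feature_index(field: int, url: str, num_buckets: int) -> int:
    """Hash a (field, URL) pair into [0, num_buckets); collisions are possible."""
    return zlib.crc32(f"{field}|{url}".encode("utf-8")) % num_buckets


def to_ffm_line(
    label: int,
    cookie_urls: Iterable[str],
    mobile_urls: Iterable[str],
    num_buckets: int = 2 ** 20,
) -> str:
    """Encode one pair as a libffm-style line: 'label field:index:value ...'."""
    tokens = [str(label)]
    for field, urls in ((COOKIE_FIELD, cookie_urls), (MOBILE_FIELD, mobile_urls)):
        # Binary features: each distinct hashed page appears once with value 1.
        indices = sorted({feature_index(field, url, num_buckets) for url in urls})
        tokens.extend(f"{field}:{index}:1" for index in indices)
    return " ".join(tokens)


if __name__ == "__main__":
    print(to_ffm_line(1, ["news.example/a"], ["news.example/a", "shop.example/b"]))
```

For training samples the label is 1 for a correct pair and 0 for a wrong one; at prediction time the label column is only a placeholder. For a plain FM tool that reads libSVM-style input, the field prefix would be dropped, leaving `index:value` tokens.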
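The training loop of claim 5 (score, compute loss, test convergence, adjust, repeat) can be sketched as below. The patent names neither the loss function nor the update rule, so this assumes mean squared error against labels 1 (correct pair) and 0 (wrong pair), batch gradient descent, and a convergence test that stops once the loss improves by less than `tol`. All names and default values are hypothetical.

```python
from typing import List, Sequence, Tuple


def score(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Behavior correlation predicted by the linear model: w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear_model(
    samples: Sequence[Sequence[float]],  # six features per sample pair
    labels: Sequence[int],               # 1 = correct pair, 0 = wrong pair
    lr: float = 0.1,
    tol: float = 1e-6,
    max_iters: int = 10_000,
) -> Tuple[List[float], float]:
    """Batch gradient descent on mean squared error (assumed loss function)."""
    n, d = len(samples), len(samples[0])
    w, b = [0.0] * d, 0.0
    prev_loss = float("inf")
    for _ in range(max_iters):
        errors = [score(w, b, x) - y for x, y in zip(samples, labels)]
        loss = sum(e * e for e in errors) / n
        # Convergence test: stop once the loss improves by less than `tol`.
        if prev_loss - loss < tol:
            break
        prev_loss = loss
        # Not converged: adjust parameters along the negative MSE gradient.
        grad_w = [2.0 / n * sum(e * x[j] for e, x in zip(errors, samples)) for j in range(d)]
        grad_b = 2.0 / n * sum(errors)
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b
```

Since all six features lie in [0, 1], a fixed learning rate of 0.1 is usually stable here; a logistic loss would be an equally plausible reading of "preset loss function" if the correlation is meant to behave like a probability.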
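Claim 6 trains an FM or FFM model until convergence. The following is a minimal sketch of the plain second-order factorization machine only (an FFM would instead keep one latent vector per feature per field), assuming log loss as the preset loss function, per-sample SGD as the parameter adjustment, and convergence when the mean epoch loss improves by less than `tol`. Inputs are sparse dicts of feature index to value, for example parsed from converted sample target data. Class, function, and parameter names are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Features = Dict[int, float]  # sparse input: feature index -> value


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class FactorizationMachine:
    """Second-order FM: w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j."""

    def __init__(self, num_features: int, k: int = 4, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.k = k
        self.w0 = 0.0
        self.w = [0.0] * num_features
        self.v = [[rng.gauss(0.0, 0.01) for _ in range(k)] for _ in range(num_features)]

    def _score(self, x: Features) -> Tuple[float, List[float]]:
        # Pairwise term via the O(k * nnz) identity:
        # 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]
        sums = [sum(self.v[i][f] * xi for i, xi in x.items()) for f in range(self.k)]
        pairwise = 0.5 * sum(
            sums[f] ** 2 - sum((self.v[i][f] * xi) ** 2 for i, xi in x.items())
            for f in range(self.k)
        )
        linear = self.w0 + sum(self.w[i] * xi for i, xi in x.items())
        return linear + pairwise, sums

    def predict_proba(self, x: Features) -> float:
        """Probability that the encoded pair is a real cross-screen pair."""
        return sigmoid(self._score(x)[0])

    def sgd_step(self, x: Features, y: int, lr: float) -> float:
        """One SGD update on log loss; returns the loss before the update."""
        raw, sums = self._score(x)
        p = sigmoid(raw)
        g = p - y  # derivative of log loss w.r.t. the raw score
        self.w0 -= lr * g
        for i, xi in x.items():
            self.w[i] -= lr * g * xi
            for f in range(self.k):
                # d raw / d v_if = x_i * sums_f - v_if * x_i^2 (old values used)
                self.v[i][f] -= lr * g * (xi * sums[f] - self.v[i][f] * xi * xi)
        eps = 1e-12
        return -(y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps))


def train_fm(
    data: Sequence[Tuple[Features, int]], num_features: int,
    lr: float = 0.1, tol: float = 1e-5, max_epochs: int = 200,
) -> FactorizationMachine:
    """Repeat SGD epochs until the mean epoch loss stops improving by `tol`."""
    model = FactorizationMachine(num_features)
    prev_loss = float("inf")
    for _ in range(max_epochs):
        loss = sum(model.sgd_step(x, y, lr) for x, y in data) / len(data)
        if prev_loss - loss < tol:  # treated as converged
            break
        prev_loss = loss
    return model
```

The factorized pairwise term is what suits this task: a Cookie-side page and a mobile-side page that rarely co-occur in training still get an interaction weight through their latent vectors, which a plain linear model over URLs cannot express.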
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911227587.9A CN111080349B (en) | 2019-12-04 | 2019-12-04 | Method, device, server and medium for identifying multiple devices of same user |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111080349A CN111080349A (en) | 2020-04-28 |
CN111080349B true CN111080349B (en) | 2023-04-21 |
Family ID: 70312791
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911227587.9A Active CN111080349B (en) | 2019-12-04 | 2019-12-04 | Method, device, server and medium for identifying multiple devices of same user |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111080349B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112559872A (en) * | 2020-12-21 | 2021-03-26 | 上海明略人工智能(集团)有限公司 | Method, system, computer device and storage medium for identifying user between devices |
CN114968970A (en) * | 2021-02-24 | 2022-08-30 | 北京国双千里科技有限公司 | Object attribute determination method and device, electronic equipment and storage medium |
CN114491315A (en) * | 2022-02-08 | 2022-05-13 | 联想(北京)有限公司 | Information processing method and device and electronic equipment |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105447148A (en) * | 2015-11-26 | 2016-03-30 | 上海晶赞科技发展有限公司 | Cookie identifier association method and apparatus |
CN105677844A (en) * | 2016-01-06 | 2016-06-15 | 北京摩比万思科技有限公司 | Mobile advertisement big data directional pushing and user cross-screen recognition method |
CN106445942A (en) * | 2015-08-05 | 2017-02-22 | 腾讯科技(北京)有限公司 | User cross-screen identification method and apparatus |
CN106528777A (en) * | 2016-10-27 | 2017-03-22 | 北京百分点信息科技有限公司 | Cross-screen user identification normalizing method and system |
CN108197190A (en) * | 2017-12-26 | 2018-06-22 | 北京秒针信息咨询有限公司 | A kind of method and apparatus of user's identification |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160162937A1 (en) * | 2014-12-05 | 2016-06-09 | Hitesh Chawla | Method and system for identifying users across multiple communication devices |
US10715612B2 (en) * | 2015-09-15 | 2020-07-14 | Oath Inc. | Identifying users' identity through tracking common activity |
US10679260B2 (en) * | 2016-04-19 | 2020-06-09 | Visual Iq, Inc. | Cross-device message touchpoint attribution |
2019-12-04: CN application CN201911227587.9A filed (patent CN111080349B, status Active)
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111080349B (en) | Method, device, server and medium for identifying multiple devices of same user | |
CN109872242B (en) | Information pushing method and device | |
CN109903086B (en) | Similar crowd expansion method and device and electronic equipment | |
US20140310691A1 (en) | Method and device for testing multiple versions | |
CN108390788B (en) | User identification method and device and electronic equipment | |
CN108965951B (en) | Advertisement playing method and device | |
CN107305611B (en) | Method and device for establishing model corresponding to malicious account and method and device for identifying malicious account | |
CN111783810B (en) | Method and device for determining attribute information of user | |
CN109165691B (en) | Training method and device for model for identifying cheating users and electronic equipment | |
CN108062418B (en) | Data searching method and device and server | |
CN108335131B (en) | Method and device for estimating age bracket of user and electronic equipment | |
CN111125521A (en) | Information recommendation method, device, equipment and storage medium | |
US10049369B2 (en) | Group targeting system and method for internet service or advertisement | |
CN110022259B (en) | Message arrival rate determining method and device, data statistics server and storage medium | |
CN107885875B (en) | Synonymy transformation method and device for search words and server | |
CN112115169B (en) | User portrait generation, object distribution and content recommendation methods, devices and media | |
CN111080374A (en) | Test method of advertisement delivery strategy, bidding server and advertisement delivery system | |
CN112836128A (en) | Information recommendation method, device, equipment and storage medium | |
CN109740623B (en) | Actor screening method and device | |
CN108647986B (en) | Target user determination method and device and electronic equipment | |
CN112883275B (en) | Live broadcast room recommendation method, device, server and medium | |
CN111597380B (en) | Recommended video determining method and device, electronic equipment and storage medium | |
CN110996142B (en) | Video recall method and device, electronic equipment and storage medium | |
CN110727895B (en) | Sensitive word sending method and device, electronic equipment and storage medium | |
CN110442801B (en) | Method and device for determining concerned users of target events |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |