CN111400575A - User identification generation method, user identification method and device - Google Patents

User identification generation method, user identification method and device

Info

Publication number
CN111400575A
Authority
CN
China
Prior art keywords
page
position data
user
cursor
user identification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010190953.4A
Other languages
Chinese (zh)
Other versions
CN111400575B (en
Inventor
付星昱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202010190953.4A priority Critical patent/CN111400575B/en
Publication of CN111400575A publication Critical patent/CN111400575A/en
Application granted granted Critical
Publication of CN111400575B publication Critical patent/CN111400575B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G06F16/972 Access to data in other repository systems, e.g. legacy data or dynamic Web page generation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application discloses a user identifier generation method, a user identification method, and corresponding devices. The method determines whether the currently displayed page is a content type typesetting page. When it is, the method acquires the staying position data of the cursor in the page, obtains target position data from it, and generates a user identifier from the coordinates obtained from the target position data. Because the layout of a content type typesetting page is designed to display text content, users have specific behavior habits when browsing such pages, so the target position data can accurately reflect the user's behavior habits when browsing web pages. A user identifier generated from the target position data is directly associated with the user's behavior habits and does not depend on the static information of the browser, so using it can improve the accuracy of user identification.

Description

User identification generation method, user identification method and device
Technical Field
The embodiments of the present application relate to the field of Internet technology, and in particular to a user identifier generation method, a user identification method, corresponding devices, a terminal, a server, and a computer-readable storage medium.
Background
Currently, websites and advertising alliances identify terminals by using terminal identification technology. This technology can link together all of a user's web browsing behavior in a browser, so that each individual can be accurately located on the network, data about that individual can be collected, and personalized services or other targeted activities can be provided through data analysis.
However, the existing terminal identification technology can only locate the terminal through the static information of the browser, and cannot accurately identify the user.
Disclosure of Invention
The following is a summary of the subject matter described in detail herein. This summary is not intended to limit the scope of the claims.
The application provides a user identifier generation method, a user identification method, corresponding devices, a terminal, a server, and a computer-readable storage medium, which can improve the accuracy of user identification.
According to a first aspect of the present application, there is provided a user identifier generating method, including:
judging whether the currently displayed page is a content type typesetting page or not;
when the page is a content type typesetting page, acquiring staying position data of a cursor in the page, wherein the staying position data is position data of the cursor staying in the page for a time period longer than a first threshold value;
obtaining target position data according to the staying position data of the cursor in the page;
and generating a user identifier for identifying the user according to the coordinates obtained by the target position data.
According to a second aspect of the present application, there is provided a user identification method, comprising:
obtaining a user identifier from a terminal, wherein the user identifier is generated according to target staying points of the terminal, and the target staying points are obtained by screening a set of staying points generated when a cursor stays in a web page and represent the behavior habits of a user when browsing web pages;
matching the user identifier against a preset user identifier feature library;
and when the matching succeeds, completing the identification of the user of the terminal.
According to a third aspect of the present application, there is provided a user identifier generation apparatus, comprising:
a judging module, configured to judge whether the currently displayed page is a content type typesetting page, wherein a content display area of the content type typesetting page is located in the middle of the page along the width direction, and a blank area lies between each side of the content display area along the width direction and the edge of the page;
a position data acquisition module, configured to acquire first position data of a cursor in the content type typesetting page when the page is a content type typesetting page, wherein the first position data is position data at which the cursor stays in the content type typesetting page for longer than a first threshold value;
a target position data generation module, configured to obtain target position data according to the staying position data of the cursor in the page;
and an identifier generation module, configured to generate a user identifier for identifying the user according to the coordinates obtained from the target position data.
According to a fourth aspect of the present application, there is provided a user identification apparatus comprising:
a user identifier receiving module, configured to obtain a user identifier generated by the user identifier generation apparatus according to the third aspect of the present application;
the matching module is used for matching the user identification in a preset user identification feature library;
the confirmation module is used for completing the identification of the terminal user after the matching is successful;
and the newly-built module is used for storing the user identifier serving as a new user identifier feature into the user identifier feature library after the matching fails.
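The match-then-store behavior of the matching, confirmation, and new-entry modules can be sketched as follows. This is a minimal illustration only: the `FeatureLibrary` class, the in-memory set, and the exact string comparison are all assumptions made for the example, and the source does not state how identifiers are compared or stored.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class FeatureLibrary:
    """Hypothetical in-memory stand-in for the preset user identifier feature library."""

    known: Set[str] = field(default_factory=set)

    def identify(self, user_identifier: str) -> bool:
        """Return True when the identifier matches a stored feature.

        On a failed match the identifier is stored as a new feature,
        mirroring the new-entry module described above.
        Exact string equality is an assumption of this sketch.
        """
        if user_identifier in self.known:
            return True
        self.known.add(user_identifier)
        return False


library = FeatureLibrary()
first_visit = library.identify("8938")   # unknown identifier: stored, no match
second_visit = library.identify("8938")  # same identifier again: match
```

A real matcher would probably need a tolerance rather than exact equality, since identifiers derived from behavior are unlikely to repeat exactly.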
According to a fifth aspect of the present application, there is provided a user identifier generating apparatus, comprising:
at least one memory;
at least one processor;
at least one program;
the at least one program being stored in the memory and executed by the processor to implement the user identifier generation method of the first aspect.
According to a sixth aspect of the present application, there is provided a user identification apparatus comprising:
at least one memory;
at least one processor;
at least one program;
the at least one program being stored in the memory and executed by the processor to implement the user identification method of the second aspect.
According to a seventh aspect of the present application, there is provided a terminal, including the user identifier generating apparatus of the fifth aspect of the present application.
According to an eighth aspect of the present application, there is provided a server comprising the user identification device of the sixth aspect of the present application.
According to a ninth aspect of the present application, there is provided a computer-readable storage medium storing computer-executable instructions for performing the user identification generation method of the first aspect of the present application, or the user identification method of the second aspect of the present application.
The technical solution provided by the present application determines whether the currently displayed page is a content type typesetting page. When it is, the solution acquires the staying position data of the cursor in the page, obtains target position data from it, and generates a user identifier from the coordinates obtained from the target position data. Because the layout of a content type typesetting page is designed to display text content, users have specific behavior habits when browsing such pages, so the target position data can accurately reflect the user's behavior habits when browsing web pages. A user identifier generated from the target position data is directly associated with the user's behavior habits and does not depend on the static information of the browser, so using it can improve the accuracy of user identification.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings are included to provide a further understanding of the claimed subject matter and are incorporated in and constitute a part of this specification, illustrate embodiments of the subject matter and together with the description serve to explain the principles of the subject matter and not to limit the subject matter.
FIG. 1 is a schematic diagram of a prior art browser fingerprint generation process;
FIG. 2 is a comparative schematic of a prior art browser fingerprint generation process;
FIG. 3 is a system architecture diagram of an environment for implementing a user identification generation method provided by an exemplary embodiment of the present application;
FIG. 4 is a flow chart of a method for generating a user identification provided by an exemplary embodiment of the present application;
FIG. 5 is a diagram of a content layout page;
FIG. 6 is a flowchart of a specific method of step 401 in FIG. 4;
FIG. 7 is a flowchart illustrating another embodiment of the method of step 401 in FIG. 4;
FIG. 8 is a flowchart illustrating another embodiment of the method of step 401 in FIG. 4;
FIG. 9 is a schematic diagram of a track formed when a cursor moves in a user identifier generation method according to an exemplary embodiment of the present application;
FIG. 10 is a flowchart of a method specific to step 402 of FIG. 4;
FIG. 11 is a flowchart illustrating a specific method for acquiring data of a staying position of a cursor in a page in step 402 in FIG. 4;
FIG. 12 is a flowchart of a method of one embodiment of step 403 of FIG. 4;
FIG. 13 is a flowchart of a method of another embodiment of step 403 of FIG. 4;
FIG. 14 is a diagram of another embodiment of a content layout page;
FIG. 15 is a diagram of another embodiment of a content layout page;
FIG. 16 is a flowchart of a particular method of step 404 of FIG. 4;
FIG. 17 is a schematic diagram of a centroid calculation method provided by an exemplary embodiment of the present application;
FIG. 18 is a flowchart of a user identification generation method provided by an exemplary embodiment of the present application;
FIG. 19 is a flowchart of a user identification generation method provided by an exemplary embodiment of the present application;
FIG. 20 is a flowchart of a detailed method of step 1902 of FIG. 19;
FIG. 21 is a flow chart of a method for user identification provided by an exemplary embodiment of the present application;
FIG. 22 is a flowchart of a method for generating a user identifier according to an exemplary embodiment of the present application;
FIG. 23 is an interface diagram of an application scenario provided by an exemplary embodiment of the present application;
FIG. 24 is a block diagram illustrating a structure of a user identifier generating apparatus according to an exemplary embodiment of the present application;
FIG. 25 is a block diagram of a user identification device according to an exemplary embodiment of the present application;
FIG. 26 is a block diagram of another embodiment of a user identifier generating apparatus according to the present application;
FIG. 27 is a block diagram of another embodiment of a user identification device according to the present application;
FIG. 28 is a block diagram of a terminal according to an exemplary embodiment of the present application;
FIG. 29 is a block diagram of a server according to an exemplary embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
It should be noted that although functional blocks are partitioned in a schematic diagram of an apparatus and a logical order is shown in a flowchart, in some cases, the steps shown or described may be performed in a different order than the partitioning of blocks in the apparatus or the order in the flowchart. The terms first, second and the like in the description and in the claims, and the drawings described above, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
First, several terms used in the present application are explained:
User: a natural person operating the terminal, who operates the computer device according to his or her own operating habits. Unless otherwise stated, the user in the embodiments of the present application refers to the same natural person.
Browser: an application that can display the content of HyperText Markup Language (HTML) files provided by a web server or a file system and allows the user to interact with those files.
Browser fingerprint: a character string obtained by applying a certain algorithm to the visible configuration and setting information of the browser itself, such as the browser's kernel information, time zone, language, and screen resolution.
Static information of the browser: browser configuration information that does not change when the user changes. It includes feature identifiers of the browser, such as the hardware type, operating system, user agent, system fonts, language, screen resolution, browser plug-ins, browser extensions, browser settings, and time zone difference, and also includes the results of processing data when the browser displays a page, such as the results of processing data streams when displaying pictures or playing audio and video.
Cursor: a visual graphic that helps the user interact with the content displayed on the screen. A cursor may be controlled by a peripheral device, such as a mouse or touchpad. The user can move the cursor through the peripheral device to change the position of the displayed content, or send an operation instruction through the peripheral device so that the displayed content performs a corresponding action.
Typesetting: the whole pattern is formed by arranging elements such as characters, pictures, frames and the like according to a certain position.
Page: a carrier, such as a web page, document, etc., for presenting information to a user and enabling interaction with the user.
Content display area: an area in which content such as text or pictures is densely distributed.
Margin area: a blank area without additional elements. It is not limited to an area that is white in color. A margin area may surround elements in the page or lie between elements in the page. For example, a margin area of a certain size may be formed by setting the margin (outer spacing) of elements in a web page.
Content type typesetting page: a page whose layout mainly aims at displaying text content. The content display area of a content type typesetting page is located in the middle of the page, and there are few other navigation or guiding elements, so a large margin area lies between each side of the content display area in the width direction and the edge of the page, and the user's attention is mainly on browsing the content display area of the current page.
Distribution density: the number of target objects in a unit area.
User identifier: information by which the identity of a user can be recognized during interaction between the server and the user, so that actions performed by the user can be correctly attributed to that user. The user identifier may, for example, be stored as a character string.
Currently, a website or an advertising alliance identifies a terminal by using terminal identification technology: it collects the settings, kernel information, time zone, language, resolution, and similar information of the browser used by the user and applies a certain algorithm to this information to obtain a value, which is the browser fingerprint. Through the browser fingerprint, the website or advertising alliance can locate an individual user, collect that user's data, and provide personalized services or other targeted activities through data analysis.
Browser fingerprints can be implemented in different ways. One way is based on feature identifiers of the browser, such as the hardware type, operating system, user agent, system fonts, language, screen resolution, browser plug-ins, browser extensions, browser settings, and time zone difference. Fingerprints built from such information have low accuracy and a high probability of collision, much as attributes such as height and age are shared by many similar people.
At present, more accurate identification can be achieved with other browser fingerprints, such as the Canvas fingerprint, which is generated from the CRC check information of a picture rendered by the browser. This works because the same HTML5 (the fifth version of the HyperText Markup Language) drawing-element operation produces picture content that is not identical on different operating systems and different browsers.
However, these fingerprints are all browser-based, and different browsers on the same device generate different fingerprint information. As a result, when the same user uses different browsers on the same computer, the service party collects different browser fingerprints, cannot uniquely identify the user, and therefore cannot analyze the user's behavior effectively. For example, as shown in FIG. 1, when the user operates browser A, the user agent (UA), time zone information, browser system platform information, screen resolution, and Canvas fingerprint information of browser A are processed by the feature value algorithm to obtain browser fingerprint A: 2ds234vdg 345. As shown in FIG. 2, when the same user operates another browser B, the user agent, time zone information, browser system platform information, screen resolution, and Canvas fingerprint information have changed, so the feature value algorithm yields browser fingerprint B: 53g721635 q. Browser fingerprint B cannot be matched to browser fingerprint A, so the service party cannot tell that both fingerprints belong to the same user.
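The effect described for FIG. 1 and FIG. 2 can be reproduced with a short sketch. The source does not name the feature value algorithm, so SHA-256 truncated to 12 hexadecimal characters is used here purely as an assumption, and the field names and values below are invented for illustration.

```python
import hashlib
from typing import Dict


def browser_fingerprint(static_info: Dict[str, str]) -> str:
    """Hash a browser's static information into a short fingerprint string.

    SHA-256 is an assumed stand-in for the unspecified feature value algorithm.
    """
    # Serialize keys in sorted order so identical information always hashes identically.
    canonical = "|".join(f"{key}={static_info[key]}" for key in sorted(static_info))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


# Hypothetical static information for two browsers used by the same person.
browser_a = {
    "user_agent": "BrowserA/1.0",
    "time_zone": "UTC+8",
    "platform": "Win32",
    "resolution": "1920x1080",
    "canvas": "a1b2",
}
browser_b = dict(browser_a, user_agent="BrowserB/2.0", canvas="c3d4")

fingerprint_a = browser_fingerprint(browser_a)
fingerprint_b = browser_fingerprint(browser_b)
print(fingerprint_a != fingerprint_b)
```

Changing any single input field changes the hash, which is why a fingerprint built only from static browser information cannot follow a person from one browser to another.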
In addition, for the same browser, the user can also intentionally modify the device information of the browser and randomly generate new device information every time of access, so that a new browser fingerprint is generated every time of access, and the website or the advertising alliance is difficult to accurately identify the user.
In summary, the existing browser fingerprint technology can only locate the terminal through the static information of the browser, and does not establish a relationship with the user, so that the user cannot be identified accurately.
Therefore, the embodiments of the present application provide a user identifier generation method, a user identification method, corresponding devices, a terminal, a server, and a computer-readable storage medium, which can generate a user identifier that allows the user to be identified accurately. The user identifier does not depend on static information of the terminal device, so the user can still be identified accurately by the user identifier even after changing the browsing program or the terminal.
The user identifier generation method provided by the embodiment of the present application may be applied to an application environment shown in fig. 3, where the application environment includes: the terminal 11, the server 12, and the communication network 13, wherein the terminal 11 and the server 12 are connected to each other through the communication network 13, and the connection may be a wired connection or a wireless connection.
Illustratively, the terminal 11 may be an electronic terminal device such as a desktop computer, a notebook computer, a smartphone, a tablet computer, or an e-book reader. The terminal 11 includes a display device. The display device may be an independent display, such as an external display connected to a desktop computer, notebook computer, smartphone, tablet computer, or similar device to display or extend the display data of the terminal 11; it may also be a display integrated with the terminal 11, such as the built-in screen of a notebook computer, smartphone, or tablet computer. The terminal 11 further includes a pointing device, which may be built in, such as the touchpad of a notebook computer, or may be external to the terminal 11 and communicatively connected to it, such as a mouse, an external touchpad, or a keyboard with a touchpad. An operating system (OS), at least one browsing program (App), and at least one execution program are installed in the terminal 11. The operating system may be a Microsoft operating system (Microsoft Windows OS), an Apple operating system (Mac OS), or the like. The operating system includes a display program that displays a cursor on the display device, and the user controls the position of the cursor on the display device through the pointing device.
The browsing program may be a browser, an instant messaging software, a shopping software, a news software, a document processing software, and the like, wherein the browsing program has a browsing window through which a user reads or views page contents in the window, wherein the page contents may be web page contents provided by the server 12, or local page contents opened by the terminal 11, such as a local document, a cached local document of a web page, and the like.
The execution program can be a built-in function module of a browsing program, such as a built-in software function module of a browser and document processing software, or can be a plug-in of a third-party browsing program, such as a browser plug-in which a user actively installs in the browser or a plug-in which the user automatically installs to the browser when browsing a webpage, or can be independent software relative to the browsing program.
The servers 12 may be in the form of a single server or a cluster, i.e., one or more servers, and the servers 12 may be set by, for example, a web page provider, a browser provider, or an advertisement provider.
The terminal 11 is connected with the server 12 through a communication network 13, for example, the communication network 13 may be a wired network or a wireless network, and the wired network may be a metropolitan area network, a local area network, an optical fiber network, or the like; the wireless network may be a mobile communication network or a wireless fidelity network, etc.
Illustratively, a user opens a browsing window of a browsing program in the terminal 11. Taking web browsing as an example, the operating system in the terminal 11 sends an access request to the server 12 through the communication network 13, and the server 12 responds by returning web page data to the terminal 11 through the communication network 13. The web page data is displayed in the browsing window of the browsing program, and the user operates the displayed page through the pointing device, for example by moving the cursor, switching pages, dragging, clicking, double-clicking, or scrolling. When the page currently browsed by the user is a content type typesetting page, the execution program obtains the staying position data of the cursor in the page, processes the staying position data to obtain target position data, and generates a user identifier for identifying the user according to the coordinates of the target position data. The execution program of the terminal 11 sends the user identifier to the server 12 through the communication network 13, and the server 12 identifies the user according to the user identifier.
When the browsing program is a browser or an embedded browsing window and the page is a local document or a cached local document of a web page, the terminal 11 reads the document data in the internal storage device or the external storage device and displays the page corresponding to the document data in the browsing window.
Fig. 4 is a flowchart of a user identifier generating method according to an exemplary embodiment of the present application, where the method is applied to a terminal installed with a browsing program, and the method specifically includes step 401, step 402, step 403, and step 404.
Step 401, determine whether the currently displayed page is a content type typesetting page.
While the user browses a page, the execution program determines the type of the current page, that is, whether the currently displayed page is a content type typesetting page. In an embodiment, referring to FIG. 5, the content display area 501 of the content type typesetting page is located in the middle of the page, and a large margin area 502 lies between each side of the content display area 501 in the width direction and the edge of the page. On such a page the user's attention is mainly on the content display area, and the staying positions of the cursor follow the user's habits: to make it easy to drag the progress bar, or to keep the cursor from interfering with reading the content display area, the user more often leaves the cursor in the margin area, and the distribution of staying positions in the margin area is comparatively regular.
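One way to test for the layout shown in FIG. 5 is to compare the horizontal extent of the content display area with the page width. The sketch below is only an assumption about how such a check might look: the function name, the parameters, and the two ratio thresholds are hypothetical and do not come from the source.

```python
def is_content_type_page(
    page_width: float,
    content_left: float,
    content_right: float,
    min_margin_ratio: float = 0.15,
    max_center_offset_ratio: float = 0.05,
) -> bool:
    """Heuristic check for a centered content area flanked by wide margins.

    All thresholds are illustrative assumptions, expressed as fractions
    of the page width.
    """
    if page_width <= 0 or content_right <= content_left:
        return False
    left_margin = content_left
    right_margin = page_width - content_right
    content_center = (content_left + content_right) / 2
    # The content display area must sit near the horizontal middle of the page.
    centered = abs(content_center - page_width / 2) <= max_center_offset_ratio * page_width
    # Both margin areas must be reasonably wide.
    wide_margins = min(left_margin, right_margin) >= min_margin_ratio * page_width
    return centered and wide_margins


print(is_content_type_page(1920, 460, 1460))  # centered column with wide margins
print(is_content_type_page(1920, 0, 1920))    # full-width layout
```

In a browser, the left and right edges of the content display area would have to be measured from the rendered page; how that is done is outside this sketch.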
Step 402, when the page is a content type typesetting page, obtaining the staying position data of the cursor in the page, wherein the staying position data is the position data of the cursor staying time length in the page longer than a first threshold value.
When the execution program determines that the page browsed by the user is a content type typesetting page, it acquires the position of the cursor in the page. When the cursor remains at the same position for longer than the first threshold, the cursor is considered to stay at that position, and the execution program records these staying positions to obtain the staying position data. For example, if the cursor is an arrow, the point at the tip of the arrow is the current position of the cursor. When the cursor stays at that position for longer than the first threshold, the execution program records the position data of the point, which can be pictured as a staying point on the page. As the user browses the page, the execution program keeps collecting staying points and forms the staying position data from them. The staying position data is thus the set of positions at which the cursor stayed in the content type typesetting page for longer than the first threshold, that is, a set of staying points. Because the user has fixed behavior habits when browsing content type typesetting pages, the distribution of the cursor's staying positions on such a page objectively reflects those habits.
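The staying-point rule can be sketched over a list of cursor move events. The event format `(timestamp_seconds, x, y)`, the `end_time` parameter, and the function name are assumptions for illustration. The sketch also assumes that each event marks the moment the cursor arrived at a new position and that it remained there until the next event.

```python
from typing import List, Tuple

MoveEvent = Tuple[float, float, float]  # (timestamp in seconds, x, y)


def extract_staying_points(
    events: List[MoveEvent], end_time: float, first_threshold: float
) -> List[Tuple[float, float]]:
    """Return the positions where the cursor stayed longer than first_threshold.

    The stay at each position lasts until the next move event, or until
    end_time (the end of the observation) for the final event.
    """
    staying_points = []
    for index, (timestamp, x, y) in enumerate(events):
        next_time = events[index + 1][0] if index + 1 < len(events) else end_time
        if next_time - timestamp > first_threshold:
            staying_points.append((x, y))
    return staying_points


# Hypothetical event log: the cursor rests at (120, 210) and at (55, 420).
events = [(0.0, 100, 200), (0.1, 120, 210), (2.5, 50, 400), (2.6, 55, 420)]
staying = extract_staying_points(events, end_time=6.0, first_threshold=1.0)
print(staying)
```

The value of the first threshold is left open by the text above; one second is used here only to make the example concrete.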
Step 403, obtain target position data according to the staying position data of the cursor in the page.
By identifying the staying position data of the cursor in the page, the execution program in effect screens the staying positions out of the cursor's movement data. The execution program then processes the staying position data further to obtain the target position data.
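This passage does not say how the staying position data is processed further. Since cursor stays in the margin area are described as more regular, one plausible screening rule is to keep only the staying points that fall in the margin area. The sketch below illustrates that assumed rule; the function name and the horizontal-only test are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def screen_target_points(
    staying_points: List[Point], content_left: float, content_right: float
) -> List[Point]:
    """Keep staying points that lie outside the content display area horizontally.

    Treating 'target position data' as 'staying points in the margin area'
    is an assumption of this sketch, not a rule stated in the text.
    """
    return [(x, y) for x, y in staying_points if x < content_left or x > content_right]


# Hypothetical staying points on a page whose content area spans x = 460..1460.
staying_points = [(100, 50), (900, 300), (1700, 800)]
target_points = screen_target_points(staying_points, content_left=460, content_right=1460)
print(target_points)
```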
Step 404, generating a user identifier for identifying the user according to the coordinates obtained from the target position data.
The target position data at least comprises the coordinates of the staying positions of the cursor in the page. By analyzing these coordinates, the execution program obtains the distribution of the staying positions, that is, the coordinate distribution of the cursor staying points in the page, and the user identifier is obtained by converting this coordinate distribution.
In the user identification generation method provided by the embodiment of the application, it is judged whether the currently displayed page is a content type typesetting page; when it is, the staying position data of the cursor in the page is obtained, target position data is obtained from it, and the user identifier is generated according to the coordinates obtained from the target position data. Because the layout of a content type typesetting page is designed to display text content, a user has specific behavior habits when browsing this type of page, and the target position data can accurately reflect those habits. The user identifier generated from the target position data is therefore directly associated with the behavior habits of the user and does not depend on static information of the browser. Even if the user changes the browsing program or the computer device, the user's operating habits when browsing content type typesetting pages remain consistent, so the user identifiers generated on different browsing programs, or even different computer devices, are consistent or related. The user identifier generated by the embodiment of the application is thus strongly associated with the user, and using it can improve the accuracy of user identification.
In one embodiment, referring to fig. 6, step 401 further comprises:
step 601, judging whether the access address of the page is matched with a preset page address;
step 602, if the page is matched, the page is a content type typesetting page;
step 603, if not, the page is a non-content type typesetting page.
In an embodiment, the execution program judges whether the page is a content type typesetting page by checking whether the access address of the page matches a preset page address: if it matches, the page is a content type typesetting page; if not, the page is a non-content type typesetting page. Specifically, a preset list library may be established that stores the page addresses of known content type typesetting pages. The preset page addresses can be stored on the server or locally. In this way, whether the page is a content type typesetting page can be judged quickly and intuitively.
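As an illustration only, the address matching of steps 601 to 603 might be sketched in Python as follows. The contents of the list library, the helper names `normalize_address` and `is_content_page_by_address`, and the rule that a page matches when it lies at or beneath a preset address are all assumptions made for this sketch; the text does not specify how the match is performed.

```python
from urllib.parse import urlsplit

# Hypothetical list library of known content type typesetting page addresses.
PRESET_PAGE_ADDRESSES = {
    "example.com/articles",
    "example.org/docs",
}


def normalize_address(url: str) -> str:
    """Reduce a URL to host + path, ignoring scheme, query and fragment."""
    parts = urlsplit(url)
    return (parts.netloc.lower() + parts.path).rstrip("/")


def is_content_page_by_address(url: str, preset=PRESET_PAGE_ADDRESSES) -> bool:
    """Steps 601-603: match the access address against the preset addresses."""
    address = normalize_address(url)
    # Assumed rule: a page matches if it equals a preset address or lies beneath it.
    return any(address == p or address.startswith(p + "/") for p in preset)
```

A prefix rule is used here so that one entry can cover a whole section of a site; an exact-match rule would be an equally valid reading of the text.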
In one embodiment, a maintainer of the execution program or of the background server enters known content type typesetting pages based on existing experience. Manual judgment gives high entry accuracy, and the entries can target commonly used pages, so this approach is efficient to implement.
In another embodiment, the list library may be updated in an automatic identification manner of the system, for example, a background server of the application automatically captures a page to analyze the layout of the page, and updates the access address corresponding to the page determined as the content type layout into the list library.
In one embodiment, in addition to the page address, it may be determined whether the current page is a content type composition page by combining a title, a brief introduction, or a keyword.
In another embodiment, referring to fig. 7, step 401 further comprises:
step 701, identifying a content display area in the page;
step 702, if the content display area is located in the middle of the page, and a margin area whose width, as a proportion of the page width, is larger than a second threshold exists between each side of the content display area along the width direction and the corresponding edge of the page, the page is a content type typesetting page.
Taking a web page as an example, referring to fig. 5, the content display area 501 in the page is identified first: elements of the page such as characters, pictures, or frames and their positions may be identified, and an area where contents such as characters or pictures are densely distributed is identified as the content display area 501. The position and width of the margin area 502 are then obtained by combining the outer margin and the inner margin, after which it can be judged whether the page is a content type typesetting page.
In an embodiment, judging whether the page is a content type typesetting page may specifically comprise identifying the content display area in the page; if the content display area is located in the middle of the page, and a margin area whose width, as a proportion of the page width, is larger than a second threshold exists between each side of the content display area along the width direction and the corresponding edge of the page, the page is a content type typesetting page. Illustratively, the second threshold is 20% to 30%; it is understood that this value is merely illustrative, and other suitable ranges may be used in practice. In this way, whether the page is a content type typesetting page can be judged accurately and stably.
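The margin test of step 702 can be illustrated with a minimal Python sketch. It assumes the left and right edges of the content display area have already been identified in page pixels; the function name, the default threshold of 20%, and the `center_tolerance` used to decide that the area is "in the middle" are assumptions of this sketch, not values fixed by the text.

```python
def is_content_page_by_layout(page_width, content_left, content_right,
                              second_threshold=0.2, center_tolerance=0.05):
    """Return True when the content area is centered with wide margins.

    page_width, content_left and content_right are in pixels.
    second_threshold is the minimum margin width as a fraction of page width.
    center_tolerance (assumed) is the allowed difference between the two
    margin fractions for the area to count as being in the middle.
    """
    left_margin = content_left / page_width
    right_margin = (page_width - content_right) / page_width
    centered = abs(left_margin - right_margin) <= center_tolerance
    return (centered
            and left_margin > second_threshold
            and right_margin > second_threshold)
```

For a page 1200 pixels wide whose content spans pixels 300 to 900, both margins occupy 25% of the width, so the page would be judged a content type typesetting page under the default threshold.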
In addition, in an embodiment, whether the page is a content type typesetting page may also be determined by combining the two determination manners, as shown in fig. 8, step 401 further includes:
step 801, judging whether an access address of a page is matched with a preset page address;
step 802, if matched, the page is a content type typesetting page;
step 803, if not, identifying a content display area in the page;
step 804, if the content display area is located in the middle of the page, and a margin area whose width, as a proportion of the page width, is larger than a second threshold exists between each side of the content display area along the width direction and the corresponding edge of the page, the page is a content type typesetting page.
When judging whether a page is a content type typesetting page, the preset page addresses are checked first; if the access address of the page matches none of them, the judgment is made from the content display area and the margin area. In this way, whether the page is a content type typesetting page can be judged quickly and intuitively, with improved accuracy and stability.
In an embodiment, each time a page is judged to be a content type typesetting page by the above method, its address is added to the list library. This enlarges the set of preset addresses in the list library and avoids repeating the layout analysis for the same page in later judgments, thereby improving the judgment efficiency.
For example, when the content type typesetting page is a web page, the address of the page may be stored in the list library. When judging whether a page is a content type typesetting page, the address of the page is obtained first, and the list library is then searched for a preset address matching that address; if one is found, the page is a content type typesetting page.
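The combined judgment of steps 801 to 804, together with adding newly recognized pages to the library, might look like the following Python sketch. The function name `judge_page`, the use of an in-memory set as the library, exact-string address matching, and the `layout_check` callback standing in for the content display area analysis are all assumptions for illustration.

```python
def judge_page(url, preset_addresses, layout_check):
    """Judge whether a page is a content type typesetting page.

    preset_addresses: a mutable set acting as the library of known addresses.
    layout_check: a callable taking the URL and returning True when the
    content display area and margin test succeeds (hypothetical helper).
    """
    # Steps 801-802: the address check comes first because it is cheap.
    if url in preset_addresses:
        return True
    # Steps 803-804: fall back to analyzing the page layout.
    if layout_check(url):
        # Remember the result so the layout analysis is not repeated.
        preset_addresses.add(url)
        return True
    return False
```

With this arrangement the layout analysis runs at most once per recognized address, which is the efficiency gain the paragraph above describes.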
When a user browses a page by using a browsing program, a series of position data can be generated along with the movement of a cursor, and the series of position data can form a track of the movement of the cursor. The staying time of the cursor at some position data may be longer than the first threshold, and the position data is the staying position data of the cursor, for example, fig. 9 is a schematic diagram of a track formed when the cursor moves, and the point a, the point B, the point C, and the like are the staying position data of the cursor. The stopping position data may correspond to coordinates, for example, a coordinate system may be established with an upper left corner of a window of the page as an origin, and the coordinates corresponding to the stopping position data may be obtained. Illustratively, the longer the dwell time of the dwell position data, the larger the radius of the corresponding point may also be set, making the properties of the dwell position data more visible.
Based on this, referring to fig. 10, in an embodiment, the step 402 includes steps 1001 and 1002.
Step 1001: collecting coordinates of a cursor in a page;
for example, when coordinates are collected, the page may be a web page, the page may be opened by using a browser, and the action of collecting the coordinates may be completed by an execution program of the browser; or the page may be opened by a browsing window embedded in other software, such as instant messaging software with a browsing window, document processing software, etc., and the action of acquiring the coordinates may be performed by an execution program of the software with a browsing window. Alternatively, the page may be a document processing page, such as a WORD document, a PDF document, etc., which may be opened using document processing software, and the act of capturing coordinates may be performed by an executing program of the document processing software.
In one embodiment, the collected coordinates may be coordinates of a cursor in the same page. In another embodiment, the collected coordinates may be coordinates of a cursor in a different page, i.e. when the user switches to browse the page, the collection process is not stopped or reset accordingly.
Step 1002: if the stay time of the cursor at a coordinate is longer than a first threshold, taking the position data corresponding to that coordinate as staying position data of the cursor.
The value of the first threshold may be determined according to an actual requirement, for example, the first threshold may be 1 second, that is, the dwell time of the cursor at a certain coordinate is longer than 1 second, and the position data corresponding to the coordinate is the dwell position data of the cursor.
By collecting the coordinates of the cursor in the page and judging the length of the stay time of the coordinates, the stay position data of the cursor in the page can be simply and conveniently acquired.
In an embodiment, the staying time duration may be determined by a time difference between two adjacent coordinates, and therefore, as shown in fig. 11, the acquiring of the staying position data of the cursor in the page in step 402 includes, but is not limited to, the following steps:
step 1101: responding to the movement of the cursor, and acquiring a plurality of continuous coordinates and corresponding time points of the cursor in the moving process of the page;
For example, when a user browses a web page, a native event mechanism of the browser, such as the mousemove event, may be used: whether the cursor has moved is determined by whether the mousemove event is triggered, and when the cursor moves, the coordinates of the cursor and the current time point are recorded. The movement of the cursor is measured in pixels, the mousemove event is triggered once each time the cursor moves by one pixel, and the native event mechanism may be executed by the execution program of the browser. When the user uses another application, such as the document processing software, continuous collection of the cursor coordinates can likewise be implemented by the execution program of that software. Illustratively, the collection of the cursor coordinates and time points can be implemented with a JS (JavaScript) script, or in other ways, such as the VB (Visual Basic) language.
Step 1102: if the time difference between two adjacent coordinates is larger than a first threshold, using the position data corresponding to the earlier of the two adjacent coordinates as staying position data of the cursor.
Specifically, the time difference between two adjacent coordinates is the dwell time of the previous coordinate, and may be obtained by subtracting the time point corresponding to the previous coordinate from the time point corresponding to the next coordinate, for example, the time point of the previous coordinate is 12:00:00, and the time point of the next coordinate is 12:00:30, and then the time difference between the two adjacent coordinates is 30 seconds. In addition, the value of the first threshold may be determined according to actual requirements, for example, the first threshold may be 1 second, that is, the dwell time of the cursor at a certain coordinate is longer than 1 second, and the position data corresponding to the coordinate is the dwell position data of the cursor.
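Steps 1101 and 1102 can be illustrated with a short Python sketch. It assumes the samples have already been collected as `(x, y, t_ms)` tuples in chronological order, with time points in milliseconds; the function name and tuple layout are assumptions of this sketch.

```python
def staying_positions(samples, first_threshold_ms=1000):
    """Extract staying positions from chronologically ordered cursor samples.

    samples: list of (x, y, t_ms) tuples, one per recorded cursor movement.
    Returns a list of (x, y, dwell_ms) for every coordinate whose time
    difference to the next coordinate exceeds the first threshold.
    """
    stays = []
    # Pair each sample with its successor; the time difference is the
    # stay time of the earlier coordinate.
    for (x, y, t), (_, _, t_next) in zip(samples, samples[1:]):
        dwell = t_next - t
        if dwell > first_threshold_ms:
            stays.append((x, y, dwell))
    return stays
```

Note that the last sample has no successor, so its stay time is unknown until the cursor moves again; this sketch simply leaves it out.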
In addition to subtracting the time point corresponding to the previous coordinate from the time point corresponding to the subsequent coordinate, so as to obtain the dwell time of the previous coordinate, in an embodiment, the collection period of the coordinate may be set, and the dwell time of the coordinate may be obtained by accumulating the number of times of continuously collecting the same coordinate and then multiplying the number of times by the collection period.
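The alternative based on a fixed collection period can be sketched as follows, assuming the cursor coordinate is sampled once per period regardless of movement. The function name and the choice of milliseconds are assumptions.

```python
from itertools import groupby


def dwell_by_sampling(coords, period_ms):
    """Compute stay times from coordinates sampled at a fixed period.

    coords: list of (x, y) tuples, one per collection period.
    Returns (coord, dwell_ms) for each run of identical consecutive
    coordinates: run length multiplied by the collection period.
    """
    return [(coord, len(list(run)) * period_ms)
            for coord, run in groupby(coords)]
```

For instance, three consecutive identical samples at a 50-millisecond period yield a stay time of 150 milliseconds for that coordinate.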
In one embodiment, by acquiring a plurality of continuous coordinates and corresponding time points in the moving process of the cursor in the page, and obtaining the time difference between two adjacent coordinates according to the time points, the staying time of the position data corresponding to the previous coordinate can be obtained, and the staying time of the coordinate is judged, so that the staying position data of the cursor in the page can be simply and conveniently obtained.
In addition, when the cursor moves continuously, the number of coordinates and time points finally collected may be large, which raises the demand on the computing capacity of the terminal. Therefore, in an embodiment, in response to the movement of the cursor, a plurality of continuous coordinates and corresponding time points of the cursor during its movement in the page may be collected at a preset collection rate.
Specifically, the collection rate is the number of times the cursor coordinates and time points are collected per unit time. A unit time is preset, and collection of the cursor coordinates stops once the number of collections within that unit time reaches an upper limit. For example, function throttling may be implemented in a JS script: with the unit time set to 1 second (i.e., 1000 milliseconds) and wait in the function throttle(fn, wait) set to 50 milliseconds, the fn function is executed at most 20 times (1000/50 = 20) in 1000 milliseconds, which limits the number of times fn is executed per unit time; fn may be the function that collects the cursor coordinates. It can be understood that the unit time and the parameters of the function throttle can be adjusted according to actual conditions. Alternatively, the coordinate collection rate may be limited by presetting the length of the coordinate collection period. Collecting the cursor coordinates continuously at the preset collection rate can optimize collection performance and ensure the stability of cursor coordinate collection.
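The throttling idea can be sketched in Python (the text mentions a JS script; Python is used here only for illustration). The `now_ms` parameter is an assumption added so that the clock can be replaced in tests, and calls arriving inside the wait window are simply dropped, which is one common throttling variant.

```python
import time


def throttle(fn, wait_ms, now_ms=lambda: time.monotonic() * 1000):
    """Wrap fn so that it runs at most once every wait_ms milliseconds.

    Calls that arrive before wait_ms has elapsed since the last executed
    call are ignored and return None.
    """
    last_call = None

    def throttled(*args, **kwargs):
        nonlocal last_call
        now = now_ms()
        if last_call is None or now - last_call >= wait_ms:
            last_call = now
            return fn(*args, **kwargs)
        return None

    return throttled
```

With `wait_ms` set to 50, a stream of cursor events arriving every 10 milliseconds results in at most 20 executions of `fn` per 1000 milliseconds, matching the arithmetic in the paragraph above.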
In addition, different window sizes for a page, such as different window sizes for a page due to different screen specifications, may result in different coordinates being collected at the same location on the same page. Thus, in one embodiment, the collected coordinates refer to a proportional value of the position of the cursor relative to the window size of the page.
Specifically, the collected coordinates are normalized, that is, converted according to the window width and window height of the page. Illustratively, if a collected coordinate is (a, b), the window width of the page is screen width, and the window height of the page is screen height, the converted coordinate is (a/screen width, b/screen height). Converting the collected coordinates in this way reduces the influence of the window specification of the page on the collected coordinates and improves their accuracy.
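A minimal Python sketch of this normalization follows; the function name is an assumption, and the window dimensions are assumed to be known in the same pixel units as the collected coordinates.

```python
def normalize_coordinate(coord, window_width, window_height):
    """Convert a pixel coordinate into proportions of the page window size."""
    a, b = coord
    return (a / window_width, b / window_height)
```

The same relative position then yields the same normalized coordinate on differently sized windows: (480, 270) in a 1920 by 1080 window and (320, 180) in a 1280 by 720 window both become (0.25, 0.25).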
In one embodiment, referring to fig. 12, step 403 further comprises:
step 1201, using the acquired staying position data of the cursor as first position data.
The execution program identifies and obtains the data of the staying position of the cursor in the page, which is equivalent to screening the data of the staying position from the moving data of the cursor as first position data, and the first position data comprises position data corresponding to all the staying positions of the cursor in the page, and can also be understood as that the first position data comprises all the staying points of the cursor in the content type typesetting page.
Step 1202, obtaining target position data according to the first position data.
The execution program then further processes the first position data in the page to obtain target position data; this can also be understood as extracting the position data corresponding to all cursor staying points in the page and processing it to obtain the target position data, where the target position data at least includes the coordinates of the staying positions of the cursor in the page.
In another embodiment, referring to fig. 13, step 403 further comprises:
step 1301, the acquired stop position data of the cursor is used as first position data.
The execution program identifies and obtains the data of the staying position of the cursor in the page, which is equivalent to screening the data of the staying position from the moving data of the cursor as first position data, and the first position data comprises position data corresponding to all the staying positions of the cursor in the page, and can also be understood as that the first position data comprises all the staying points of the cursor in the content type typesetting page.
Step 1302, removing the second position data from the first position data to obtain the target position data.
The execution program then further processes the first position data in the page to obtain target position data. The second position data are staying positions of the cursor that do not conform to the behavior habits of the user. By removing the second position data from the first position data, the coordinate distribution of the cursor in the target position data conforms better to the behavior habits of the user, which improves the recognition capability of the user identifier.
In one embodiment, the second position data is at least one of:
(1) the staying position data of the cursor corresponding to the content display area of the page;
(2) the staying position data of the cursor corresponding to a content display partition in the content display area of the page;
(3) the staying position data of the cursor corresponding to the area where an interactive element of the page is located;
(4) the staying position data of the cursor corresponding to an interference area whose distribution density is smaller than a third threshold;
(5) the staying position data of the cursor with the longest stay time;
(6) the staying position data ranked after a fourth threshold when the staying position data is sorted by the stay time of the cursor on the page from long to short.
For the staying position data of the cursor corresponding to the content display area of the page, referring to fig. 5, a user browsing a content type page may leave the cursor in the content display area 501, for example to select or copy reading content of interest. In this case the staying position of the cursor is guided by the content in the content display area, so, to improve the accuracy of the user identifier, the staying position data of the cursor corresponding to the content display area 501 of the page is removed from the first position data as second position data. To this end, a content area function of the current page may be constructed, for example a circular area bounded by the circle equation $(x-a)^2 + (y-b)^2 = r^2$, where (a, b) is the center of the content area and r is its radius. Whether a staying position of the cursor is guided can then be judged from this equation: a staying point (x, y) satisfying $(x-a)^2 + (y-b)^2 \le r^2$ is a guided staying point. Similarly, a rectangular content area may also be constructed.
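The content area function can be illustrated with a short Python sketch covering both a circular and a rectangular area. It assumes that points on or inside the boundary count as guided; the function names and the representation of a region as a predicate are assumptions of this sketch.

```python
def in_circle(point, center, r):
    """True when point lies on or inside the circle with the given center and radius."""
    (x, y), (a, b) = point, center
    return (x - a) ** 2 + (y - b) ** 2 <= r ** 2


def in_rect(point, left, top, right, bottom):
    """True when point lies on or inside the axis-aligned rectangle."""
    x, y = point
    return left <= x <= right and top <= y <= bottom


def remove_guided(first_position_data, region):
    """Drop staying points for which the region predicate is True."""
    return [p for p in first_position_data if not region(p)]
```

For example, `remove_guided(points, lambda p: in_circle(p, (0.5, 0.5), 0.2))` keeps only the staying points outside a circular content area centered in a normalized page.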
For the staying position data of the cursor corresponding to a content display partition in the content display area of the page: the content display area may include a plurality of content display partitions. For example, as shown in fig. 14, the content display area 501 includes a left bar 1401 and a right bar 1402, both of which are content display partitions; the left bar 1401 displays the text content and the right bar 1402 displays navigation links. The user may be influenced by the text content to stop the cursor in the left bar 1401, for example to select and copy text with the cursor, and may be influenced by the right bar 1402 to click a navigation link. Therefore, in this embodiment, the staying position data of the cursor corresponding to the content display partitions in the content display area 501 of the page is removed from the first position data as second position data. For example, in fig. 14, content area functions for the left bar and the right bar are constructed, and the staying position data of the cursor falling within those functions is removed from the first position data as second position data.
For the data of the staying position of the cursor corresponding to the area where the interactive element of the page is located, referring to fig. 15, the content display area 501 includes various interactive elements, such as a picture 1502, a button 1503, a form 1504, and the like, in addition to the text 1501, and these interactive elements have strong guidance, and in order to enable the target position data to reflect the behavior habit of the user browsing the content type typesetting page more accurately, the staying position data of the cursor falling into the function is removed from the first position data as the second position data by constructing the content area function of these interactive elements.
For the staying position data of the cursor corresponding to an interference area whose distribution density is smaller than the third threshold: when a user browses a page, the mouse is left unmoved with the cursor resting in a non-content display area, and the position of the cursor is then a subconscious, habitual expression of the user. Habitual actions and behaviors recur regularly in the same pattern, so staying positions that reflect the user's habits gather densely, and staying positions in a region of low distribution density are regarded as interference. Accordingly, the staying positions in the first position data are processed, a region whose density is smaller than the third threshold is delimited in the page, the content area function corresponding to that region is determined, and the staying position data of the cursor falling within that function is removed from the first position data as second position data. For example, in the schematic track of cursor movement shown in fig. 9, the staying positions in the left area are sparsely distributed, and these staying positions are removed.
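One simple way to approximate the density test is to divide the page into a grid and count staying points per cell, as in the Python sketch below. The grid approach, the cell size, and the use of a per-cell point count as the density measure are all assumptions of this sketch; the text does not say how the density is computed.

```python
import math
from collections import Counter


def remove_sparse(points, cell=0.1, third_threshold=2):
    """Remove staying points that fall in sparsely populated grid cells.

    points: list of normalized (x, y) coordinates.
    cell: edge length of a square grid cell (assumed value).
    third_threshold: minimum number of points a cell must contain for its
    points to be kept (a stand-in for the density threshold).
    """
    def cell_of(p):
        return (math.floor(p[0] / cell), math.floor(p[1] / cell))

    counts = Counter(cell_of(p) for p in points)
    return [p for p in points if counts[cell_of(p)] >= third_threshold]
```

A kernel density estimate or a clustering method with a noise label would be alternative ways to realize the same step.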
For the staying position data of the cursor with the longest staying time, a point with the longest staying time may be generated due to an error, for example, the staying time of the cursor is too long because the user leaves the terminal, and the position of the point cannot reflect the operation behavior habit of the user, so the staying position data of the cursor with the longest staying time is removed from the first position data as the second position data.
For the staying position data ranked after the fourth threshold when sorted by the stay time of the cursor on the page from long to short: position data with a longer stay time better reflects the behavior habits of the user in operating the cursor while browsing a content type typesetting page. After the staying position data is sorted by stay time from long to short, the data ranked before the fourth threshold reflects the behavior habits of the user more accurately, so the staying position data ranked after the fourth threshold is removed from the first position data as second position data. The fourth threshold may be set according to actual conditions; in this embodiment it is 200.
It should be noted that the second position data in the embodiment of the present application may be one or more combinations of the above dwell position data that can be removed, for example, the dwell position data of the cursor corresponding to the area where the interactive element of the page is located and the dwell position data of the cursor corresponding to the interference area whose distribution density is smaller than the third threshold may be simultaneously removed from the first position data as the second position data.
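Items (5) and (6), and their combination, can be illustrated with the following Python sketch. It assumes each staying point is an `(x, y, dwell_ms)` tuple; the function names are assumptions, and ties in stay time are resolved by keeping the original order.

```python
def drop_longest(stays):
    """Item (5): remove the single staying point with the longest stay time."""
    if not stays:
        return []
    longest = max(stays, key=lambda s: s[2])
    return [s for s in stays if s is not longest]


def keep_top_ranked(stays, fourth_threshold=200):
    """Item (6): sort by stay time, long to short, and drop points ranked
    after the fourth threshold."""
    ranked = sorted(stays, key=lambda s: s[2], reverse=True)
    return ranked[:fourth_threshold]
```

Applying `drop_longest` first and `keep_top_ranked` second discards an outlier such as a ten-minute pause while the user was away, then keeps the longest of the remaining stays.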
In one embodiment, referring to FIG. 16, step 404 further comprises:
step 1601, obtaining a centroid of the plurality of target position data according to the coordinates of the target position data.
The target position data may be distributed in clusters. In a cluster formed by the target position data, the position with the maximum density is the centroid, and the execution program can obtain the coordinates of the centroid from the coordinates corresponding to the target position data.
In an embodiment, the centroid of the plurality of target location data may be obtained by a mean shift algorithm. Specifically, a reference position is randomly selected from a cluster formed by target position data, a reference circle is obtained by taking the reference position as the center of the circle, and a plurality of target position data fall into the reference circle; and then, with the reference position as a starting point and the position of the target position data falling in the reference circle as an end point, obtaining a plurality of reference vectors, adding the plurality of reference vectors to obtain a mean shift vector, and with the end point of the mean shift vector as a next reference position, iterating the process until the reference position when the size of the mean shift vector is converged is the centroid of the plurality of target position data.
In one embodiment, the number of clusters formed by the target position data may be multiple, each cluster may have a centroid, the obtained centroids may be multiple, and the centroid of the cluster with the higher density may be selected as the final centroid. Illustratively, in the process of iteratively obtaining the corresponding centroid for each cluster, the sum of the numbers of reference vectors involved in obtaining the mean shift vector may be recorded, and on the premise that the reference position and the reference circle of each cluster are the same in selection criterion, the larger the sum of the numbers of reference vectors is, the larger the density of the cluster is proved to be, and finally, the centroid of the cluster with the larger sum of the numbers of reference vectors is taken as the final centroid.
In an embodiment, in the process of obtaining the mean shift vector by using the reference vector, a weight coefficient may be added to the reference vector to improve the accuracy and rationality of the finally obtained centroid. For example, the dwell time of the target position data may be used as a standard of the weight coefficient, and the longer the dwell time of the target position data, the greater the weight of the reference vector when obtaining the mean shift vector.
Step 1602, generate a user identifier for user identification according to the coordinates of the centroid.
The user identifier is generated according to the coordinates of the centroid and may be calculated from them by a certain algorithm, for example a hash algorithm; the calculated user identifier may consist of numbers, characters, or a combination of both. In an embodiment, the coordinates of the centroid can also be used directly as the user identifier. By obtaining the centroid of the target staying points and generating the user identifier from its coordinates, the finally generated user identifier reflects the behavior habits of the corresponding user more reasonably and accurately.
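As one possible illustration of step 1602, the Python sketch below hashes the centroid coordinates with SHA-256. The choice of SHA-256, the rounding of the coordinates to two decimal places before hashing, and the 16-character truncation are assumptions of this sketch; the text only states that a hash algorithm may be used.

```python
import hashlib


def user_id_from_centroid(centroid, precision=2):
    """Derive an identifier string from normalized centroid coordinates.

    The coordinates are rounded first (an assumed step) so that centroids
    that differ only slightly map to the same identifier.
    """
    x, y = centroid
    text = f"{x:.{precision}f},{y:.{precision}f}"
    # Truncated hex digest of the rounded coordinate string.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]
```

Rounding is a coarse form of tolerance: two centroids that straddle a rounding boundary still produce different identifiers, which is one reason the centroid coordinates themselves may be the more convenient identifier when fuzzy matching is needed.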
For example, in order to intuitively explain the method of calculating the centroid, the following description is made in conjunction with fig. 17, and it should be noted that the points in fig. 17 represent coordinates in the target position data, and are not actually displayed in the page.
Referring to fig. 17, X1 is a reference position and R1 is a reference circle. With X1 as the starting point and the points at which the target position data falling within the reference circle R1 are located as end points, a plurality of reference vectors B1 are obtained, and the reference vectors B1 are added to obtain a mean shift vector M1. The end point of the mean shift vector M1 is used as the next reference position X2 to obtain a reference circle R2, and the process is repeated until the magnitude of the n-th mean shift vector Mn converges, at which time the end point of Mn (i.e., the next reference position Xn+1) is the centroid. On this basis, when M1 is obtained from B1, each B1 may be given a weight coefficient according to the stay time of its corresponding target position data: the longer the stay time of the target position data, the larger the weight coefficient of the corresponding B1. For example, the reference vectors B1 are divided into three groups in order of stay time, and the weight coefficients used when adding the corresponding B1 may be 1, 2, or 3.
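The iteration illustrated by fig. 17 can be sketched in Python as follows. This sketch uses the weighted average of the reference vectors as the mean shift vector (the standard mean shift form) rather than their plain sum, and takes the starting reference position as an argument instead of choosing it randomly; the function name, the tolerance, and the iteration cap are assumptions.

```python
import math


def mean_shift_centroid(points, start, radius, weights=None,
                        tol=1e-6, max_iter=100):
    """Shift a reference position toward the densest part of a cluster.

    points: list of (x, y) target position coordinates.
    start: initial reference position (x, y).
    radius: radius of the reference circle.
    weights: optional per-point weight coefficients, e.g. based on stay time.
    """
    if weights is None:
        weights = [1.0] * len(points)
    cx, cy = start
    for _ in range(max_iter):
        sx = sy = sw = 0.0
        for (px, py), w in zip(points, weights):
            # Only points inside the reference circle contribute a reference vector.
            if math.hypot(px - cx, py - cy) <= radius:
                sx += w * (px - cx)
                sy += w * (py - cy)
                sw += w
        if sw == 0:
            break  # no points inside the reference circle
        mx, my = sx / sw, sy / sw  # mean shift vector
        cx, cy = cx + mx, cy + my  # its end point is the next reference position
        if math.hypot(mx, my) < tol:
            break  # the magnitude of the mean shift vector has converged
    return (cx, cy)
```

Passing weights of 1, 2, or 3 according to the stay-time group of each point pulls the resulting centroid toward the positions where the cursor stayed longest.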
In one embodiment, referring to fig. 18, on the basis of steps 401 to 404, the method further includes the following steps:
step 1801, sending the verification information containing the user identifier to a server.
Once the execution program has acquired enough staying position data samples, it can calculate the user identifier corresponding to the current samples. The execution program then sends verification information to the server. The verification information includes the user identifier generated in any of the above embodiments, and is sent so that the server can verify whether the current user identifier matches a specific user; the result serves as the basis for deciding whether to continue acquiring the staying position data of the cursor in the page.
Step 1802, obtaining verification feedback information from the server, where the verification feedback information is used to feed back whether the user identifier has a matching user in the server.
The execution program acquires verification feedback information from the server, and the verification feedback information is used for feeding back whether the user identifier has a matching user in the server. During comparison, the server compares the acquired user identifier against a user identifier library of the server. In an embodiment, the user identifier is the coordinates of the centroid of the target position data, or the user identifier is restored to the coordinates of the centroid by means of the algorithm (e.g., a hash algorithm) used to generate the user identifier. When matching comparison is performed, a certain degree of fuzzy processing is applied to the user identifier. For example, if the coordinates of the centroid are (Xp, Yp), an offset is added to them, e.g., (Xp ± 0.03, Yp ± 0.03), so that the user identifier is converted into a range; coordinate data falling within this range is then searched for in the user identifier library to identify the current user.
When a matching user exists, the current user identifier is valid. When no matching user exists, the current user identifier is invalid: the current user may be a user who has not been recognized before, or the acquired staying position data samples may be insufficient to reflect the behavior habit of the user.
Based on this, in one embodiment, referring to fig. 19, on the basis of steps 1801 to 1802, at least one of the following steps is further included:
Step 1901, when the verification feedback information indicates that there is a matching user, sending the user identifier to the server.
If the verification feedback information indicates that a matching user exists, the current user identifier is valid and is sent to a server. The server that performs verification and the server to which the user identifier is sent may be different servers. For example, the servers include a verification server and a target server, where the target server belongs to the developer of the current execution program and the verification server hosts a user identification resource library shared by different developers; the execution programs on different terminals are verified by the verification server, and the generated user identifier is sent to the target server after verification succeeds. In another embodiment, the verification server and the target server are the same server; in this case, when the verification feedback information indicates that there is a matching user, the execution program stops collecting the staying position data of the cursor, and the user identifier does not need to be sent again.
Step 1902, when the verification feedback information indicates that there is no matching user, continue to acquire the staying position data and regenerate the user identifier until the user is successfully matched or the preset matching condition is exceeded.
When the verification feedback information indicates that there is no matching user, possibly because the collected target position data does not reflect the behavior habit of the user, the staying position data of the cursor continues to be collected and the user identifier is regenerated. Another way is to retain the staying position data samples already collected and add newly collected staying position data of the cursor to them. For example, in an embodiment, as shown in fig. 20, step 1902 includes the following steps:
Step 2001, retaining the already acquired staying position data as a first staying position data set.
The execution program saves the currently collected staying position data of the cursor as the first staying position data set, which may be the target position data already obtained.
Step 2002, continuing to acquire new staying position data to generate a second staying position data set.
The execution program executes steps 101 to 103 of the above-described embodiment again, and generates a second staying position data set including the newly acquired staying position data of the cursor.
Step 2003, regenerating the user identifier according to the first staying position data set and the second staying position data set.
The execution program generates new target position data according to the first staying position data set and the second staying position data set, where the screening of step 1302 of the above embodiment may be executed again to obtain the new target position data. A new user identifier is then generated, and verification information containing the new user identifier is sent to the server again. If the received verification feedback information still indicates that there is no matching user, steps 2001 to 2003 are repeated until the user is successfully matched or the preset matching condition is exceeded.
In one embodiment, the preset matching condition includes at least one of:
The first preset matching condition is a preset number of matches. Each time verification feedback information is received from the server, one match is considered complete; if the preset number of matches is exceeded, the continued acquisition of the staying position data of the cursor in the page is stopped. In this embodiment, the number of matches is set to 3, that is, when verification feedback information has been received from the server 3 times and the user still cannot be matched, the continued acquisition of the staying position data of the cursor in the page is stopped. It should be noted that the preset number of matches may be adjusted as needed.
The second preset matching condition is a preset matching time, that is, the execution time of the user identifier generation method, which is equivalent to the running time of the execution program. If the running time of the execution program exceeds the preset matching time and the user still has not been matched, the continued acquisition of the staying position data of the cursor in the page is stopped. The preset matching time set in this embodiment is 15 minutes, that is, when the execution time of the user identifier generation method exceeds 15 minutes and the user still has not been matched, acquisition of the staying position data of the cursor in the page is stopped.
Either of the two preset matching conditions may be selected and used alone, for example, only the first preset matching condition or only the second preset matching condition is used to determine whether to stop acquiring the staying position data of the cursor in the page.
In addition, the two preset matching conditions may be set simultaneously. For example, acquisition of the staying position data of the cursor in the page is stopped as long as either the first or the second preset matching condition is met. Alternatively, acquisition is stopped only when the first and the second preset matching conditions are both met.
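The retry behavior governed by these conditions can be sketched as a loop. This is only an illustrative sketch: `collect`, `make_identifier`, and `verify` are hypothetical callables standing in for data acquisition, identifier generation, and the server round trip, and the `require_both` switch and injectable `clock` are assumptions added for the example.

```python
import time

def match_with_retries(collect, make_identifier, verify,
                       max_matches=3, max_seconds=15 * 60,
                       require_both=False, clock=time.monotonic):
    """Return (identifier, matched) after retrying until a stop condition."""
    start = clock()
    samples = []   # retained samples (the first data set)
    matches = 0
    while True:
        samples.extend(collect())              # newly acquired samples
        identifier = make_identifier(samples)  # regenerate from all samples
        matches += 1                           # one feedback = one match
        if verify(identifier):
            return identifier, True
        count_hit = matches >= max_matches
        time_hit = clock() - start > max_seconds
        # Either condition stops the loop, or both if require_both is set.
        if require_both:
            stop = count_hit and time_hit
        else:
            stop = count_hit or time_hit
        if stop:
            return identifier, False
```

With the defaults taken from the text (3 matches, 15 minutes), an unmatched user causes the loop to give up after the third verification or once the time budget is spent, whichever comes first.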
By setting the preset matching conditions, excessive consumption of terminal computing resources can be avoided.
In an embodiment, when the preset matching condition is exceeded, the currently generated user identifier is recorded in the user identifier library of the server as a new user identifier corresponding to the current user. The next time the current user browses a content type typesetting page in any browsing program on any terminal, a user identifier corresponding to the user can be generated, so that the server can identify the user through the user identifier.
In any of the embodiments described above, the execution program may be a function module built into the browsing program, for example, a built-in software function module of desktop software such as a browser, instant chat software, or document processing software. That is, the browsing program already includes the execution program when installed, and when the user uses the browsing program, the execution program executes the user identifier generation method according to any of the embodiments described above.
The execution program may also be a browser plug-in, such as a browser plug-in that the user actively installs in the browser or a plug-in that is automatically installed in the browser when the user browses web pages. In the former case, the browser plug-in is published by the browser author, for example, on a plug-in library website for the user to download and install. The web server can identify the user through the user identifier: even if the user switches to a different browsing program or even a different terminal, the user can be tracked and identified simply by logging in to a specific web page of the web server, thereby realizing cross-browser user identification.
The execution program may also be software separate from the browsing program, such as an application running in the background of the operating system, e.g., security management software or input method software. In this case, the execution program runs in the background, and after the user opens a browsing window, the execution program in the background executes the user identifier generation method according to any one of the embodiments. Even if the user changes the browsing program or the terminal, the user can be tracked and identified as long as the execution program is installed on the terminal, thereby realizing cross-browser user identification.
The user identifier generation method provided by the embodiment of the present application may be applied to an application environment shown in fig. 3, where the application environment includes: the system comprises a terminal 11, a server 12 and a communication network 13, wherein the terminal 11 and the server 12 are connected with each other in a communication mode through the communication network 13. The servers 12 may be in the form of a single server or a cluster, that is, one or more servers, and the servers 12 may be set by a web page provider, a browser provider, or an advertisement provider, for example.
Fig. 21 is a flowchart of a user identification method, which is provided in an exemplary embodiment of the present application and is applied to a server, and the method specifically includes step 2101, step 2102, step 2103, and step 2104.
Step 2101, obtaining a user identifier, wherein the user identifier is generated by the user identifier generation method according to any one of the embodiments;
Step 2102, matching the user identifier in a preset user identifier feature library; and
at least one of the following steps:
Step 2103, when the matching is successful, identifying the user;
Step 2104, when the matching fails, storing the user identifier as a new user identifier feature in the user identifier feature library.
In step 2101, the server obtains the user identifier from the terminal through the communication network. The user identifier is generated based on staying position data of a cursor in a page, collected while the user browses a content type typesetting page, where the staying position data is position data at which the cursor stays in the page for longer than a first threshold. The first threshold is set as required; in this embodiment, the first threshold is 1 second, that is, if the dwell time of the cursor at a certain coordinate is longer than 1 second, the position data corresponding to that coordinate is staying position data of the cursor. The terminal obtains target position data according to the staying position data of the cursor in the page, and generates the user identifier according to the target position data. Because the coordinates corresponding to the target position data reflect the behavior habit of the user when browsing content type typesetting pages, the user identifier can be used to effectively identify the end user.
The user identifier is the coordinates of the centroid of the cursor distribution positions in the content type typesetting page, or a character string generated from the coordinates of the centroid according to a certain algorithm (such as a hash algorithm). In this embodiment, the centroid of the plurality of target position data is obtained by applying a mean shift algorithm to the distribution positions of the cursor in the content type typesetting page. The coordinates of the centroid may be expressed as (Xp, Yp), and are either used directly as the user identifier or used to generate the user identifier. In one embodiment, the coordinates of the centroid are expressed as absolute values, that is, Xp is the width value of the centroid in the page window and Yp is the height value of the centroid in the page window. Because page windows on different devices differ in size, in another embodiment the coordinates of the centroid are normalized, that is, converted according to the window width and window height of the page. Illustratively, if a collected coordinate is (Xa, Ya), the window width of the page is screen width, and the window height of the page is screen height, then after conversion Xp = Xa/screen width and Yp = Ya/screen height, and the corresponding user identifier value is (Xa/screen width, Ya/screen height). Converting the coordinates of the centroid according to the window width and window height of the page reduces the influence of the window size on the coordinates of the centroid and improves identification accuracy.
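As a small illustration of the normalization and of deriving a character string from the centroid, consider the sketch below. The function names, the rounding to two decimal places, and the choice of SHA-256 are assumptions made for this example; the text only says "a certain algorithm (such as a hash algorithm)".

```python
import hashlib

def normalize_centroid(xa, ya, window_width, window_height):
    """Convert absolute centroid coordinates to proportions of the window."""
    return xa / window_width, ya / window_height

def identifier_string(xp, yp, digits=2):
    """Derive a string identifier from normalized coordinates.

    Assumption: coordinates are rounded before hashing so that small
    numeric noise does not change the resulting string.
    """
    text = f"{xp:.{digits}f},{yp:.{digits}f}"
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

For a centroid at (960, 270) in a 1920 x 1080 window, `normalize_centroid` returns (0.5, 0.25), so the same relative position yields the same value on a window of a different size.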
In step 2102, the server compares the obtained user identifier against the user identifier library of the server. In an embodiment, the user identifier is the coordinates of the centroid of the target position data. When matching comparison is performed, an offset is added to the centroid coordinates to obtain a coordinate range; for example, when the offset is 0.03, the range is (Xp ± 0.03, Yp ± 0.03). The coordinates corresponding to the user identifiers stored in the user identifier library are then compared with this range, and a stored coordinate falling within the range is the matched user identifier.
If the matching succeeds, identification of the user is complete; if the matching fails, the unmatched user identifier is added to the user identifier library, thereby creating a new user identifier.
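The range matching and the fallback of registering a new identifier can be sketched as follows. This is a minimal sketch under stated assumptions: the library is modeled as an in-memory dictionary, and the function name and the `user-N` key format are hypothetical.

```python
def identify_or_register(identifier, library, offset=0.03):
    """Match a centroid identifier within +/- offset, else store it as new.

    `library` maps a user key to stored (x, y) centroid coordinates.
    Returns (user_key, matched).
    """
    xp, yp = identifier
    for user_key, (x, y) in library.items():
        # A stored coordinate inside the range counts as a match.
        if abs(x - xp) <= offset and abs(y - yp) <= offset:
            return user_key, True
    # No match: record the identifier as a new user identifier feature.
    new_key = f"user-{len(library) + 1}"
    library[new_key] = (xp, yp)
    return new_key, False
```

A linear scan is enough for illustration; a real library of any size would presumably use a spatial index, which the text does not discuss.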
According to the user identification method provided by the embodiment of the application, a user identifier is obtained, where target position data is obtained according to the staying position data of the cursor in the page and the user identifier is then generated from the target position data. Because the coordinates corresponding to the target position data reflect the behavior habit of the user when browsing content type typesetting pages, the obtained user identifier is directly associated with the behavior habit of the user and does not depend on static information of the browser. Even if the user changes the browsing program or the computer device, the user's operation habits when browsing content type typesetting pages remain consistent, so the user identifiers generated by different browsing programs or even different computer devices are consistent or related. Therefore, the accuracy of user identification can be improved through such user identifiers.
Fig. 22 is a flowchart of a user identifier generating method according to another exemplary embodiment of the present application, including steps 2201 to 2211, where the user identifier generating method according to this embodiment is applied to a terminal, and includes the following specific steps:
Step 2201, judging whether the current access address of the page matches a preset page address; if so, executing step 2203; if not, executing step 2202.
A browsing program is run to access the page, and the current access address of the page is acquired and compared with the preset page addresses in a list library. The preset page addresses in the list library are those of known content type typesetting pages, and may be acquired from a server or stored locally on the terminal. For example, when a web page is accessed through a browser, or through a search or hyperlink, the web address of the web page is obtained and matched.
Step 2202, identifying a content display area in a page, judging whether the page is a content type typesetting page, if so, executing step 2203; if not, ending.
It is identified whether the content display area in the page is located in the middle of the page and whether, between each of the two sides of the content display area in the width direction and the edge of the page, there is a blank area whose width as a proportion of the page width is greater than a second threshold. If so, the page is a content type typesetting page. In this embodiment, the value of the second threshold is 20% to 30%.
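Steps 2201 and 2202 together can be sketched as a single check. The sketch assumes the horizontal extent of the content display area is already known; the function name, the parameters, and the `center_tolerance` used to decide that the area is "in the middle" are hypothetical, and the threshold is assumed to apply to each side separately.

```python
def is_content_type_page(url, known_urls, page_width,
                         content_left, content_right,
                         second_threshold=0.2, center_tolerance=0.05):
    """Decide whether a page is a content type typesetting page."""
    # Step 2201: a known address in the list library is accepted directly.
    if url in known_urls:
        return True
    # Step 2202: compare the blank margins with the second threshold.
    left_ratio = content_left / page_width
    right_ratio = (page_width - content_right) / page_width
    # Assumption: "in the middle" means the two margins are nearly equal.
    centered = abs(left_ratio - right_ratio) <= center_tolerance
    return (centered
            and left_ratio > second_threshold
            and right_ratio > second_threshold)
```

For a page 1000 units wide with content spanning 250 to 750, both margins are 25% of the page width, so the page qualifies under the default 20% threshold.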
Step 2203, coordinates of the cursor in the page are collected.
Step 2204, obtaining the staying position data of the cursor in the page as the first position data.
If the staying time of the cursor at a coordinate is longer than a first threshold, the position data corresponding to the coordinate is used as staying position data of the cursor. The value of the first threshold may be determined according to actual requirements; for example, the first threshold may be 1 second, that is, if the cursor stays at a certain coordinate for longer than 1 second, the position data corresponding to that coordinate is staying position data of the cursor. If the time difference between two adjacent coordinates is greater than the first threshold, the position data corresponding to the earlier of the two adjacent coordinates is used as staying position data of the cursor.
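The adjacent-coordinate rule can be sketched as follows, assuming that cursor samples are recorded as (timestamp in milliseconds, x, y) tuples in time order, one per position change. The function name and the tuple layout are assumptions made for this example.

```python
def staying_positions(samples, first_threshold_ms=1000):
    """Extract staying positions from time-ordered (t_ms, x, y) samples.

    Returns a list of (x, y, dwell_ms) for each coordinate at which the
    cursor remained longer than the first threshold.
    """
    stays = []
    for (t0, x0, y0), (t1, _, _) in zip(samples, samples[1:]):
        dwell = t1 - t0  # time until the cursor moved to the next coordinate
        if dwell > first_threshold_ms:
            # The earlier of the two adjacent coordinates is the stay.
            stays.append((x0, y0, dwell))
    return stays
```

Keeping the dwell time alongside each coordinate makes the later screening by dwell time and the optional weighting of reference vectors straightforward.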
Step 2205, the second position data is removed from the first position data, and then the target position data is obtained.
Wherein the second location data comprises at least one of:
the staying position data of the cursor corresponding to the content display area of the page;
the staying position data of the cursor corresponding to a content display sub-area in the content display area of the page;
the staying position data of the cursor corresponding to an interference area whose distribution density is smaller than a third threshold;
the staying position data of the cursor with the longest dwell time;
the staying position data ranked after a fourth threshold when the staying position data is sorted by the dwell time of the cursor on the page from long to short, where the fourth threshold in this embodiment is 200.
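A subset of these screening rules can be sketched as below. The sketch applies only three of the listed items (content display area, longest dwell time, and rank cut-off); the density-based interference rule is omitted, and the function name, the rectangle representation of the content area, and the (x, y, dwell) tuple layout are assumptions made for this example.

```python
def filter_target_positions(stays, content_area, fourth_threshold=200):
    """Remove second position data from (x, y, dwell) staying positions.

    `content_area` is a hypothetical (left, top, right, bottom) rectangle.
    """
    left, top, right, bottom = content_area
    # Drop stays that fall inside the content display area.
    outside = [s for s in stays
               if not (left <= s[0] <= right and top <= s[1] <= bottom)]
    # Sort by dwell time from long to short.
    ranked = sorted(outside, key=lambda s: s[2], reverse=True)
    # Drop the single longest stay and everything ranked after the threshold.
    return ranked[1:fourth_threshold]
```

Because the text says the second position data includes "at least one of" the listed items, applying only some of the rules, as here, is one of several valid combinations.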
Step 2206, judging whether the number of samples collected in the target position data reaches a fifth threshold; if so, executing step 2207; if not, returning to step 2201.
The samples collected in the target position data are the staying position data of the cursor that pass the screening rule in step 2205. The fifth threshold may be set as needed; in this embodiment, the fifth threshold is a dynamically changing value whose initial value is 100.
Step 2207, obtaining the centroids of the plurality of target position data according to the target position data.
Since the target position data is equivalent to the distribution of cursor dwell positions on the page, the centroid of these coordinate positions can be found by mean shift algorithm.
Step 2208, generating a user identifier for user identification according to the coordinates of the centroid.
The user identifier is generated from the coordinates of the centroid, or the coordinates of the centroid are used directly as the user identifier. The coordinates of the centroid may be represented as (Xp, Yp); in this embodiment, the coordinate values are the relative proportional position of the centroid in the page window. Illustratively, if the coordinates of a collected centroid are (Xa, Ya), the window width of the page is screen width, and the window height of the page is screen height, then after conversion Xp = Xa/screen width and Yp = Ya/screen height, and the corresponding user identifier value is (Xa/screen width, Ya/screen height).
Step 2209, the user identifier is sent to the server for verification, if the verification is successful, the verification is finished, and if the verification fails, step 2210 is executed.
The server matches and compares the acquired user identification with a user identification library in the server, if the matching is successful, the user identification is successful, and the server feeds back verification feedback information to the terminal to inform the terminal of the identification result.
Step 2210, determining whether the preset matching condition is exceeded, if yes, executing step 2211, if not, increasing the value of the fifth threshold, and then returning to step 2201.
Step 2211, marking the currently generated user identifier as a new user identifier and sending the new user identifier to the server.
The preset matching conditions include a preset number of matches and a preset matching time, and the preset matching conditions are determined to be exceeded as long as either one is exceeded. In this embodiment, the preset number of matches is 3 and the preset matching time is 15 minutes. If the preset matching conditions are not exceeded, the number of samples collected in step 2206 is increased by raising the value of the fifth threshold, so as to improve the accuracy of the next generated user identifier. The terminal sends the user identifier marked as a new user identifier to the server, and the server adds it to the user identifier library. In an embodiment, when the user stops browsing the page (for example, closes the browser), the user identifier that has occurred most frequently may be taken as corresponding to the user browsing the page this time, or the user identifier generated last may be taken as corresponding to the user browsing the page this time.
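The two ways of choosing the identifier when browsing stops can be sketched as follows. The function name and the `strategy` parameter are hypothetical, and the sketch assumes the identifiers generated during the session are kept in a list in generation order.

```python
from collections import Counter

def final_identifier(generated, strategy="most_frequent"):
    """Pick the identifier that represents the current browsing session."""
    if not generated:
        return None
    if strategy == "latest":
        return generated[-1]  # the identifier generated last
    # The identifier that occurred most often during the session.
    return Counter(generated).most_common(1)[0][0]
```

Counting exact repeats only makes sense if identifiers are rounded or otherwise quantized; that quantization is an assumption of this sketch, not something the text specifies.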
In summary, in the user identifier generation method provided in this embodiment, it is determined whether the currently displayed page is a content type typesetting page; when it is, the staying position data of the cursor in the page is obtained, target position data is obtained from it, and the user identifier is generated according to coordinates obtained from the target position data. Because the layout of a content type typesetting page is designed to display text content, and the user has specific behavior habits when browsing this type of page, the target position data can accurately reflect the behavior habit of the user when browsing the web page. The user identifier generated from the target position data is therefore directly associated with the behavior habit of the user and does not depend on static information of the browser. Even if the user changes the browsing program or the computer device, the user's operation habits when browsing content type typesetting pages remain consistent, so the user identifiers generated by different browsing programs and even different computer devices are consistent or related. The user identifier generated by the embodiment of the application is thus strongly associated with the user, and using it can improve the accuracy of user identification.
In the above embodiment, the user identifier is generated by the terminal. In an embodiment, the user identifier may also be generated by the server. For example, the terminal determines whether the currently displayed page is a content type typesetting page, obtains the staying position data of the cursor in the page when it is, and obtains target position data according to the staying position data; the terminal then sends the target position data to the server, and the server generates the user identifier for identifying the user according to coordinates obtained from the target position data. For another example, in another embodiment, the terminal is responsible for determining whether the currently displayed page is a content type typesetting page and acquiring the staying position data of the cursor in the page, and then sends the staying position data to the server; the server performs data screening to obtain the target position data and thereby generates the user identifier for identifying the user. These variants differ only in the execution subject; the specific execution details are the same.
As an example, to explain the user identifier generation method and the user identification method described above in the embodiments of the present application, the following description is given for different application scenarios:
Scenario one applies to a user using a browser, where an execution program for executing the user identifier generation method is built into the browser. The specific process is as follows:
While the user browses a web page using the browser, the browser judges whether the web page currently browsed by the user is a content type typesetting page. If it is, the staying position data generated during cursor movement is acquired, the target position data is screened out from the staying position data, and the centroid of the target position data is calculated. Finally, the browser generates a user identifier corresponding to the user according to the centroid and sends the user identifier to a server. The server identifies the user by using the user identifier, that is, by searching whether the same user identifier exists in a user identifier library; if the identification is unsuccessful, the user corresponding to the user identifier is a new user, and the user identifier is stored in the user identifier library for subsequent identification of the user.
Scenario two also applies to a user using a browser, but differs from scenario one in that the browser is an ordinary browser without a built-in execution program for executing the user identifier generation method of the foregoing embodiments. The specific process is as follows:
While browsing with the browser, the user accesses a web page provided by a web server, and the web page prompts installation of a browser plug-in. If the browser is set to install plug-ins automatically, the plug-in is installed automatically; if the browser is set to block installation, the user is prompted to click to install. After the browser plug-in is installed, the browser plug-in judges whether the web page currently browsed by the user is a content type typesetting page. If it is, the browser plug-in acquires the staying position data generated during cursor movement, screens out the target position data from the staying position data, and calculates the centroid of the target position data. Finally, the browser plug-in generates a user identifier corresponding to the user according to the centroid and sends the user identifier to the server. The server identifies the user by using the user identifier, that is, by searching whether the same user identifier exists in a user identifier library; if the identification is unsuccessful, the user corresponding to the user identifier is a new user, and the user identifier is stored in the user identifier library for subsequent identification of the user.
Scenario three applies to a user using software other than a traditional browser. Illustratively, referring to fig. 23, the user browses a web page using a browsing window embedded in chat software. In this case, the chat software judges whether the web page currently browsed by the user is a content type typesetting page. If it is, the staying position data generated during cursor movement is acquired, the target position data is screened out from the staying position data, and the centroid of the target position data is calculated. Finally, the chat software generates a user identifier corresponding to the user according to the centroid and sends the user identifier to the server. The server identifies the user by using the user identifier, that is, by searching whether the same user identifier exists in a user identifier library; if the identification is unsuccessful, the user corresponding to the user identifier is a new user, and the user identifier is stored in the user identifier library for subsequent identification of the user.
Scenario four applies to a user browsing a document using document processing software. While the user browses the document, the document processing software judges whether the document page currently browsed by the user is a content type typesetting page. If it is, the staying position data generated during cursor movement is acquired, the target position data is screened out from the staying position data, and the centroid of the target position data is calculated. Finally, the document processing software generates a user identifier corresponding to the user according to the centroid and sends the user identifier to the server. The server identifies the user by using the user identifier, that is, by searching whether the same user identifier exists in a user identifier library; if the identification is unsuccessful, the user corresponding to the user identifier is a new user, and the user identifier is stored in the user identifier library for subsequent identification of the user.
In the first to fourth scenarios, the actions of determining whether the page currently browsed by the user is a content type typesetting page, acquiring the stay position data generated in the cursor moving process, screening the target position data from the stay position data and calculating the centroid, generating the user identifier corresponding to the user according to the centroid, and sending the user identifier to the server may also be executed by software running in the background of the operating system, such as security software, chat software, an input method, and the like. For example, for scene one to scene three, the data of the browser or the browsing window may be acquired by software running in the background of the operating system, and for scene four, the data of the document processing software may be acquired by software running in the background of the operating system.
In the above scenarios, the user identifier may be generated either by the terminal or by the server. For example, in the first scenario, after the browser screens out the target position data from the stay position data, the browser sends the target position data to the server, and the server calculates the centroid and generates the user identifier; alternatively, the browser calculates the centroid and sends it to the server, and the server generates the user identifier. The same two options apply in the other scenarios, with the screening performed by the browser plug-in in the second scenario, by the third-party software in the third scenario, by the document processing software in the fourth scenario, and by the software running independently in the background in the fifth scenario: either the target position data is sent to the server, which calculates the centroid and generates the user identifier, or the centroid is sent to the server, which generates the user identifier.
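As an illustration of the centroid calculation and identifier generation that either the terminal or the server may perform, the following is a minimal Python sketch rather than the implementation of the embodiments. It assumes the target position data are normalized (x, y) pairs and uses a flat-kernel mean shift, which the claims name as one way to obtain the centroid. The function names `mean_shift_centroid` and `make_user_identifier`, the bandwidth value, and the rounded-string identifier format are hypothetical choices made only for this example.

```python
import math


def mean_shift_centroid(points, bandwidth=0.1, tol=1e-6, max_iter=100):
    """Return the densest mode of `points` found by flat-kernel mean shift."""
    best_mode, best_support = None, -1
    for x, y in points:  # start one search from every data point
        for _ in range(max_iter):
            # Points inside the window centred on the current estimate.
            near = [(px, py) for px, py in points
                    if math.hypot(px - x, py - y) <= bandwidth]
            nx = sum(p[0] for p in near) / len(near)
            ny = sum(p[1] for p in near) / len(near)
            shift = math.hypot(nx - x, ny - y)
            x, y = nx, ny
            if shift < tol:
                break
        # Keep the mode that is supported by the most points.
        support = sum(1 for px, py in points
                      if math.hypot(px - x, py - y) <= bandwidth)
        if support > best_support:
            best_mode, best_support = (x, y), support
    return best_mode


def make_user_identifier(centroid):
    """Hypothetical identifier format: centroid rounded to two decimals."""
    return f"{centroid[0]:.2f}:{centroid[1]:.2f}"


# Three stays in the left blank area and one outlier.
targets = [(0.10, 0.50), (0.12, 0.52), (0.08, 0.48), (0.90, 0.90)]
centroid = mean_shift_centroid(targets)
identifier = make_user_identifier(centroid)
```

Under this split, a terminal that sends only the target position data would leave both functions to the server, while a terminal that sends the centroid would run only `mean_shift_centroid` locally.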
Fig. 24 is a block diagram illustrating a structure of a user identifier generating apparatus according to an embodiment of the present application, where the apparatus includes:
a determining module 2401, configured to determine whether a currently displayed page is a content type typesetting page, where a content display area of the content type typesetting page is located in the middle of the content type typesetting page along a width direction, and a blank area exists between two sides of the content display area along the width direction and an edge of the content type typesetting page;
a position data obtaining module 2402, configured to obtain stay position data of a cursor in the page when the page is a content type typesetting page, where the stay position data is position data at which the cursor stays in the page for longer than a first threshold;
a target position data generating module 2403, configured to obtain target position data according to the stay position data of the cursor in the page;
an identifier generating module 2404, configured to generate a user identifier for identifying a user according to the coordinates obtained from the target position data.
The user identifier generating apparatus provided by this embodiment uses the determining module 2401 to determine whether the currently displayed page is a content type typesetting page. When the page is a content type typesetting page, the position data obtaining module 2402 acquires the stay position data of the cursor in the page, the target position data generating module 2403 obtains the target position data, and the identifier generating module 2404 generates the user identifier according to the coordinates obtained from the target position data. Because the layout of a content type typesetting page is designed for displaying text content, a user has specific behavior habits when browsing this type of page, so the target position data can accurately reflect the user's behavior habits when browsing the web page. The user identifier generated from the target position data is therefore directly associated with the user's behavior habits and does not depend on static information of the browser. Even if the user changes the browsing program or the computer device, the user's operation habits for browsing content type typesetting pages remain consistent, so the user identifiers generated by different browsing programs, and even by different computer devices, are consistent or related. The user identifier generated by the embodiment of the present application is thus strongly associated with the user, and using it can improve the accuracy of user identification.
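The acquisition of stay position data can be sketched in Python as follows. This is only an illustrative sketch: it assumes cursor samples arrive as (timestamp in milliseconds, x, y) tuples, treats a coordinate as a stay position when the time gap to the next sample exceeds the first threshold (the rule recited in the claims), and expresses coordinates as a proportion of the window size. The function name `extract_stay_positions` and the sample values are hypothetical.

```python
def extract_stay_positions(samples, first_threshold_ms, window_width, window_height):
    """Return normalized (x, y) positions where the cursor stayed too long.

    `samples` is a time-ordered list of (timestamp_ms, x, y) tuples.
    """
    stays = []
    for (t0, x0, y0), (t1, _x1, _y1) in zip(samples, samples[1:]):
        # A long gap before the next sample means the cursor rested at (x0, y0).
        if t1 - t0 > first_threshold_ms:
            stays.append((x0 / window_width, y0 / window_height))
    return stays


samples = [(0, 100, 200), (50, 110, 210), (900, 300, 400), (950, 310, 410)]
stays = extract_stay_positions(samples, first_threshold_ms=500,
                               window_width=1000, window_height=800)
```

Normalizing by the window size is what lets positions recorded in windows of different sizes be compared with one another.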
Fig. 25 is a block diagram illustrating a structure of a user identification device according to an embodiment of the present application, where the user identification device includes:
a user identifier receiving module 2501, configured to obtain a user identifier generated by the user identifier generating apparatus in the foregoing embodiment;
a matching module 2502, configured to match the user identifier in a preset user identifier feature library;
a confirmation module 2503, configured to complete the identification of the end user after the matching is successful;
a creation module 2504, configured to save the user identifier as a new user identifier feature in the user identifier feature library after the matching fails.
The user identification device provided by the embodiment of the present application obtains the user identifier through the user identifier receiving module 2501. The user identifier is generated from target position data, which in turn is obtained from the stay position data of the cursor in the page. Because the coordinates corresponding to the target position data reflect the user's behavior habits when browsing content type typesetting web pages, the acquired user identifier is directly associated with the user's behavior habits and does not depend on static information of the browser. Even if the user changes the browsing program or the computer device, the user's operation habits for browsing content type typesetting web pages remain consistent, so the user identifiers generated by different browsing programs, and even by different computer devices, are consistent or related. Using these user identifiers can therefore improve the accuracy of user identification.
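The match-or-create behavior of the user identification device can be sketched in Python as follows. The text does not specify how an identifier is compared with the entries of the feature library, so this sketch assumes the identifier is a centroid coordinate pair and that two identifiers match when their Euclidean distance is within a tolerance. The class name `UserIdentifierLibrary`, the tolerance value, and the `user-N` keys are hypothetical.

```python
import math


class UserIdentifierLibrary:
    """Toy feature library: maps a user key to a stored centroid identifier."""

    def __init__(self, tolerance=0.02):
        self.tolerance = tolerance  # assumed matching rule, not from the source
        self.features = {}

    def identify(self, identifier):
        """Return (user_key, matched) for an (x, y) identifier."""
        for user_key, known in self.features.items():
            if math.dist(identifier, known) <= self.tolerance:
                return user_key, True  # matching succeeded
        # Matching failed: save the identifier as a new feature.
        user_key = f"user-{len(self.features) + 1}"
        self.features[user_key] = identifier
        return user_key, False


library = UserIdentifierLibrary()
first = library.identify((0.10, 0.50))
second = library.identify((0.11, 0.50))
third = library.identify((0.40, 0.70))
```

A tolerance-based comparison is one way to accommodate identifiers that are "consistent or related" rather than identical across browsing programs or devices.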
Fig. 26 is a block diagram illustrating a structure of a user identifier generating apparatus according to an embodiment of the present application, where the apparatus includes: at least one processor 2601, at least one memory 2602, and at least one program stored on the memory 2602 and executable on the processor 2601 to implement the user identification generation method in the above embodiments.
The processor 2601 and the memory 2602 may be connected by a bus or by other means; a bus connection is taken as the example in fig. 26.
Fig. 27 is a block diagram illustrating a structure of a user identification device according to an embodiment of the present application, where the user identification device includes: at least one processor 2701, at least one memory 2702, and at least one program stored on the memory 2702 and executable on the processor 2701 to implement the user identification method in the above-described embodiments.
The processor 2701 and the memory 2702 may be connected by a bus or other means, and the bus connection is exemplified in fig. 27.
Referring to fig. 28, an embodiment of the present application further provides a terminal, where the terminal includes the user identifier generating apparatus in the foregoing embodiment.
Referring to fig. 29, an embodiment of the present application further provides a server, where the server includes the user identification device in the foregoing embodiment.
An embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions. When executed by a processor or controller, for example by the processor 2601 in fig. 26, the instructions cause the processor 2601 to execute the user identifier generation method in the embodiments of the present application, for example, method steps 401 to 404 in fig. 4, 601 to 603 in fig. 6, 701 to 702 in fig. 7, 801 to 804 in fig. 8, 1001 to 1002 in fig. 10, 1101 to 1102 in fig. 11, 1201 to 1202 in fig. 12, 1301 to 1302 in fig. 13, 1601 to 1602 in fig. 16, 1801 to 1802 in fig. 18, 1901 to 1902 in fig. 19, 2001 to 2003 in fig. 20, and 2201 to 2211 in fig. 22.
For another example, the computer-executable instructions, when executed by a processor 2701 in fig. 27, may cause the processor 2701 to perform the user identification method in the embodiments of the present application, for example, method steps 2101 to 2104 in fig. 21 described above.
The above-described embodiments of the apparatus are merely illustrative, wherein the units illustrated as separate components may or may not be physically separate, i.e. may be located in one place, or may also be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
One of ordinary skill in the art will appreciate that all or some of the steps, systems, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.
While the preferred embodiments of the present invention have been described, the present invention is not limited to the above embodiments, and those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the present invention, and such equivalent modifications or substitutions are included in the scope of the present invention defined by the claims.

Claims (15)

1. A user identification generation method comprises the following steps:
judging whether the currently displayed page is a content type typesetting page or not;
when the page is a content type typesetting page, acquiring staying position data of a cursor in the page, wherein the staying position data is position data of the cursor staying in the page for a time period longer than a first threshold value;
obtaining target position data according to the staying position data of the cursor in the page;
and generating a user identifier for identifying the user according to the coordinates obtained by the target position data.
2. The method of claim 1, wherein the step of determining whether the currently displayed page is a content-type typeset page comprises one of the following steps:
judging whether the access address of the page is matched with a preset page address or not, and if so, determining that the page is a content type typesetting page;
identifying a content display area in the page, and if the content display area is located in the middle of the page and a blank area whose width, as a proportion of the page width, is greater than a second threshold exists between each side of the content display area along the width direction and the edge of the page, determining that the page is a content type typesetting page;
judging whether the access address of the page is matched with a preset page address or not, and if so, determining that the page is a content type typesetting page; and if not, identifying a content display area in the page, and if the content display area is located in the middle of the page and a blank area whose width, as a proportion of the page width, is greater than a second threshold exists between each side of the content display area along the width direction and the edge of the page, determining that the page is a content type typesetting page.
3. The method of claim 1, wherein obtaining target location data from the dwell location data of the cursor on the page comprises:
taking the acquired staying position data of the cursor as first position data;
and after second position data are removed from the first position data, obtaining target position data, wherein the second position data comprise at least one of the following:
the staying position data of the cursor located in a content display area of the page;
the staying position data of the cursor corresponding to a content display zone within the content display area of the page;
the staying position data of the cursor corresponding to the area where an interactive element of the page is located;
the staying position data of the cursor corresponding to an interference area whose distribution density is smaller than a third threshold value;
the staying position data of the cursor with the longest staying time;
and the staying position data ranked after a fourth threshold value when the staying position data are sorted by the staying time of the cursor on the page from longest to shortest.
4. The method of claim 1, wherein acquiring the staying position data of the cursor in the page comprises:
collecting coordinates of the cursor in the page;
and if the stay time of the cursor at the coordinate is longer than a first threshold value, taking the position data corresponding to the coordinate as the stay position data of the cursor.
5. The method of claim 1, wherein acquiring the staying position data of the cursor in the page comprises:
responding to the movement of a cursor, and acquiring a plurality of continuous coordinates and corresponding time points of the cursor in the moving process of the cursor in the page;
and if the time difference between two adjacent coordinates is greater than the first threshold, using the position data corresponding to the previous coordinate in the two adjacent coordinates as the staying position data of the cursor.
6. The method of claim 5, wherein said obtaining a plurality of consecutive coordinates and corresponding times during which the cursor moves within the page comprises:
and acquiring a plurality of continuous coordinates and corresponding time points of the cursor in the moving process in the page according to a preset acquisition rate.
7. The method of claim 1, wherein the coordinates are the ratio of the position of the cursor to the window size of the page.
8. The method of any one of claims 1 to 7, wherein generating a user identifier for identifying a user from the coordinates obtained from the target location data comprises:
obtaining the centroids of the plurality of target position data according to the coordinates of the target position data;
and generating a user identifier for user identification according to the coordinates of the centroid.
9. The method of claim 8, wherein the centroids of the plurality of target location data are obtained by a mean shift algorithm.
10. The method of any one of claims 1 to 9, wherein the page is displayed in a browser.
11. The method of any one of claims 1 to 9, further comprising:
sending verification information containing the user identification to a server;
and obtaining verification feedback information from the server, wherein the verification feedback information is used for feeding back whether the user identification has a matched user in the server.
12. The method of claim 11, further comprising at least one of:
when the verification feedback information indicates that a matched user exists, sending the user identification to the server;
and when the verification feedback information shows that no matched user exists, continuously acquiring the staying position data and regenerating the user identifier until the user is successfully matched or the preset matching condition is exceeded.
13. The method of claim 12, wherein continuing to collect dwell location data and regenerate a user identification comprises:
retaining the already acquired staying position data as a first staying position data set;
continuing to acquire new staying position data to generate a second staying position data set;
and regenerating the user identification according to the first staying position data set and the second staying position data set.
14. The method according to claim 12 or 13, wherein the preset matching condition comprises at least one of:
a preset number of matching attempts;
and a preset matching duration.
15. A user identification method, comprising:
obtaining a user identification generated by the method of any one of claims 1 to 14;
matching the user identification in a preset user identification feature library; and
performing at least one of the following steps:
when the matching is successful, the user is identified;
and when the matching fails, the user identification is used as a new user identification characteristic and is stored in the user identification characteristic library.
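As an illustration of the page check recited in claim 2, the following Python sketch tries the address match first and falls back to the layout check. It is a sketch under stated assumptions only: address matching is done by prefix, the content display area is described by its left and right edges in pixels, and "located in the middle" is approximated by requiring the two margins to be nearly equal. The function name `is_content_type_page`, the default second threshold, and the `centering_tolerance` parameter are hypothetical.

```python
def is_content_type_page(url, preset_addresses, page_width,
                         content_left, content_right,
                         second_threshold=0.15, centering_tolerance=0.05):
    """Return True if the page looks like a content type typesetting page."""
    # Option 1: the access address matches a preset page address.
    if any(url.startswith(prefix) for prefix in preset_addresses):
        return True
    if page_width <= 0:
        return False
    # Option 2: blank areas on both sides, each wider than the threshold ratio.
    left_ratio = content_left / page_width
    right_ratio = (page_width - content_right) / page_width
    centered = abs(left_ratio - right_ratio) <= centering_tolerance
    return (centered
            and left_ratio > second_threshold
            and right_ratio > second_threshold)


by_address = is_content_type_page(
    "https://example.com/article/1", ["https://example.com/article/"],
    page_width=1200, content_left=0, content_right=1200)
by_layout = is_content_type_page(
    "https://example.com/x", [], page_width=1200,
    content_left=300, content_right=900)
full_width = is_content_type_page(
    "https://example.com/x", [], page_width=1200,
    content_left=0, content_right=1200)
```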
CN202010190953.4A 2020-03-18 2020-03-18 User identification generation method, user identification method and device Active CN111400575B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010190953.4A CN111400575B (en) 2020-03-18 2020-03-18 User identification generation method, user identification method and device

Publications (2)

Publication Number Publication Date
CN111400575A (en) 2020-07-10
CN111400575B (en) 2023-06-23

Family

ID=71428919

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010190953.4A Active CN111400575B (en) 2020-03-18 2020-03-18 User identification generation method, user identification method and device

Country Status (1)

Country Link
CN (1) CN111400575B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101446979A (en) * 2008-12-26 2009-06-03 北京科尔威视网络科技有限公司 Method for dynamic hotspot tracking
CN101833619A (en) * 2010-04-29 2010-09-15 西安交通大学 Method for judging identity based on keyboard-mouse crossed certification
US20130246383A1 (en) * 2012-03-18 2013-09-19 Microsoft Corporation Cursor Activity Evaluation For Search Result Enhancement
CN103944722A (en) * 2014-04-17 2014-07-23 华北科技学院 Identification method for user trusted behaviors under internet environment
US8914496B1 (en) * 2011-09-12 2014-12-16 Amazon Technologies, Inc. Tracking user behavior relative to a network page
CN105760516A (en) * 2016-02-25 2016-07-13 广州视源电子科技股份有限公司 Method and device for distinguishing users
CN110188275A (en) * 2019-05-30 2019-08-30 广州虎牙信息科技有限公司 A kind of browsing monitoring method, device, equipment and the storage medium of web page element

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
房超: "基于鼠标动力学模型的用户身份认证与监控" *
房超;蔡忠闽;沈超;牛非;管晓宏;: "基于鼠标动力学模型的用户身份认证与监控", 西安交通大学学报, vol. 42, no. 10 *
李荣华: "基于LPVC和行为特征的身份认证技术研究与实现", no. 2 *
沈超: "基于鼠标行为特征的用户身份认证与监控", vol. 31, no. 7 *
陈功: "基于用户鼠标行为的身份认证方法" *
陈功;朱佳俊;施勇;薛质;: "基于用户鼠标行为的身份认证方法", 常州大学学报(自然科学版), vol. 30, no. 02 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112697113A (en) * 2020-12-10 2021-04-23 四川长虹电器股份有限公司 Method for displaying disaster data change situation of mass sensors
CN114244826A (en) * 2022-01-18 2022-03-25 杭州盈高科技有限公司 Webpage identification information sharing method and device, storage medium and processor
CN114244826B (en) * 2022-01-18 2023-11-28 杭州盈高科技有限公司 Webpage identification information sharing method and device, storage medium and processor
CN116957680A (en) * 2023-08-03 2023-10-27 深圳花旦传媒有限公司 Advertisement putting effect monitoring system

Also Published As

Publication number Publication date
CN111400575B (en) 2023-06-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40026383; Country of ref document: HK)
GR01 Patent grant