CN111680200A - Method, device and equipment for collecting user behavior data and storage medium - Google Patents

Info

Publication number
CN111680200A
Authority
CN
China
Prior art keywords
preset
user behavior
target
nodes
html
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010344482.8A
Other languages
Chinese (zh)
Inventor
聂启成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Saiante Technology Service Co Ltd
Original Assignee
Ping An International Smart City Technology Co Ltd
Application filed by Ping An International Smart City Technology Co Ltd
Priority to CN202010344482.8A
Publication of CN111680200A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Game Theory and Decision Science (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention relates to the technical field of big data, is applied to the field of intelligent security, and discloses a method for collecting user behavior data that is compatible with different HTML (hypertext markup language) pages. The method for collecting user behavior data comprises the following steps: when a user operates a browser page, calling a preset universal resource file to acquire a plurality of user behavior information of the user on the browser page; determining a plurality of initial HTML nodes according to the user behavior information, wherein different user behavior information corresponds to different initial HTML nodes; reading the acquisition requirement of a preset server, and filtering the plurality of initial HTML nodes according to the acquisition requirement of the preset server to obtain a plurality of target HTML nodes; and acquiring a plurality of target behavior acquisition data according to the plurality of target HTML nodes, and reporting the plurality of target behavior acquisition data to the preset server through a preset interface. In addition, the application also relates to blockchain technology, and the user behavior data can be stored in a blockchain.

Description

Method, device and equipment for collecting user behavior data and storage medium
Technical Field
The invention relates to the technical field of big data, in particular to a method, a device, equipment and a storage medium for acquiring user behavior data.
Background
During the operation of a product, user behavior data is used to analyze the behavior that users generate on the product and the data behind that behavior. By collecting, storing, tracking, analyzing and applying user behavior data, it is possible to find the factors that drive organic user growth, group characteristics and target users, and to reconstruct in depth the users' usage scenarios, operation patterns, access paths, behavior characteristics and the like, thereby enabling fine-grained operation and guiding business growth.
On the client, in order to improve user experience and gather usage statistics, the user's operation behavior while using the application is generally collected for statistical reporting. In the prior art, most clients are developed as hybrid applications, in which some functions are implemented by accessing hypertext markup language (HTML) pages through a built-in browser. Active collection of user behavior data cannot be reused for these built-in HTML functions, and the requirement on the uniformity of the HTML pages is high.
Disclosure of Invention
The invention mainly aims to solve the problems that active collection of user behavior data cannot be reused for built-in HTML functions and that the requirement on the uniformity of HTML pages is high.
The invention provides a method for acquiring user behavior data in a first aspect, which comprises the following steps: when a user operates a browser page, calling a preset universal resource file to acquire a plurality of user behavior information of the user on the browser page; determining a plurality of initial HTML nodes according to the user behavior information, wherein different user behavior information corresponds to different initial HTML nodes; reading the acquisition requirement of a preset server, and filtering the plurality of initial HTML nodes according to the acquisition requirement of the preset server to obtain a plurality of target HTML nodes; and acquiring a plurality of target behavior acquisition data according to the plurality of target HTML nodes, and reporting the plurality of target behavior acquisition data to a preset server through a preset interface, wherein the plurality of target behavior acquisition data comprise user behavior identifications and text contents corresponding to the plurality of target HTML nodes.
Optionally, in a first implementation manner of the first aspect of the present invention, the determining, according to the plurality of user behavior information, a plurality of initial HTML nodes in hypertext markup language (HTML), where different pieces of user behavior information correspond to different initial HTML nodes, includes: judging whether the user behavior information is matched with any one node in a plurality of preset event nodes; and if the user behavior information is matched with any one of the preset event nodes, triggering event call-back to obtain a plurality of initial HTML nodes.
Optionally, in a second implementation manner of the first aspect of the present invention, the determining whether the plurality of pieces of user behavior information match any of the plurality of preset event nodes includes: extracting a plurality of user behavior identifications and a plurality of text contents from the plurality of user behavior information, wherein the text contents are used for recording the text information corresponding to the user behavior identifications on the browser page; and aiming at any one user behavior information in the user behavior information, matching the user behavior identifier with a preset standard event identifier in a plurality of preset event nodes by adopting a preset matching algorithm.
Optionally, in a third implementation manner of the first aspect of the present invention, the reading a collection requirement of a preset server, and filtering the multiple initial HTML nodes according to the collection requirement of the preset server to obtain multiple target HTML nodes includes: reading the acquisition requirement of a preset server, and extracting a target acquisition identifier based on the acquisition requirement of the preset server; similarity calculation is carried out on the target acquisition identification and user behavior identifications in the initial HTML nodes respectively to obtain a plurality of node similarities; and setting a plurality of initial HTML nodes with the node similarity larger than or equal to a similarity threshold value as a plurality of target HTML nodes.
Optionally, in a fourth implementation manner of the first aspect of the present invention, before the invoking a preset universal resource file to obtain a plurality of pieces of user behavior information of a user on a browser page when the user operates the browser page, the method for acquiring user behavior data further includes: acquiring a resource file, and injecting the resource file into a browser, wherein the resource file comprises a hypertext markup language (HTML) file and a preset universal resource file; reading a plurality of preset standard events, and performing node registration on the plurality of preset standard events by adopting the preset universal resource file to obtain a plurality of preset event nodes, wherein the plurality of preset event nodes are used for monitoring user behaviors.
Optionally, in a fifth implementation manner of the first aspect of the present invention, the acquiring a resource file and injecting the resource file into a browser, where the resource file includes a hypertext markup language HTML file and a preset universal resource file, includes: acquiring a resource file, and acquiring a dynamic link library and a name of the dynamic link library according to a preset universal resource file in the resource file; allocating a memory for the dynamic link library in a target thread to obtain a target memory, writing the name of the dynamic link library into the target memory by adopting a preset write-in function, and mapping the dynamic link library into the target thread; monitoring the working state of the target thread, and acquiring a target thread value of the target thread when the target thread stops working, wherein the target thread value is a loading base address of the dynamic link library; and writing the target thread value into the dynamic link library.
Optionally, in a sixth implementation manner of the first aspect of the present invention, the reading a plurality of preset standard events, and performing node registration on the plurality of preset standard events by using the preset universal resource file to obtain a plurality of preset event nodes, where the plurality of preset event nodes are configured to monitor user behavior and include: acquiring a plurality of preset standard events, and reading a plurality of preset standard event attributes including a plurality of preset standard event identifications and a plurality of preset text contents from the plurality of preset standard events, wherein the plurality of preset text contents are used for recording a plurality of preset text information corresponding to the plurality of preset standard event identifications on the browser page; and setting the preset standard events in a preset document tree corresponding to the HTML file according to the preset standard event identifications and the preset text contents to obtain a plurality of preset event nodes.
The second aspect of the present invention provides a device for collecting user behavior data, including: the behavior information acquisition module is used for calling a preset universal resource file to acquire a plurality of user behavior information of a user on a browser page when the user operates the browser page; the node acquisition module is used for determining a plurality of initial HTML nodes according to the user behavior information, wherein different user behavior information corresponds to different initial HTML nodes; the node filtering module is used for reading the acquisition requirements of a preset server and filtering the initial HTML nodes according to the acquisition requirements of the preset server to obtain a plurality of target HTML nodes; and the behavior data acquisition module acquires a plurality of target behavior acquisition data according to the target HTML nodes and reports the target behavior acquisition data to a preset server through a preset interface, wherein the target behavior acquisition data comprises user behavior identifications and text contents corresponding to the target HTML nodes.
Optionally, in a first implementation manner of the second aspect of the present invention, the node obtaining module specifically includes: the judging unit is used for judging whether the user behavior information is matched with any one node in the preset event nodes; and the node acquisition unit is used for triggering event callback to obtain a plurality of initial HTML nodes if the plurality of user behavior information are matched with any one of the plurality of preset event nodes.
Optionally, in a second implementation manner of the second aspect of the present invention, the judging unit is specifically configured to: extracting a plurality of user behavior identifications and a plurality of text contents from the plurality of user behavior information, wherein the text contents are used for recording the text information corresponding to the user behavior identifications on the browser page; aiming at any user behavior information in the user behavior information, judging whether any user behavior identifier in the user behavior identifiers is matched with any preset standard event identifier in a plurality of preset event nodes by adopting a preset matching algorithm; and if any user behavior identifier in the plurality of user behavior identifiers is matched with any preset standard event identifier in the plurality of preset event nodes, judging that the plurality of user behavior information is matched with any node in the plurality of preset event nodes.
Optionally, in a third implementation manner of the second aspect of the present invention, the node filtering module is further specifically configured to: reading the acquisition requirements of a preset server, and respectively extracting target acquisition identifiers based on the acquisition requirements of the preset server; similarity calculation is carried out on the basis of the target acquisition identifier and user behavior identifiers in the plurality of initial HTML nodes, and a plurality of node similarities are obtained; and setting a plurality of initial HTML nodes with the node similarity larger than or equal to a similarity threshold value as a plurality of target HTML nodes.
Optionally, in a fourth implementation manner of the second aspect of the present invention, the apparatus for acquiring user behavior data further includes: the resource file registration module is used for acquiring resource files and injecting the resource files into a browser, wherein the resource files comprise hypertext markup language (HTML) files and preset universal resource files; and the node registration module is used for reading a plurality of preset standard events and performing node registration on the plurality of preset standard events by adopting the preset universal resource file to obtain a plurality of preset event nodes, and the plurality of preset event nodes are used for monitoring user behaviors.
Optionally, in a fifth implementation manner of the second aspect of the present invention, the resource file registration module is further specifically configured to: acquiring a resource file, and acquiring a dynamic link library and a name of the dynamic link library according to a preset universal resource file in the resource file; allocating a memory for the dynamic link library in a target thread to obtain a target memory, writing the name of the dynamic link library into the target memory by adopting a preset write-in function, and mapping the dynamic link library into the target thread; monitoring the working state of the target thread, and acquiring a target thread value of the target thread when the target thread stops working, wherein the target thread value is a loading base address of the dynamic link library; and writing the target thread value into the dynamic link library.
Optionally, in a sixth implementation manner of the second aspect of the present invention, the node registration module is further specifically configured to: acquiring a plurality of preset standard events, and reading a plurality of preset standard event attributes including a plurality of preset standard event identifications and a plurality of preset text contents from the plurality of preset standard events, wherein the plurality of preset text contents are used for recording a plurality of preset text information corresponding to the plurality of preset standard event identifications on the browser page; and setting the preset standard events in a preset document tree corresponding to the HTML file according to the preset standard event identifications and the preset text contents to obtain a plurality of preset event nodes.
A third aspect of the present invention provides a device for collecting user behavior data, including: a memory having instructions stored therein and at least one processor, the memory and the at least one processor interconnected by a line; the at least one processor calls the instructions in the memory to enable the user behavior data acquisition device to execute the user behavior data acquisition method.
A fourth aspect of the present invention provides a computer-readable storage medium having stored therein instructions, which, when run on a computer, cause the computer to execute the above-mentioned method for collecting user behavior data.
In the technical scheme provided by the invention, when a user operates a browser page, a preset universal resource file is called to acquire a plurality of user behavior information of the user on the browser page; determining a plurality of initial HTML nodes according to the user behavior information, wherein different user behavior information corresponds to different initial HTML nodes; reading the acquisition requirement of a preset server, and filtering the plurality of initial HTML nodes according to the acquisition requirement of the preset server to obtain a plurality of target HTML nodes; and acquiring a plurality of target behavior acquisition data according to the plurality of target HTML nodes, and reporting the plurality of target behavior acquisition data to a preset server through a preset interface, wherein the plurality of target behavior acquisition data comprise user behavior identifications and text contents corresponding to the plurality of target HTML nodes. In the embodiment of the invention, the universal resource file is called to collect the user behavior data, different HTML pages can be compatible, and the resource file is separated from the browser, so that the service logic cannot be influenced even if the resource file is updated or modified.
Drawings
FIG. 1 is a schematic diagram of an embodiment of a method for collecting user behavior data according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of another embodiment of a method for collecting user behavior data according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an embodiment of a device for acquiring user behavior data according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of another embodiment of a device for acquiring user behavior data according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an embodiment of a device for acquiring user behavior data in the embodiment of the present invention.
Detailed Description
The embodiment of the invention provides a method, a device, equipment and a storage medium for collecting user behavior data. User behavior data is an important index for macro-economic research and analysis. The method and the device are used for registering the preset event node by calling the universal resource file and monitoring the user behavior through the preset event node, can be compatible with different HTML pages, and the resource file is separated from the browser, so that the service logic cannot be influenced even if the resource file is updated or modified.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," or "having," and any variations thereof, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For convenience of understanding, a specific flow of the embodiment of the present invention is described below, and referring to fig. 1, an embodiment of the method for acquiring user behavior data in the embodiment of the present invention includes:
101. When a user operates a browser page, calling a preset universal resource file to acquire a plurality of user behavior information of the user on the browser page;
It should be noted that the preset universal resource file is a universal JavaScript script file. The script for collecting information is a JavaScript file added to the page; it is written separately from the service logic code of the page, which is also JavaScript, and one page can run a plurality of JavaScript scripts.
When a user operates a browser page, the terminal acquires a plurality of user behavior information through the universal JavaScript file. The user behavior information comprises behavior events such as clicking, inputting and scrolling, and also comprises the text information corresponding to each behavior event. For example, if the behavior event concerns a news entry, the user behavior information comprises the behavior event for that news entry and the text information corresponding to the news entry.
It is to be understood that the execution subject of the present invention may be a device for acquiring user behavior data, and may also be a terminal, which is not limited herein. The embodiment of the present invention is described by taking a terminal as an execution subject.
102. Determining a plurality of initial HTML nodes according to the user behavior information, wherein different user behavior information corresponds to different initial HTML nodes;
When a user operates on a browser page, a plurality of behavior events are triggered and the terminal acquires a plurality of user behavior information. The terminal monitors whether the user behavior information matches a preset event node, where a preset event node is a standard event type defined by the terminal through the browser; if the user behavior information matches a preset event node, an event callback is triggered to obtain an HTML node.
It should be noted that the terminal registers the standard event types in the document tree of the HTML file in advance to obtain a plurality of element nodes, and these element nodes are the preset event nodes.
For example, when the user behavior information is clicking sports news, the terminal can acquire the HTML node of the clicked sports news; when the user behavior information is clicking science and technology news, the terminal can acquire the HTML node of the clicked science and technology news.
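The matching and callback logic of step 102 can be sketched as follows. This is a minimal illustration only: the embodiment's collection script is JavaScript running in the browser, whereas this sketch models the logic in Python, and the names `BehaviorInfo`, `HtmlNode` and `PresetEventNodes`, as well as the exact-match comparison of identifiers, are assumptions that do not appear in the original text.

```python
from typing import Callable, Dict, List, NamedTuple


class BehaviorInfo(NamedTuple):
    """One piece of user behavior information (hypothetical shape)."""
    behavior_id: str  # user behavior identifier, e.g. "click"
    text: str         # text content associated with the behavior


class HtmlNode(NamedTuple):
    """Stand-in for an initial HTML node produced by an event callback."""
    behavior_id: str
    text: str


class PresetEventNodes:
    """Registry of preset event nodes keyed by standard event identifier."""

    def __init__(self) -> None:
        self._callbacks: Dict[str, Callable[[BehaviorInfo], HtmlNode]] = {}

    def register(self, event_id: str,
                 callback: Callable[[BehaviorInfo], HtmlNode]) -> None:
        self._callbacks[event_id] = callback

    def dispatch(self, infos: List[BehaviorInfo]) -> List[HtmlNode]:
        """Trigger the callback for every behavior that matches a preset node."""
        nodes: List[HtmlNode] = []
        for info in infos:
            # Assumption: the "preset matching algorithm" is exact equality.
            callback = self._callbacks.get(info.behavior_id)
            if callback is not None:
                nodes.append(callback(info))
        return nodes


registry = PresetEventNodes()
registry.register("click", lambda info: HtmlNode(info.behavior_id, info.text))

initial_nodes = registry.dispatch([
    BehaviorInfo("click", "sports news"),
    BehaviorInfo("click", "science and technology news"),
    BehaviorInfo("hover", "banner"),  # no preset node registered, so ignored
])
```

Here the two click behaviors each yield their own initial HTML node, while the unregistered behavior produces none, which mirrors the statement that different user behavior information corresponds to different initial HTML nodes.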
103. Reading the acquisition requirement of a preset server, and filtering a plurality of initial HTML nodes according to the acquisition requirement of the preset server to obtain a plurality of target HTML nodes;
In the process of collecting user behavior data, the terminal first acquires the acquisition requirement of the preset server and then filters the plurality of initial HTML nodes according to that requirement; the essence of the filtering can be understood as keeping the HTML nodes with the highest node similarity.
For example, suppose the initial HTML nodes are initial HTML node A, initial HTML node B, initial HTML node C and initial HTML node D. If only initial HTML node A and initial HTML node C meet the acquisition requirement of the preset server, the terminal determines that initial HTML node A and initial HTML node C are the target HTML nodes.
It should be noted that, in this embodiment, the node similarity may be obtained through a cosine similarity algorithm or through a Euclidean distance formula, and the method for obtaining the node similarity is not limited in this embodiment.
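A minimal sketch of the similarity-based filtering in step 103 is given below, using cosine similarity. The text does not say how identifiers are turned into vectors, so the character-bigram counting used here, the threshold value of 0.8, and the identifier strings are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def bigram_vector(identifier: str) -> Counter:
    """Count character bigrams (assumed vectorization, not from the source)."""
    padded = f" {identifier.lower()} "
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_nodes(initial_nodes: List[Dict[str, str]],
                 target_id: str,
                 threshold: float = 0.8) -> List[Dict[str, str]]:
    """Keep nodes whose identifier similarity to the target meets the threshold."""
    target_vec = bigram_vector(target_id)
    return [
        node for node in initial_nodes
        if cosine_similarity(target_vec, bigram_vector(node["behavior_id"])) >= threshold
    ]


initial_nodes = [
    {"name": "A", "behavior_id": "click_news"},
    {"name": "B", "behavior_id": "scroll_page"},
    {"name": "C", "behavior_id": "click_news"},
    {"name": "D", "behavior_id": "input_search"},
]
target_nodes = filter_nodes(initial_nodes, target_id="click_news")
target_names = [node["name"] for node in target_nodes]
```

With these hypothetical identifiers, nodes A and C are kept and nodes B and D are discarded, matching the example above. A Euclidean distance could be substituted for `cosine_similarity` without changing the structure of `filter_nodes`, provided the comparison is inverted so that smaller distances pass.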
104. Acquiring a plurality of target behavior acquisition data according to the plurality of target HTML nodes, and reporting the plurality of target behavior acquisition data to a preset server through a preset interface, wherein the plurality of target behavior acquisition data comprise user behavior identifications and text contents corresponding to the plurality of target HTML nodes.
The terminal obtains a plurality of target behavior acquisition data according to the plurality of target HTML nodes. The target behavior acquisition data may be the user's operation events, the types of text viewed by the user, the titles, links and tags of the pages used by the user, and the like. Finally, the terminal uploads the obtained target behavior acquisition data to the preset server through a preset API.
It should be noted that the target behavior collection data is composed of user behavior identifiers and text contents corresponding to target HTML nodes.
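As an illustration of step 104, the following sketch assembles the target behavior acquisition data from target HTML nodes and hands it to a reporting interface. The JSON layout, the field names `behaviorId` and `text`, and the `send` callable standing in for the preset interface are hypothetical; the text specifies only that each record carries a user behavior identifier and text content.

```python
import json
from typing import Callable, Dict, List


def build_report(target_nodes: List[Dict[str, str]]) -> str:
    """Serialize one record per target HTML node (assumed JSON layout)."""
    records = [
        {"behaviorId": node["behavior_id"], "text": node["text"]}
        for node in target_nodes
    ]
    return json.dumps({"records": records}, ensure_ascii=False)


def report(target_nodes: List[Dict[str, str]],
           send: Callable[[str], None]) -> None:
    """Pass the serialized data to the reporting interface.

    `send` is a placeholder for the preset interface; a real client would
    issue a network request here.
    """
    send(build_report(target_nodes))


sent_bodies: List[str] = []  # captures what would be uploaded
report(
    [
        {"behavior_id": "click", "text": "sports news"},
        {"behavior_id": "click", "text": "science and technology news"},
    ],
    send=sent_bodies.append,
)
```

Injecting `send` keeps the sketch free of any particular transport, so the same function could be pointed at an HTTP client or, as here, at a list for inspection.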
It is emphasized that, to further ensure the privacy and security of the user behavior data, the user behavior data may also be stored in a node of a blockchain.
In the embodiment of the invention, the universal resource file is called to collect the user behavior data, different HTML pages can be compatible, and the resource file is separated from the browser, so that the service logic cannot be influenced even if the resource file is updated or modified.
Referring to fig. 2, another embodiment of the method for collecting user behavior data according to the embodiment of the present invention includes:
201. Acquiring a resource file, and injecting the resource file into a browser, wherein the resource file comprises a hypertext markup language (HTML) file and a preset universal resource file;
The terminal acquires a resource file comprising a hypertext markup language (HTML) file and a preset universal resource file, and injects the resource file into the browser.
Specifically, the terminal first acquires a dynamic link library and the name of the dynamic link library according to the preset universal resource file; secondly, the terminal allocates a memory for the dynamic link library in the target thread to obtain a target memory, writes the name of the dynamic link library into the target memory by adopting a preset write-in function, and maps the dynamic link library into the target thread; then, the terminal monitors the working state of the target thread and, when the target thread stops working, acquires the target thread value of the target thread, where the target thread value is the loading base address of the dynamic link library; and finally, the terminal writes the target thread value into the dynamic link library.
It should be noted that the resource file is a code fragment, and injecting the resource file may be understood as injecting code. Code injection is a technology for inserting independently running code into a target process and running it there; it generally calls an API to run the inserted code in the form of a remote thread, and is therefore also referred to as thread injection. The code is inserted in the form of a thread procedure, and the data used by the code is inserted in the form of thread parameters; that is, the code and the data are injected separately.
A dynamic link library (DLL) is a library of code and data that can be used by multiple programs simultaneously. For example, in the Windows operating system, Comdlg32.dll performs common functions related to dialog boxes, so each program can use the functionality contained in this DLL to implement an "Open" dialog box. This helps to promote code reuse and efficient use of memory. By using DLLs, programs can be made modular, consisting of relatively independent components. For example, a billing program may be sold in modules, and the various modules may be loaded into the main program at runtime if they are installed. Because the modules are independent of each other, the program loads faster, and a module is loaded only when the corresponding function is requested.
202. Reading a plurality of preset standard events, and performing node registration on the plurality of preset standard events by adopting a preset universal resource file to obtain a plurality of preset event nodes, wherein the plurality of preset event nodes are used for monitoring user behaviors;
The terminal reads the plurality of preset standard events and performs node registration on them by adopting the preset universal resource file, so that a plurality of preset event nodes for monitoring user behaviors are obtained.
Specifically, the terminal acquires, from the plurality of preset standard events, a plurality of preset standard event attributes comprising a plurality of preset standard event identifications and a plurality of preset text contents, where the preset text contents record the preset text information corresponding to the preset standard event identifications on the browser page; and the terminal sets the preset standard events in a preset document tree corresponding to the HTML file according to the preset standard event identifications and the preset text contents to obtain a plurality of preset event nodes.
The HTML file records only text content and event types, in the form of a document tree. Event types are distinguished by their event identifiers: events such as clicking, inputting and scrolling are all event types, and the page entered after such an event is triggered is the text content. The terminal acquires a plurality of preset standard events, obtains the preset standard text contents and preset standard event identifications from them, and sets different preset text contents and different preset event identifications in element nodes at different positions of the preset document tree, thereby obtaining a plurality of preset event nodes. When a user operates on a browser page, the event triggered by the user is matched against the preset event nodes. If the event matches a preset event node, the corresponding event stream is triggered, the terminal obtains the corresponding preset standard event identifier and preset text information, and the target behavior acquisition data can be read according to them.
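The node registration of step 202 can be sketched with a toy document tree. The `ElementNode` class, the `path` field (a list of child indexes locating an element in the tree), and the sample events are hypothetical; the text states only that event identifiers and text contents are set on element nodes at different positions of a preset document tree.

```python
from typing import Dict, List, Optional


class ElementNode:
    """Toy element node of a document tree (not a real DOM implementation)."""

    def __init__(self, tag: str,
                 children: Optional[List["ElementNode"]] = None) -> None:
        self.tag = tag
        self.children: List["ElementNode"] = children or []
        self.event_id: Optional[str] = None     # preset standard event identifier
        self.preset_text: Optional[str] = None  # preset text content


def register_events(root: ElementNode,
                    events: List[Dict[str, object]]) -> List[ElementNode]:
    """Attach each preset standard event to the element at its path."""
    registered: List[ElementNode] = []
    for event in events:
        node = root
        for index in event["path"]:  # walk down the tree by child index
            node = node.children[index]
        node.event_id = str(event["event_id"])
        node.preset_text = str(event["text"])
        registered.append(node)
    return registered


# html > body > [a, input]
document = ElementNode("html", [
    ElementNode("body", [ElementNode("a"), ElementNode("input")]),
])

preset_event_nodes = register_events(document, [
    {"path": [0, 0], "event_id": "click", "text": "news entry"},
    {"path": [0, 1], "event_id": "input", "text": "search box"},
])
```

After registration, each returned node carries both an event identifier and its preset text, which is the information later read back when a user event matches that node.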
203. When a user operates a browser page, calling a preset universal resource file to acquire a plurality of user behavior information of the user on the browser page;
It should be noted that the preset universal resource file is a general-purpose JavaScript file. The script that collects information is simply a JavaScript file added to the page; the collection script and the page's business-logic code are written as separate JavaScript scripts, and a single page can run multiple JavaScript scripts.
When a user operates a browser page, the terminal acquires a plurality of pieces of user behavior information through the universal JavaScript file. The user behavior information includes behavior events such as clicking, inputting, and scrolling, together with the text information corresponding to each behavior event. For example, if the behavior event concerns a news entry, the user behavior information includes both the behavior event on that news entry and the text information corresponding to the news entry.
204. Determining a plurality of initial HTML nodes according to the user behavior information, wherein different user behavior information corresponds to different initial HTML nodes;
When a user operates on a browser page, a plurality of behavior events are triggered and the terminal acquires a plurality of pieces of user behavior information. The terminal checks whether the user behavior information matches a preset event node, which is a standard event type that the terminal has defined through the browser. If the user behavior information matches a preset event node, an event callback is triggered to obtain an HTML node.
It should be noted that the terminal registers the standard event types in the document tree of the HTML file in advance to obtain a plurality of element nodes, and these element nodes are the preset event nodes.
Specifically, the terminal judges whether the plurality of pieces of user behavior information match the plurality of preset event nodes; if they do, the terminal triggers an event callback to obtain a plurality of initial HTML nodes.
More specifically, the terminal judges whether the user behavior information matches the preset event nodes as follows. The terminal extracts a plurality of user behavior identifiers and a plurality of text contents from the user behavior information, where the text contents record the text information corresponding to the user behavior identifiers on the browser page. For each piece of user behavior information, the terminal uses a preset matching algorithm to match its user behavior identifier against the preset standard event identifiers in the plurality of preset event nodes. If any of the user behavior identifiers matches any preset standard event identifier in the plurality of preset event nodes, the terminal judges that the user behavior information matches a node among the plurality of preset event nodes.
For ease of understanding, the following description is made in conjunction with a specific usage scenario:
When a user browsing the current news page triggers the event of clicking sports news, the terminal acquires the behavior information of the user clicking the sports news; when the user triggers the event of clicking science and technology news, the terminal acquires the behavior information of the user clicking the science and technology news. The terminal then judges whether each of these two pieces of behavior information matches one of the plurality of preset event nodes. If the behavior information of clicking the sports news matches a preset event node, an event callback is triggered to obtain the initial HTML node for the user clicking the sports news; if the behavior information of clicking the science and technology news matches a preset event node, an event callback is triggered to obtain the initial HTML node for the user clicking the science and technology news.
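The matching step and the news-page scenario can be illustrated with a short Python sketch. The source does not specify the preset matching algorithm, so exact string equality is used here as an assumption. `UserBehaviorInfo`, `match_preset_nodes`, `event_callback`, and the identifier strings are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserBehaviorInfo:
    behavior_id: str   # user behavior identifier extracted from the event
    text_content: str  # text on the page associated with the behavior


def match_preset_nodes(behaviors, preset_event_ids, on_match):
    """Return initial HTML nodes for behaviors whose identifier matches a preset one.

    Exact equality stands in for the unspecified preset matching algorithm.
    """
    initial_nodes = []
    for info in behaviors:
        if info.behavior_id in preset_event_ids:
            initial_nodes.append(on_match(info))  # plays the role of the event callback
    return initial_nodes


def event_callback(info):
    # Hypothetical representation of an initial HTML node.
    return {"behavior_id": info.behavior_id, "text": info.text_content}


preset_event_ids = {"click:sports-news", "click:tech-news"}
behaviors = [
    UserBehaviorInfo("click:sports-news", "Sports"),
    UserBehaviorInfo("click:tech-news", "Science and Technology"),
    UserBehaviorInfo("hover:banner", "Advertisement"),  # not a preset event
]
initial_nodes = match_preset_nodes(behaviors, preset_event_ids, event_callback)
```

Each matched behavior yields its own initial HTML node, which reflects the statement that different user behavior information corresponds to different initial HTML nodes; the unmatched hover event produces none.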
205. Reading the acquisition requirement of a preset server, and filtering a plurality of initial HTML nodes according to the acquisition requirement of the preset server to obtain a plurality of target HTML nodes;
In the process of collecting user behavior data, the terminal first acquires the acquisition requirement of the preset server and then filters the plurality of initial HTML nodes according to that requirement. In essence, the filtering retains the HTML nodes with the highest node similarity.
Specifically, the terminal first reads the acquisition requirement of the preset server and extracts a target acquisition identifier from it. Second, the terminal computes the similarity between the target acquisition identifier and the user behavior identifier in each of the plurality of initial HTML nodes, obtaining a plurality of node similarities. Finally, the terminal sets the initial HTML nodes whose node similarity is greater than or equal to the similarity threshold as the plurality of target HTML nodes.
For example, assume that the similarity threshold is 0.9 and that the acquisition requirement of the preset server is to collect user behavior data for clicks on science and technology news entries. The terminal obtains the target acquisition identifier for clicking science and technology news from the acquisition requirement of the preset server. The terminal then compares this target acquisition identifier with the plurality of user behavior identifiers and obtains node similarities of 0.8, 0.85, and 0.95 for initial HTML node A, initial HTML node B, and initial HTML node C, respectively. Since only 0.95 is greater than or equal to 0.9, initial HTML node C is determined to be the target HTML node.
It should be noted that, in this embodiment, the node similarity may be obtained through a cosine similarity algorithm or through the Euclidean distance formula; the method for obtaining the node similarity is not limited in this embodiment.
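The threshold filtering, and one way the node similarity might be computed, can be sketched in Python as follows. The source names cosine similarity but does not say how identifiers are turned into vectors; the character-bigram encoding in `bigram_vector` is therefore an assumption, as are all function names and identifier strings. The call to `filter_nodes` reuses the similarity values from the example in the text.

```python
import math
from collections import Counter


def bigram_vector(identifier):
    """Turn an identifier into character-bigram counts (an assumed encoding)."""
    return Counter(identifier[i:i + 2] for i in range(len(identifier) - 1))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def filter_nodes(similarities, threshold):
    """Keep nodes whose similarity is greater than or equal to the threshold."""
    return [name for name, score in similarities.items() if score >= threshold]


# The example from the text: threshold 0.9, nodes A, B, and C.
targets = filter_nodes({"A": 0.8, "B": 0.85, "C": 0.95}, threshold=0.9)

# Scoring two hypothetical identifiers directly.
score = cosine_similarity(bigram_vector("click:tech-news"),
                          bigram_vector("click:sports-news"))
```

With the values from the text, only node C passes the filter. A Euclidean-distance variant would replace `cosine_similarity` and would need its own convention for converting a distance into a similarity.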
206. Acquiring a plurality of target behavior acquisition data according to the plurality of target HTML nodes, and reporting the plurality of target behavior acquisition data to the preset server through a preset interface, wherein the plurality of target behavior acquisition data comprise the user behavior identifiers and text contents corresponding to the plurality of target HTML nodes.
The terminal obtains a plurality of target behavior acquisition data according to the plurality of target HTML nodes. The target behavior acquisition data may include the user's operation events, the types of text viewed by the user, and the titles, links, and tags of the pages used by the user. Finally, the terminal uploads the obtained target behavior acquisition data to the preset server through a preset API.
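The reporting step can be sketched in Python as building a JSON request for the preset interface. The source does not describe the interface, so the endpoint URL, the field names `records`, `page_title`, and `page_url`, and the function `build_report` are all hypothetical. The request is constructed but deliberately not sent.

```python
import json
import urllib.request


def build_report(target_nodes, endpoint):
    """Build (but do not send) a POST request carrying the collected data."""
    payload = {
        "records": [
            {
                "behavior_id": node["behavior_id"],  # user behavior identifier
                "text": node["text"],                # text content of the node
                "page_title": node.get("page_title"),
                "page_url": node.get("page_url"),
            }
            for node in target_nodes
        ]
    }
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_report(
    [{"behavior_id": "click:tech-news", "text": "Science and Technology",
      "page_title": "News", "page_url": "https://example.com/news"}],
    endpoint="https://collector.example.com/api/behavior",  # hypothetical endpoint
)
# Passing `request` to urllib.request.urlopen would send it; omitted here.
```

Keeping the payload construction separate from the network call makes it easy to inspect exactly which fields leave the page before anything is transmitted.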
In the embodiment of the present invention, user behavior data is collected by calling the universal resource file, which is compatible with different HTML pages; because the resource file is separate from the browser, updating or modifying the resource file does not affect the business logic.
The method for collecting user behavior data in the embodiment of the present invention is described above. Referring to Fig. 3, an embodiment of the apparatus for collecting user behavior data in the embodiment of the present invention is described below, and includes:
the behavior information acquiring module 301 is configured to, when a user operates a browser page, call a preset universal resource file to acquire a plurality of user behavior information of the user on the browser page;
the node obtaining module 302 is configured to determine a plurality of initial HTML nodes according to a plurality of user behavior information, where different user behavior information corresponds to different initial HTML nodes;
the node filtering module 303 is configured to read the acquisition requirements of the preset server, and filter the multiple initial HTML nodes according to the acquisition requirements of the preset server to obtain multiple target HTML nodes;
the behavior data acquisition module 304 is configured to acquire a plurality of target behavior acquisition data according to the plurality of target HTML nodes, and report the plurality of target behavior acquisition data to the preset server through the preset interface, where the plurality of target behavior acquisition data includes user behavior identifiers and text contents corresponding to the plurality of target HTML nodes.
In the embodiment of the present invention, user behavior data is collected by calling the universal resource file, which is compatible with different HTML pages; because the resource file is separate from the browser, updating or modifying the resource file does not affect the business logic.
Referring to Fig. 4, another embodiment of the apparatus for collecting user behavior data in the embodiment of the present invention includes:
the behavior information acquiring module 301 is configured to, when a user operates a browser page, call a preset universal resource file to acquire a plurality of user behavior information of the user on the browser page;
the node obtaining module 302 is configured to determine a plurality of initial HTML nodes according to a plurality of user behavior information, where different user behavior information corresponds to different initial HTML nodes;
the node filtering module 303 is configured to read the acquisition requirements of the preset server, and filter the multiple initial HTML nodes according to the acquisition requirements of the preset server to obtain multiple target HTML nodes;
the behavior data acquisition module 304 is configured to acquire a plurality of target behavior acquisition data according to the plurality of target HTML nodes, and report the plurality of target behavior acquisition data to the preset server through the preset interface, where the plurality of target behavior acquisition data includes user behavior identifiers and text contents corresponding to the plurality of target HTML nodes.
Optionally, the node obtaining module 302 specifically includes:
a judging unit 3021 configured to judge whether the plurality of pieces of user behavior information match any one of the plurality of preset event nodes;
the node obtaining unit 3022 is configured to trigger event callback if the plurality of user behavior information matches any node of the plurality of preset event nodes, so as to obtain a plurality of initial HTML nodes.
Optionally, the judging unit 3021 may be further specifically configured to:
extracting a plurality of user behavior identifications and a plurality of text contents from the plurality of user behavior information, wherein the text contents are used for recording the text information corresponding to the user behavior identifications on a browser page;
aiming at any user behavior information in the user behavior information, judging whether any user behavior identifier in the user behavior identifiers is matched with any preset standard event identifier in the preset event nodes by adopting a preset matching algorithm;
and if any user behavior identifier in the plurality of user behavior identifiers is matched with any preset standard event identifier in the plurality of preset event nodes, judging that the plurality of user behavior information is matched with any node in the plurality of preset event nodes.
Optionally, the node filtering module 303 may be further specifically configured to:
reading the acquisition requirement of a preset server, and extracting a target acquisition identifier based on the acquisition requirement of the preset server;
similarity calculation is carried out on the target collection identification and user behavior identifications in the multiple initial HTML nodes respectively to obtain multiple node similarities;
and setting a plurality of initial HTML nodes with the node similarity larger than or equal to the similarity threshold value as a plurality of target HTML nodes.
Optionally, the device for acquiring user behavior data further includes:
a resource file registration module 305, configured to acquire a resource file, and inject the resource file into a browser, where the resource file includes a hypertext markup language HTML file and a preset universal resource file;
the node registration module 306 is configured to read a plurality of preset standard events, and perform node registration on the plurality of preset standard events by using a preset universal resource file to obtain a plurality of preset event nodes, where the plurality of preset event nodes are used to monitor user behaviors.
Optionally, the resource file registration module 305 may be further specifically configured to:
acquiring a resource file, and acquiring a dynamic link library and the name of the dynamic link library according to a preset universal resource file in the resource file;
allocating a memory for the dynamic link library in the target thread to obtain a target memory, writing the name of the dynamic link library into the target memory by adopting a preset write-in function, and mapping the dynamic link library into the target thread;
monitoring the working state of a target thread, and acquiring a target thread value of the target thread when the target thread stops working, wherein the target thread value is a loading base address of a dynamic link library; and writing the target thread value into the dynamic link library.
Optionally, the node registration module 306 may be further specifically configured to:
the method comprises the steps of obtaining a plurality of preset standard events, reading a plurality of preset standard event attributes from the plurality of preset standard events, wherein the plurality of preset standard event attributes comprise a plurality of preset standard event identifications and a plurality of preset text contents, and the plurality of preset text contents are used for recording a plurality of preset text information corresponding to the plurality of preset standard event identifications on a browser page;
and setting the preset standard events in a preset document tree corresponding to the HTML file according to the preset standard event identifications and the preset text contents to obtain a plurality of preset event nodes.
It is emphasized that, to further ensure the privacy and security of the user behavior data, the user behavior data may also be stored in a node of a blockchain.
In the embodiment of the present invention, user behavior data is collected by calling the universal resource file, which is compatible with different HTML pages; because the resource file is separate from the browser, updating or modifying the resource file does not affect the business logic.
Figs. 3 and 4 above describe the apparatus for collecting user behavior data in the embodiment of the present invention in detail from the perspective of modular functional entities; the device for collecting user behavior data in the embodiment of the present invention is described in detail below from the perspective of hardware processing.
Fig. 5 is a schematic structural diagram of a device for collecting user behavior data according to an embodiment of the present invention. The user behavior data collection device 500 may vary considerably with configuration or performance, and may include one or more central processing units (CPUs) 510 (e.g., one or more processors), a memory 520, and one or more storage media 530 (e.g., one or more mass storage devices) storing application programs 533 or data 532. The memory 520 and the storage medium 530 may provide transient or persistent storage. The program stored on the storage medium 530 may include one or more modules (not shown), each of which may include a series of instruction operations for the user behavior data collection device 500. Further, the processor 510 may be configured to communicate with the storage medium 530 and to execute, on the user behavior data collection device 500, the series of instruction operations in the storage medium 530.
The user behavior data collection device 500 may also include one or more power supplies 540, one or more wired or wireless network interfaces 550, one or more input-output interfaces 560, and/or one or more operating systems 531, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like. Those skilled in the art will appreciate that the device structure shown in Fig. 5 does not limit the user behavior data collection device, which may include more or fewer components than shown, combine some components, or arrange the components differently.
The present invention also provides a computer-readable storage medium, which may be a non-volatile computer-readable storage medium, and may also be a volatile computer-readable storage medium, having stored therein instructions, which, when executed on a computer, cause the computer to perform the steps of the method for collecting user behavior data.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Further, the computer usable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the blockchain node, and the like.
A blockchain is a new application model of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked to one another by cryptographic methods, where each data block contains information on a batch of network transactions that is used to verify the validity (anti-counterfeiting) of the information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
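The idea of data blocks linked by a cryptographic method can be illustrated with a minimal hash chain in Python. This sketch covers only the linking and the validity check; it has no consensus mechanism, peer-to-peer transmission, or distribution, and it is not the storage scheme of the embodiment. The function names and the record layout are hypothetical.

```python
import hashlib
import json

GENESIS_PREV_HASH = "0" * 64  # placeholder predecessor for the first block


def block_hash(index, prev_hash, records):
    """SHA-256 over a block's contents, including the previous block's hash."""
    data = json.dumps(
        {"index": index, "prev_hash": prev_hash, "records": records},
        sort_keys=True,
    ).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def append_block(chain, records):
    """Append a block whose hash commits to its records and its predecessor."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV_HASH
    index = len(chain)
    chain.append({
        "index": index,
        "prev_hash": prev_hash,
        "records": records,
        "hash": block_hash(index, prev_hash, records),
    })


def is_valid(chain):
    """Check every stored hash and every link to the previous block."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i else GENESIS_PREV_HASH
        if block["prev_hash"] != expected_prev:
            return False
        if block["hash"] != block_hash(block["index"], block["prev_hash"], block["records"]):
            return False
    return True


chain = []
append_block(chain, [{"behavior_id": "click:tech-news"}])
append_block(chain, [{"behavior_id": "click:sports-news"}])
valid_before = is_valid(chain)

chain[0]["records"][0]["behavior_id"] = "tampered"  # modify stored data
valid_after = is_valid(chain)
```

Because each hash covers the previous block's hash, altering a stored record changes the recomputed hash and the check fails, which is the anti-counterfeiting property the paragraph refers to.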
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for collecting user behavior data is characterized by comprising the following steps:
when a user operates a browser page, calling a preset universal resource file to acquire a plurality of user behavior information of the user on the browser page;
determining a plurality of initial HTML nodes according to the user behavior information, wherein different user behavior information corresponds to different initial HTML nodes;
reading the acquisition requirement of a preset server, and filtering the plurality of initial HTML nodes according to the acquisition requirement of the preset server to obtain a plurality of target HTML nodes;
and acquiring a plurality of target behavior acquisition data according to the plurality of target HTML nodes, and reporting the plurality of target behavior acquisition data to a preset server through a preset interface, wherein the plurality of target behavior acquisition data comprise user behavior identifications and text contents corresponding to the plurality of target HTML nodes.
2. The method of claim 1, wherein the determining a plurality of initial HTML nodes according to the user behavior information, wherein different user behavior information corresponds to different initial HTML nodes includes:
judging whether the user behavior information is matched with any one node in a plurality of preset event nodes;
and if the user behavior information is matched with any one of the preset event nodes, triggering event call-back to obtain a plurality of initial HTML nodes.
3. The method of claim 2, wherein the determining whether the user behavior information matches any of a plurality of pre-set event nodes comprises:
extracting a plurality of user behavior identifications and a plurality of text contents from the plurality of user behavior information, wherein the text contents are used for recording the text information corresponding to the user behavior identifications on the browser page;
aiming at any user behavior information in the user behavior information, judging whether any user behavior identifier in the user behavior identifiers is matched with any preset standard event identifier in a plurality of preset event nodes by adopting a preset matching algorithm;
and if any user behavior identifier in the plurality of user behavior identifiers is matched with any preset standard event identifier in the plurality of preset event nodes, judging that the plurality of user behavior information is matched with any node in the plurality of preset event nodes.
4. The method of claim 1, wherein the reading of the collection requirements of a preset server and the filtering of the initial HTML nodes according to the collection requirements of the preset server to obtain target HTML nodes comprises:
reading the acquisition requirement of a preset server, and extracting a target acquisition identifier based on the acquisition requirement of the preset server;
similarity calculation is carried out on the target acquisition identification and user behavior identifications in the initial HTML nodes respectively to obtain a plurality of node similarities;
and setting a plurality of initial HTML nodes with the node similarity larger than or equal to a similarity threshold value as a plurality of target HTML nodes.
5. The method for collecting user behavior data according to any one of claims 1 to 4, wherein before the invoking a preset universal resource file to obtain a plurality of user behavior information of a user on a browser page when the user operates the browser page, the method for collecting user behavior data further comprises:
acquiring a resource file, and injecting the resource file into a browser, wherein the resource file comprises a hypertext markup language (HTML) file and a preset universal resource file;
reading a plurality of preset standard events, and performing node registration on the plurality of preset standard events by adopting the preset universal resource file to obtain a plurality of preset event nodes, wherein the plurality of preset event nodes are used for monitoring user behaviors.
6. The method of claim 5, wherein the obtaining a resource file and injecting the resource file into a browser, the resource file comprising a hypertext markup language (HTML) file and a predetermined universal resource file comprises:
acquiring a resource file, and acquiring a dynamic link library and a name of the dynamic link library according to a preset universal resource file in the resource file;
allocating a memory for the dynamic link library in a target thread to obtain a target memory, writing the name of the dynamic link library into the target memory by adopting a preset write-in function, and mapping the dynamic link library into the target thread;
monitoring the working state of the target thread, and acquiring a target thread value of the target thread when the target thread stops working, wherein the target thread value is a loading base address of the dynamic link library;
and writing the target thread value into the dynamic link library.
7. The method according to claim 5, wherein the reading a plurality of preset standard events, and performing node registration on the plurality of preset standard events by adopting the preset universal resource file to obtain a plurality of preset event nodes, the plurality of preset event nodes being used for monitoring user behaviors, comprises:
acquiring a plurality of preset standard events, and reading a plurality of preset standard event attributes including a plurality of preset standard event identifications and a plurality of preset text contents from the plurality of preset standard events, wherein the plurality of preset text contents are used for recording a plurality of preset text information corresponding to the plurality of preset standard event identifications on the browser page;
and setting the preset standard events in a preset document tree corresponding to the HTML file according to the preset standard event identifications and the preset text contents to obtain a plurality of preset event nodes.
8. An apparatus for collecting user behavior data, the apparatus comprising:
the behavior information acquisition module is used for calling a preset universal resource file to acquire a plurality of user behavior information of a user on a browser page when the user operates the browser page;
the node acquisition module is used for determining a plurality of initial HTML nodes according to the user behavior information, wherein different user behavior information corresponds to different initial HTML nodes;
the node filtering module is used for reading the acquisition requirements of a preset server and filtering the initial HTML nodes according to the acquisition requirements of the preset server to obtain a plurality of target HTML nodes;
and the behavior data acquisition module acquires a plurality of target behavior acquisition data according to the target HTML nodes and reports the target behavior acquisition data to a preset server through a preset interface, wherein the target behavior acquisition data comprises user behavior identifications and text contents corresponding to the target HTML nodes.
9. An apparatus for collecting user behavior data, the apparatus comprising: a memory having instructions stored therein and at least one processor, the memory and the at least one processor interconnected by a line;
the at least one processor invokes the instructions in the memory to cause the user behavior data collection device to perform the user behavior data collection method of any one of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out a method for collecting user behavior data according to any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010344482.8A CN111680200A (en) 2020-04-27 2020-04-27 Method, device and equipment for collecting user behavior data and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010344482.8A CN111680200A (en) 2020-04-27 2020-04-27 Method, device and equipment for collecting user behavior data and storage medium

Publications (1)

Publication Number Publication Date
CN111680200A true CN111680200A (en) 2020-09-18

Family

ID=72452172

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010344482.8A Pending CN111680200A (en) 2020-04-27 2020-04-27 Method, device and equipment for collecting user behavior data and storage medium

Country Status (1)

Country Link
CN (1) CN111680200A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103246661A (en) * 2012-02-07 2013-08-14 阿里巴巴集团控股有限公司 Visual user behavior collecting system and method
CN103279567A (en) * 2013-06-18 2013-09-04 重庆邮电大学 Web data collection method and system both based on AJAX (asynchronous javascript and extensible markup language)
CN105094786A (en) * 2014-05-21 2015-11-25 广州市动景计算机科技有限公司 Method and system for customizing page based on JavaScript
CN109948077A (en) * 2018-08-20 2019-06-28 平安普惠企业管理有限公司 User behavior data acquisition method, device, equipment and computer storage medium
CN110515679A (en) * 2019-08-28 2019-11-29 北京思维造物信息科技股份有限公司 Collecting method, device, equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112380473A (en) * 2020-11-16 2021-02-19 康键信息技术(深圳)有限公司 Data acquisition and synchronization method, device, equipment and storage medium
CN112380473B (en) * 2020-11-16 2023-10-20 康键信息技术(深圳)有限公司 Data acquisition and synchronization method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
TA01 Transfer of patent application right

Effective date of registration: 2021-02-22

Address after: Room 201, building a, No.1 Qianwan 1st Road, Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong Province (settled in Shenzhen Qianhai business secretary Co., Ltd.)

Applicant after: Shenzhen Saiante Technology Service Co.,Ltd.

Address before: 1-34 / F, Qianhai free trade building, 3048 Xinghai Avenue, Mawan, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong 518000

Applicant before: Ping An International Smart City Technology Co.,Ltd.

SE01 Entry into force of request for substantive examination