CN114003567A - Data acquisition method and related device - Google Patents

Info

Publication number
CN114003567A
Authority
CN
China
Legal status
Pending
Application number
CN202111195332.6A
Other languages
Chinese (zh)
Inventor
华文尧
Current Assignee
Shenzhen Ideamake Software Technology Co Ltd
Original Assignee
Shenzhen Ideamake Software Technology Co Ltd
Application filed by Shenzhen Ideamake Software Technology Co Ltd
Priority to CN202111195332.6A
Publication of CN114003567A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
        • G06F16/17 Details of further file system functions
            • G06F16/1734 Details of monitoring file system events, e.g. by the use of hooks, filter drivers, logs
        • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
        • G06F16/18 File system types
            • G06F16/182 Distributed file systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of the application disclose a data acquisition method and a related device. The method includes: monitoring a business operation executed by a target terminal; if a target trigger event in the business operation is detected, collecting user data of the target trigger event, where the user data includes attribute information describing static characteristics of the user and behavior information describing the user's operation records, and the target trigger event is a preset event indicating that data should be collected; and sending the user data to a database of a server. The method allows data collection points to be managed dynamically and supports real-time online data collection from a large number of concurrent users. It also allows user characteristics to be extracted from user behavior data and supports control of user business risk.

Description

Data acquisition method and related device
Technical Field
The present application relates to the field of data processing and analysis, and more particularly, to a data acquisition method and related apparatus.
Background
Most current client-side data acquisition schemes are intrusive and rely on active reporting. A common component is added to the front end, a backend interface is called at each function point, and the current page and resource attributes are reported. Another existing solution is based on the event bubbling of page elements. In this solution, the acquisition and processing side is a web application developed in a backend language such as Java or PHP, which parses the messages and writes them into relational tables by function point.
In existing schemes, the buried point (event tracking point) event is the unit of management, and there is no unified abstraction of the business model, so data resources are poorly utilized. Data cleaning and conversion are performed by web services, which gives low data throughput. Only a small number of simultaneous reports can be handled, and data is written with poor timeliness.
Disclosure of Invention
The embodiments of the application provide a data acquisition method and a related device. They aim to collect client data in real time without buried points, report it to a database of a server, and process data with high throughput. They also support massive data storage as well as efficient aggregation and real-time analysis of the data.
In a first aspect, an embodiment of the present application provides a data acquisition method, where the method includes:
monitoring the business operation executed by a target terminal;
if a target trigger event in the business operation is monitored, acquiring user data of the target trigger event, wherein the user data comprises attribute information and behavior information, the attribute information is used for describing static characteristics of a user, the behavior information is used for describing an operation record of the user, and the target trigger event is a preset event used for indicating data acquisition;
and sending the user data to a database of a server.
In a second aspect, an embodiment of the present application provides an apparatus for data acquisition, where the apparatus includes:
the monitoring unit is configured to monitor events of a target terminal at data acquisition points, where an event indicates a business operation executed by the target terminal and an acquisition point is a target trigger event that is expected to be captured from the business operation;
the data acquisition unit is connected to the monitoring unit and is configured to collect the context data generated by the event;
the data storage unit is connected to the data acquisition unit and is configured to store the context data collected by the data acquisition unit, where the context data includes first information and second information; the first information includes the user name, age, phone number, scene information, and the like, and the second information includes click behavior information, browsing behavior information, and the like;
the data analysis unit is connected to the data storage unit and is configured to train a label classification model using the context data in the data storage unit; the label classification model extracts user characteristics and can be used to analyze and control the user's business risk;
and the data query unit is connected to the data storage unit and is configured to support free-form queries of the context data.
In a third aspect, an embodiment of the present application provides an electronic device, where the electronic device includes:
one or more processors;
one or more memories for storing programs;
one or more communication interfaces for wireless communication, where the memories and the communication interfaces are connected to and communicate with each other;
where the programs stored in the one or more memories are configured to be executed by the one or more processors to cause the device to perform some or all of the steps of any method of the first aspect of the embodiments of the application.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium having a computer program stored therein for electronic data exchange, the computer program comprising executable instructions for performing some or all of the steps as described in any one of the methods of the first aspect of embodiments of the present application.
In a fifth aspect, the present application provides a computer program product, where the computer program product includes a computer program operable to cause a computer to perform some or all of the steps as described in any one of the methods of the first aspect of the embodiments of the present application. The computer program product may be a software installation package.
As can be seen, in the embodiments of the application, the business operation executed by the target terminal is monitored. If a target trigger event in the business operation is detected, user data of the target trigger event is collected. The user data includes attribute information describing static characteristics of the user and behavior information describing the user's operation records, and the target trigger event is a preset event indicating that data should be collected. The user data is then sent to a database of a server. With this method, client data is collected in real time without buried points and reported to the server database, and data is processed with high throughput. Massive data storage, efficient aggregation, and real-time analysis are also supported.
Drawings
To describe the technical solutions in the embodiments of the application or in the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the application; a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic structural diagram of a network architecture according to an embodiment of the application;
FIG. 2 is a schematic flowchart of a data acquisition method according to an embodiment of the application;
FIG. 3 is a schematic detailed flowchart of a data acquisition method according to an embodiment of the application;
FIG. 4 is a flowchart of the buried point configuration of a data acquisition method according to an embodiment of the application;
FIG. 5 is a schematic flowchart of the functional test of the SDK code module according to an embodiment of the application;
FIG. 6 is a schematic diagram of fact-hierarchy label classification according to an embodiment of the application;
FIG. 7 is a schematic structural diagram of a data acquisition device according to an embodiment of the application;
FIG. 8 is a schematic structural diagram of another data acquisition device according to an embodiment of the application.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," and the like in the description and claims of the present application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps is not limited to only those steps recited, but may alternatively include other steps not recited, or may alternatively include other steps inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In related designs, some schemes use the buried point event as the unit of management, without a unified abstraction of the business model, so data resources are poorly utilized. Data cleaning and conversion are performed by web services, so data throughput is low. Only a small number of simultaneous reports can be handled, and data is written with poor timeliness.
To solve the above problems, the embodiments of the application provide a data acquisition method and device. The method monitors the business operation executed by a target terminal. If a target trigger event in the business operation is detected, it collects user data of the target trigger event, where the user data includes attribute information describing static characteristics of the user and behavior information describing the user's operation records, and the target trigger event is a preset event indicating that data should be collected. It then sends the user data to a database of a server. With this method, client data collected without buried points is reported to the server database in real time, and data can be processed with high throughput. Massive data storage, efficient aggregation, and real-time analysis are also supported.
To aid understanding of the data acquisition method and device disclosed in the embodiments of the application, the embodiments are described in detail below.
The network architecture to which the embodiments of the application apply is described first. FIG. 1 is a schematic structural diagram of a network architecture according to an embodiment of the application. As shown in FIG. 1, the network architecture may include a service device and a terminal. The service device may include a server, a service host, a service system, a service platform, and the like. The terminal includes, but is not limited to, a mobile phone, a mobile computer, a tablet computer, a personal digital assistant (PDA), a media player, a smart TV, a smart watch, smart glasses, a smart band, and the like. The service device communicates with the terminal over the Internet.
Referring to FIG. 2, which is a schematic flowchart of a data acquisition method according to an embodiment of the application, the data acquisition method is applied to a service device. As shown in FIG. 2, the method includes the following steps:
Step 201: Monitor the business operation executed by the target terminal.
Illustratively, the target terminal includes, but is not limited to, a mobile phone, a mobile computer, a tablet computer, a Personal Digital Assistant (PDA), a media player, a smart tv, a smart watch, smart glasses, a smart band, and other user devices.
Specifically, data collection here means collecting the user's behavior data. The collected data then serves as the reference data source for extracting user characteristics and controlling user business risk.
The user performs business operations on the terminal through a platform such as an app, a mini program (applet), or a web page. Business operations include, but are not limited to, "click" and "slide page". Data fields that a "click" operation may generate include the clicked link, the click-through jump, and the click position. Data fields that a "slide page" operation may generate describe the direction: up, down, left, or right.
Before the business operation executed by the target terminal is monitored, the target trigger events that activate the monitoring mechanism must be set in advance. The target trigger events include the following:
a click event, where one operation is recorded each time the user clicks the interface of the target terminal;
an exposure event, where one set of operation data is recorded when the user enters or refreshes a page on the target terminal;
and page dwell time, which records data such as how long the user stays on a page and the browsing operations performed during that period.
Step 202: If a target trigger event in the business operation is detected, collect user data of the target trigger event. The user data includes attribute information, behavior information, and scene information. The attribute information describes static characteristics of the user, and the behavior information indicates the user's operation records. The target trigger event is a preset event indicating that data should be collected.
Target trigger events can be divided into click events, exposure events, and page dwell time.
Specifically, a click event records data once each time the user clicks a page button on the target terminal. An exposure event records data when the user enters or refreshes a page, but not when the user exits it. Page dwell time is the time the user stays on a page, calculated from the recorded times at which the user enters and leaves the page.
For example, the operations performed by the user on the target terminal can be monitored by embedding listening code in the target application. When a target trigger event of a business operation is detected, the event is recorded and stored.
Further, in this embodiment, the user data of the target trigger event is collected by log collection.
For example, log collection is implemented with Java NIO, the synchronous non-blocking I/O model in Java, which is an effective way to handle high concurrency, large numbers of connections, and heavy I/O.
Specifically, when the user performs a business operation on the target terminal, the Java NIO code deployed in the target application is activated. This code captures and transmits the log of the current business operation.
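The three trigger event types can be sketched in Java, the language of the Java NIO collector mentioned in this embodiment. The sketch is illustrative only. `TriggerType`, `TrackedEvent`, and `TriggerRecorder` are hypothetical names, and timestamps are passed in explicitly as epoch milliseconds. It assumes Java 17 or later, because it uses records.

```java
import java.util.ArrayList;
import java.util.List;

// The three preset trigger event types (hypothetical names).
enum TriggerType { CLICK, EXPOSURE, PAGE_DWELL }

// One recorded event; dwellMillis is only meaningful for PAGE_DWELL.
record TrackedEvent(TriggerType type, String page, long timestampMillis, long dwellMillis) {}

class TriggerRecorder {
    private final List<TrackedEvent> events = new ArrayList<>();
    private String currentPage; // page the user is currently on, or null
    private long enterTimeMillis;

    // Click event: one record per click on a page button.
    void onClick(String page, long nowMillis) {
        events.add(new TrackedEvent(TriggerType.CLICK, page, nowMillis, 0));
    }

    // Exposure event: recorded when the user enters or refreshes a page.
    void onPageEnter(String page, long nowMillis) {
        currentPage = page;
        enterTimeMillis = nowMillis; // also (re)starts the dwell timer
        events.add(new TrackedEvent(TriggerType.EXPOSURE, page, nowMillis, 0));
    }

    // Leaving a page records no exposure, only the dwell time (leave time minus enter time).
    void onPageLeave(long nowMillis) {
        if (currentPage == null) {
            return; // no page was entered, so there is nothing to close
        }
        events.add(new TrackedEvent(TriggerType.PAGE_DWELL, currentPage, nowMillis,
                nowMillis - enterTimeMillis));
        currentPage = null;
    }

    List<TrackedEvent> events() {
        return List.copyOf(events);
    }
}
```

To record a refresh, call `onPageEnter` again. This records a new exposure and restarts the dwell timer. The description does not say how refreshes affect dwell time, so that choice is an assumption.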
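Java NIO provides non-blocking, selector-driven reading, which can be sketched as follows. This is a minimal illustration, not the collector from the description. It assumes each log record is a UTF-8 line terminated by `\n`. It reads from a `java.nio.channels.Pipe` so that the example runs without a network. A real collector would register one `SocketChannel` per terminal connection on the same `Selector`. `NioLogCollector` is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class NioLogCollector {
    // Reads newline-terminated UTF-8 log records from a non-blocking channel until end of stream.
    static List<String> collect(Pipe.SourceChannel source) throws IOException {
        List<String> records = new ArrayList<>();
        ByteArrayOutputStream pending = new ByteArrayOutputStream(); // bytes of an unfinished record
        ByteBuffer buf = ByteBuffer.allocate(256);
        source.configureBlocking(false); // required before registering with a Selector
        try (Selector selector = Selector.open()) {
            source.register(selector, SelectionKey.OP_READ);
            boolean endOfStream = false;
            while (!endOfStream) {
                selector.select(); // waits until a registered channel is ready
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove(); // the selector does not clear handled keys by itself
                    if (!key.isReadable()) {
                        continue;
                    }
                    int n = source.read(buf); // never blocks; may return 0
                    if (n == -1) {
                        endOfStream = true;
                        break;
                    }
                    buf.flip();
                    while (buf.hasRemaining()) {
                        byte b = buf.get();
                        if (b == '\n') {
                            records.add(pending.toString(StandardCharsets.UTF_8));
                            pending.reset();
                        } else {
                            pending.write(b);
                        }
                    }
                    buf.clear();
                }
            }
        }
        return records; // a trailing record without '\n' is dropped in this sketch
    }
}
```

The collector splits on the newline byte rather than on decoded characters. This is safe for UTF-8, because the byte `0x0A` never occurs inside a multi-byte sequence, so a record split across two reads is reassembled correctly.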
Step 203: Send the user data to a database of a server.
For example, before the user data is sent to the server, it is further integrated. Integration takes place at two levels: formal data integration and semantic data integration.
For example, formal data integration addresses the following problem. Different operating systems, databases, and programming languages define the basic data types of user data differently, so the same user data is represented and stored in different ways. As a result, when different systems reference one another's user data, they produce erroneous results.
For example, semantic data integration establishes appropriate mapping and transformation relationships between the semantic annotations of user data.
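The two integration levels can be illustrated with a small normalizer. Everything here is an assumption for illustration. The alias table (`uid`, `user_id`, `ts`, `event_time`) stands in for the semantic mapping. The rule that event times become epoch milliseconds and user IDs become strings stands in for the formal, type-level unification. `DataIntegrator` is a hypothetical name.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

class DataIntegrator {
    // Semantic level (assumed vocabulary): source-specific field names mapped to one canonical name.
    private static final Map<String, String> FIELD_ALIASES = Map.of(
            "uid", "userId",
            "user_id", "userId",
            "ts", "eventTime",
            "event_time", "eventTime");

    // Returns a record with canonical field names and canonical value types.
    static Map<String, Object> integrate(Map<String, Object> raw) {
        Map<String, Object> out = new HashMap<>();
        for (Map.Entry<String, Object> e : raw.entrySet()) {
            String field = FIELD_ALIASES.getOrDefault(e.getKey(), e.getKey());
            out.put(field, normalize(field, e.getValue()));
        }
        return out;
    }

    // Formal level (assumed rules): one storage type per canonical field.
    private static Object normalize(String field, Object value) {
        if (field.equals("eventTime")) {
            // Accept epoch milliseconds (number or digit string) or an ISO-8601 instant; emit epoch millis.
            if (value instanceof Number n) {
                return n.longValue();
            }
            String s = value.toString();
            boolean digitsOnly = !s.isEmpty() && s.chars().allMatch(Character::isDigit);
            return digitsOnly ? Long.parseLong(s) : Instant.parse(s).toEpochMilli();
        }
        if (field.equals("userId")) {
            return value.toString(); // IDs arrive as numbers from some systems and strings from others
        }
        return value; // other fields pass through unchanged
    }
}
```

For example, `{uid=42, ts="1700000000000"}` and `{user_id="42", event_time="2023-11-14T22:13:20Z"}` integrate to the same canonical record.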
As can be seen, in the embodiments of the application, the business operation executed by the target terminal is monitored. If a target trigger event in the business operation is detected, user data of the target trigger event is collected. The user data includes attribute information describing static characteristics of the user and behavior information describing the user's operation records, and the target trigger event is a preset event indicating that data should be collected. The user data is then sent to a database of a server. With this method, client data collected without buried points is reported to the server database in real time, and data can be processed with high throughput. Massive data storage, efficient aggregation, and real-time analysis are also supported.
In one possible example, collecting the user data of the target trigger event includes the following steps. First, the user data is read from the target terminal in real time through Kafka, a high-throughput distributed publish-subscribe messaging system. Next, the format and dimensions of the extracted data are converted according to pre-designed rules, so that the originally heterogeneous data formats are unified. Finally, the converted data is imported into the database either incrementally or in full, as planned.
For example, Kafka serves as the intermediary that connects to the databases or apps of different target terminals and reads user data from them in real time.
Further, the format and dimensions of the extracted user data are converted according to the pre-designed rules to unify the originally heterogeneous formats. The converted data is then imported into the database incrementally or in full, as planned.
Furthermore, the data center (data middle platform) receives the various types of non-buried-point data temporarily stored in the database. In practice, data analysts can process and analyze this data in the data center themselves, without further development work.
As can be seen, in this embodiment Kafka serves as the intermediary through which large volumes of data are imported and exported, and the data is automatically reported to the backend database in real time. Meanwhile, the non-buried-point data in the database is transferred to the data center. Data analysts can then process and analyze it on their own, which enables self-service, cloud-based data services.
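The convert-then-import step, including the choice between incremental and full import, can be sketched without a Kafka client. In this hypothetical sketch, `RawRecord.offset` plays the role of a Kafka partition offset, and records are assumed to arrive in offset order. The conversion rule is any function supplied by the caller, and an in-memory list stands in for the database table. Reading from Kafka itself is deliberately omitted. `Importer` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// A raw message as read from the message queue; offset mimics a Kafka partition offset (assumption).
record RawRecord(long offset, Map<String, String> fields) {}

class Importer {
    private final Function<Map<String, String>, Map<String, String>> rule; // pre-designed conversion rule
    private final List<Map<String, String>> table = new ArrayList<>();      // stand-in for the database table
    private long watermark = -1; // highest offset imported so far

    Importer(Function<Map<String, String>, Map<String, String>> rule) {
        this.rule = rule;
    }

    // Incremental import: convert and append only records newer than the watermark.
    void importIncrement(List<RawRecord> records) {
        for (RawRecord r : records) {
            if (r.offset() <= watermark) {
                continue; // already imported
            }
            table.add(rule.apply(r.fields()));
            watermark = r.offset();
        }
    }

    // Full import: discard the table and rebuild it from the given records.
    void importAll(List<RawRecord> records) {
        table.clear();
        watermark = -1;
        importIncrement(records);
    }

    List<Map<String, String>> table() {
        return List.copyOf(table);
    }
}
```

The watermark makes incremental import safe to repeat. Re-delivered records whose offset has already been imported are skipped rather than duplicated.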
In one possible example, before the business operation executed by the target terminal is monitored, the method further includes setting target trigger events and intercepting a target trigger event when it is detected. The target trigger events include click events, exposure events, and page dwell time. A click event records data once each time the user clicks a page button on the target terminal. An exposure event records data when the user enters or refreshes a page, but not when the user exits it. Page dwell time is the time the user stays on a page, calculated from the recorded times at which the user enters and leaves the page.
Specifically, during data collection there is no need to write custom reporting code at every location where data must be collected, nor to collect individual data items repeatedly. At the same time, more comprehensive event monitoring is supported, which yields a more complete user data set.
For example, listening code is deployed in the target application, and a list of target trigger events is preset through it. The code can therefore determine whether a business operation performed by the user on the target terminal involves a target trigger event. If the listening code detects a target trigger event, it collects the trigger information generated by that event, that is, the user data.
For example, the events must be defined before the listening code is deployed. An event definition includes, but is not limited to, the page's Chinese name, the event's English name, the trigger action, the attribute's Chinese name, and the attribute's English name.
Further, in this embodiment, user data is collected by log collection. Log collection is implemented with the non-blocking I/O model of Java NIO, which is an effective way to handle high concurrency, large numbers of connections, and heavy I/O.
For example, when the user performs a business operation on the target terminal, the Java NIO code deployed in the target application is activated and captures and sends the current business operation log. The specific flow is shown in FIG. 3, which is a schematic detailed flowchart of a data acquisition method according to an embodiment of the application:
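An event definition with the fields listed above, together with the preset list of target trigger events, might look like the following. `EventDefinition` and `TriggerEventRegistry` are hypothetical names. Keying the registry on the English event name plus the trigger action is an assumption. So is using the declared attributes to decide which context values to collect.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// One event definition; attributes maps each attribute's English name to its Chinese name.
record EventDefinition(String pageNameZh, String eventNameEn, String triggerAction,
                       Map<String, String> attributes) {}

class TriggerEventRegistry {
    private final Map<String, EventDefinition> definitions = new HashMap<>();

    private static String key(String eventNameEn, String triggerAction) {
        return eventNameEn + "#" + triggerAction;
    }

    // Adds a definition to the preset list of target trigger events.
    void register(EventDefinition d) {
        definitions.put(key(d.eventNameEn(), d.triggerAction()), d);
    }

    // Returns the definition if the observed event is a target trigger event.
    Optional<EventDefinition> match(String eventNameEn, String triggerAction) {
        return Optional.ofNullable(definitions.get(key(eventNameEn, triggerAction)));
    }

    // Collects only the declared attributes from the event's context (the "trigger information").
    static Map<String, Object> collect(EventDefinition d, Map<String, Object> context) {
        Map<String, Object> data = new LinkedHashMap<>();
        for (String attr : d.attributes().keySet()) {
            if (context.containsKey(attr)) {
                data.put(attr, context.get(attr));
            }
        }
        return data;
    }
}
```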
Step 301: Initialize the configuration.
For example, initialization includes setting the server's data acquisition rules. These rules specify which data acquisition points must be collected, which improves collection efficiency.
For example, user-side attributes serve as context data sources, and data is collected for the different application scenarios on the user side, which enables scenario-based collection of user data. Table 1 lists the general data source parameters built into the user side:
TABLE 1
Parameter       Description
$APP            Data returned by getApp()
$DATASET        Dataset of the event
$EVENT          The event object
$OPTIONS        Options of the current page
$DATA           Data of the current page or component
$APPOPTIONS     Options of the app
Specifically, $APP is the data of the current user side (the app instance), and $DATASET is the dataset of the event. $EVENT indicates the event: if the corresponding event name is detected, the monitored object has triggered the event and executes the preset code to collect data. $OPTIONS is the options of the current page, $DATA is the data of the current page or component, and $APPOPTIONS is the options of the app.
Step 302: Trigger the buried point event.
For example, if an event name listed in Table 1 is detected and the monitored object triggers the event, the monitored object executes the preset code to collect data.
Step 303: Determine whether a buried point index exists.
Step 304: Determine the unique identifier of the event.
Step 305: Intercept the event.
Step 306: Collect user data according to the configuration.
Specifically, the user data includes attribute information, which describes the static characteristics of the user, and behavior information, which describes the user's operation records in the application.
Step 307: Report the event.
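Steps 302 to 307 can be read as a small interception pipeline, sketched below. This is an assumption rather than the implementation in the description. The configuration from step 301 maps a `page/eventName` key to the attribute keys to collect, and step 304 is read as assigning a random unique ID to each occurrence. A `Consumer` stands in for the reporter that would write to the analytics database. `TrackingPipeline` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.function.Consumer;

class TrackingPipeline {
    private final Map<String, List<String>> indexConfig;  // step 301: "page/event" -> attribute keys to collect
    private final Consumer<Map<String, Object>> reporter; // step 307: stand-in for the database writer

    TrackingPipeline(Map<String, List<String>> indexConfig, Consumer<Map<String, Object>> reporter) {
        this.indexConfig = indexConfig;
        this.reporter = reporter;
    }

    // Step 302: called whenever an event fires; returns true if the event was intercepted and reported.
    boolean onEvent(String page, String eventName, Map<String, Object> context) {
        List<String> attrs = indexConfig.get(page + "/" + eventName);
        if (attrs == null) {
            return false; // step 303: no buried point index is configured, so the event is ignored
        }
        Map<String, Object> payload = new LinkedHashMap<>();
        payload.put("eventUid", UUID.randomUUID().toString()); // step 304: unique ID per occurrence (assumed)
        payload.put("page", page);                              // step 305: the event is intercepted here
        payload.put("event", eventName);
        for (String attr : attrs) {                             // step 306: collect per configuration
            Object value = context.get(attr);
            if (value != null) {
                payload.put(attr, value);
            }
        }
        reporter.accept(payload);                               // step 307: report
        return true;
    }
}
```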
For example, the collected user data is reported to ClickHouse, an OLAP analytics database with two notable strengths: columnar storage and data compression. For analytical queries, ClickHouse is typically much faster than general-purpose row-oriented databases.
As can be seen, deploying listening code in the target application allows events in the target terminal's business operations to be collected in a targeted way, which improves the accuracy of collection without buried points. Meanwhile, because the listening code and target trigger events are set up in advance, the code does not need to be inspected each time an event occurs. The modular mechanism of event interception and automatic reporting also reduces the workload of event collection. In addition, the ClickHouse database enables storage of massive data and fast queries.
In one possible example, after the user data of the target trigger event is collected, the method further includes encrypting the user data. Encryption uses a hybrid scheme that combines the asymmetric RSA algorithm with the Triple Data Encryption Algorithm (3DES): 3DES encrypts the long content, and RSA encrypts the key used by 3DES.
For example, before the collected user data is reported, it is encrypted to obtain encrypted data. In an asymmetric algorithm such as RSA, the encryption key differs from the decryption key, so two keys are required: a public key and a private key. 3DES is a symmetric algorithm based on DES that processes the data with DES three times (encrypt, decrypt, encrypt) using three keys, which makes it stronger than single DES.
For example, at the database end, the data is decrypted with the corresponding keys before being stored. The RSA private key recovers the 3DES key, which then decrypts the content.
As can be seen, in the embodiments of the application, the collected user data is encrypted with different encryption algorithms, which protects the user's information security and the security of data in transit.
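The RSA + 3DES envelope described above can be sketched with the standard Java Cryptography Architecture. This is a minimal illustration under the following assumptions:

- a fresh 168-bit 3DES key for each message;
- CBC mode with a random IV;
- RSA-2048 with OAEP padding to wrap the 3DES key.

`Envelope` and `HybridCrypto` are hypothetical names. NIST has deprecated 3DES, so a new design would more commonly pair RSA with AES-GCM. The structure of the envelope would stay the same.

```java
import java.security.GeneralSecurityException;
import java.security.Key;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

// What the client sends: the RSA-wrapped 3DES key, the CBC IV, and the 3DES ciphertext.
record Envelope(byte[] wrappedKey, byte[] iv, byte[] ciphertext) {}

class HybridCrypto {
    private static final String SYMMETRIC = "DESede/CBC/PKCS5Padding";
    private static final String ASYMMETRIC = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";

    // Client side: encrypt the long payload with a fresh 3DES key, then wrap that key with RSA.
    static Envelope seal(byte[] plaintext, PublicKey serverPublicKey) throws GeneralSecurityException {
        KeyGenerator keyGen = KeyGenerator.getInstance("DESede");
        keyGen.init(168); // three independent 56-bit DES keys
        SecretKey desKey = keyGen.generateKey();

        Cipher sym = Cipher.getInstance(SYMMETRIC);
        sym.init(Cipher.ENCRYPT_MODE, desKey); // the provider generates a random IV
        byte[] ciphertext = sym.doFinal(plaintext);

        Cipher rsa = Cipher.getInstance(ASYMMETRIC);
        rsa.init(Cipher.WRAP_MODE, serverPublicKey);
        return new Envelope(rsa.wrap(desKey), sym.getIV(), ciphertext);
    }

    // Server side: unwrap the 3DES key with the RSA private key, then decrypt before storing.
    static byte[] open(Envelope env, PrivateKey serverPrivateKey) throws GeneralSecurityException {
        Cipher rsa = Cipher.getInstance(ASYMMETRIC);
        rsa.init(Cipher.UNWRAP_MODE, serverPrivateKey);
        Key desKey = rsa.unwrap(env.wrappedKey(), "DESede", Cipher.SECRET_KEY);

        Cipher sym = Cipher.getInstance(SYMMETRIC);
        sym.init(Cipher.DECRYPT_MODE, desKey, new IvParameterSpec(env.iv()));
        return sym.doFinal(env.ciphertext());
    }

    // Generates the server's RSA key pair; only the public key is distributed to clients.
    static KeyPair newServerKeyPair() throws GeneralSecurityException {
        KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
        gen.initialize(2048);
        return gen.generateKeyPair();
    }
}
```

Only the short symmetric key passes through RSA. This avoids RSA's message-size limit and its cost on long content, which is the reason for the division of labor described above.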
In one possible example, before the user data is sent to the database of the server, the method further includes performing streaming computation on the user data. Streaming computation means that the event user data collected in real time is processed continuously and computed in real time.
For example, streaming computation is an event-triggered computing model whose trigger source is the user data collected without buried points, as described above. As soon as new user data enters the stream, the computation starts and the task runs immediately, so streaming computation as a whole is continuous.
For example, the result of a single streaming computation triggered by user data can be written directly into the database for storage. One case is writing the computed user data into the user information database for report display. The computation results can also be written continuously to the destination data store as streaming data.
Therefore, in the embodiments of the application, stream computing performs real-time data cleaning and transformation efficiently. It converts the raw collection logs into dimensional views and scenario views and provides real-time business analysis.
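Event-triggered streaming computation can be illustrated with a running per-page click count, which is recomputed and emitted as each event arrives. The class name, the choice of metric, and the `BiConsumer` sink standing in for the destination data store are all assumptions. A production system would typically use a dedicated stream-processing framework rather than an in-process map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

class ClickStreamAggregator {
    private final Map<String, Long> clicksPerPage = new HashMap<>(); // running state
    private final BiConsumer<String, Long> sink;                     // stand-in for the destination store

    ClickStreamAggregator(BiConsumer<String, Long> sink) {
        this.sink = sink;
    }

    // Each arriving event triggers the computation at once; there is no batch window.
    void onEvent(String page, String eventType) {
        if (!"click".equals(eventType)) {
            return; // this sketch only aggregates click events
        }
        long updated = clicksPerPage.merge(page, 1L, Long::sum);
        sink.accept(page, updated); // the new result is written as soon as it is computed
    }
}
```

For the events `home/click`, `home/exposure`, `list/click`, `home/click`, the sink receives `home=1`, `list=1`, `home=2`, one write per triggering event.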
In one possible example, before a target trigger event in the business operation is detected and its user data is collected, the method further includes loading the defined Software Development Kit (SDK) code in the business code. This achieves non-intrusive interception of user behavior events, meaning that the SDK code captures key user behavior events at target trigger events in real time.
For example, in the embodiments of the application, both front-end and back-end collection can be supported by implementing non-intrusive event interception in the respective SDKs. A visual buried point mode can also be enabled in the app, mini program, or web page that embeds the SDK, and this mode communicates with the server back end. Specifically, in one application scenario, the buried point configuration flow is shown in FIG. 4, which is a flowchart of the buried point configuration of a data acquisition method:
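The SDK in the description runs inside an app, mini program, or web page. A comparable non-intrusive technique on the JVM is a dynamic proxy, which intercepts calls to existing business code without editing it. The sketch below is an analogy, not the SDK from the description. `OrderService` is a hypothetical stand-in for business code, and `TrackingSdk.instrument` is a hypothetical name.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.function.Consumer;

// Hypothetical stand-in for existing business code that must not be modified.
interface OrderService {
    String submitOrder(String userId, int quantity);
}

class TrackingSdk {
    // Returns a proxy that forwards every call to target and reports it as a behavior event.
    static <T> T instrument(Class<T> iface, T target, Consumer<String> reporter) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getDeclaringClass() == Object.class) {
                return method.invoke(target, args); // toString/equals/hashCode are not tracked
            }
            try {
                Object result = method.invoke(target, args);
                reporter.accept(method.getName() + (args == null ? "[]" : Arrays.toString(args)));
                return result;
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the business code's own exception unchanged
            }
        };
        return iface.cast(Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler));
    }
}
```

The business code keeps calling `submitOrder` as before. Only the place where the instance is created wraps it once with `instrument`, and every call is then reported as, for example, `submitOrder[u1, 2]`.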
Step 401: Receive a TAPD requirement.
Illustratively, TAPD, Tencent's agile collaboration platform, is a collaboration and software development management platform developed in-house by Tencent. TAPD offers a wide range of configurable functions, such as kanban boards, online documents, agile requirement planning, iteration planning and tracking, task and man-hour management, defect tracking, test plans and cases, continuous integration, and continuous delivery and deployment. It meets the needs of different customer scenarios and helps enterprises collaborate efficiently and improve R&D efficiency. Specifically, a developer first receives a TAPD requirement document from a product manager.
Step 402: Supplement the tracking-point table according to the product fields.
Illustratively, the existing contents of the tracking-point table are checked against the product fields in the requirement document: fields already present in the table can be reused, while fields that appear for the first time must be customized and added to the table.
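The check in step 402 amounts to merging the requirement's fields into the tracking-point table. The sketch below assumes the table is a simple mapping from field name to description; `supplement_tracking_table` and all field names are hypothetical.

```python
from typing import Dict, List, Tuple


def supplement_tracking_table(
    table: Dict[str, str], required_fields: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Reuse fields already in the tracking-point table and add fields that
    appear for the first time. Returns (reused, added) field names."""
    reused: List[str] = []
    added: List[str] = []
    for name, description in required_fields.items():
        if name in table:
            reused.append(name)  # existing field: keep using it
        else:
            table[name] = description  # first appearance: customize and add
            added.append(name)
    return reused, added


# Hypothetical table and requirement fields.
tracking_table = {"user_id": "User identifier", "page_name": "Page name"}
requirement = {"page_name": "Page name", "house_type": "Viewed house type"}
reused, added = supplement_tracking_table(tracking_table, requirement)
```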
Step 403: Enter the event's custom fields and template.
Step 404: Associate the test applet.
Specifically, after the test applet is associated, it must be determined whether the requirements are met.
Illustratively, the modified SDK code module is functionally tested.
Specifically, events can be triggered and reported, and the data center checked to verify that the reported data is accurate. If the requirements are met, data is reported automatically; if not, modification and testing continue until the requirements are met.
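The accuracy check can be pictured as comparing each reported event with the custom fields and template entered in step 403. This is a minimal sketch that assumes a template is a mapping from field name to expected Python type; `validate_report` and the example fields are hypothetical, and an empty result is taken to mean the requirements are met.

```python
from typing import Any, Dict, List


def validate_report(event: Dict[str, Any], template: Dict[str, type]) -> List[str]:
    """Compare one reported event with its template and list any problems."""
    problems: List[str] = []
    for name, expected in template.items():
        if name not in event:
            problems.append(f"missing field: {name}")
        elif not isinstance(event[name], expected):
            problems.append(
                f"wrong type for {name}: expected {expected.__name__}, "
                f"got {type(event[name]).__name__}"
            )
    return problems


# Hypothetical template entered in step 403 for a click event.
click_template = {"event": str, "user_id": str, "ts": float}
good = validate_report({"event": "click", "user_id": "u1", "ts": 1.0}, click_template)
bad = validate_report({"event": "click", "ts": "1.0"}, click_template)
```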
In one possible embodiment, the modified product is handed to a tester for functional testing. Fig. 5 is a schematic flow chart of the tester's functional test of the modified SDK code module:
Step 501: Environment test.
Specifically, the environment test checks whether data is reported normally.
Specifically, the tester's current environment is configured to match the application environment of the target terminal, and a user performing the relevant business operations is simulated. The outcome depends on whether the data collected by the system is reported normally to the back-end database: if it is, the current test environment is marked as passed; if not, a defect ticket is created or updated on TAPD and submitted to the technical staff for further modification.
Step 502: The environment test passes.
Further, once the environment test passes, it must be determined whether the applet needs to be reviewed.
Specifically, if review is needed, the applet is reviewed; if not, acceptance is carried out in the production environment.
Specifically, in practice, an applet generally must submit the relevant technical documents and pass review before it can go into production. Applet evaluation is intended to encourage developers to build better applets that provide better service and a better user experience: the applet's data is evaluated comprehensively through operation, performance, and user metrics, and its functional experience is assessed by manual review, yielding an overall evaluation after which the applet can go online. The tester therefore needs to make sure the applet's basic functional modules have been tested before submission.
It can be seen that, in this embodiment, non-intrusive event interception removes the need to write custom reporting code at every position where data is collected. Custom fields and templates are defined according to requirements, so collection points can be managed dynamically and collection strategies added dynamically. This simplifies the data collection process, improves collection efficiency, and saves labor.
In one possible example, the method further includes extracting the user data from the database for label classification model training, which includes: splitting the user data into a first data set and a second data set, where the first data set is a standard data set whose labels have been manually annotated, the second data set is a training data set, the labels indicate the user's house-purchase intention, and the labels include positive and negative labels; pre-training on the first data set to obtain a first model, where the first model is a machine learning model with an improvement function; using the second data set as input to the first model to tune and optimize the model parameters, and determining, from the output of the first model, predicted labels for the users corresponding to the user data in the second data set; and storing the training result of the optimized label classification model, which can be used for user risk analysis and risk control.
In step 404, the user data reported to the database includes the user's attribute information and behavior information. The behavior information can be extracted from the database as needed to train a label classification model, which may be a Convolutional Neural Network (CNN), a Recurrent Neural Network (RNN), a Transformer, or another neural network.
Specifically, a multi-class classification task is one in which each piece of data has exactly one label, although that label can take one of several categories. A multi-label classification task, by contrast, has two characteristics to keep in mind: the number of category labels per sample is not fixed, so some samples have a single category label while others have several; and category labels may depend on one another, for example, a user's facility requirements and site selection requirements may both affect the user's house-purchase intention.
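The difference between the two task types shows up in how training targets are encoded: a single class index for multi-class, a 0/1 vector for multi-label. The label names below are hypothetical stand-ins for the scene labels discussed in this embodiment, and this minimal sketch does not model the dependencies between labels.

```python
from typing import List, Sequence

# Hypothetical label set built from the scene information of this embodiment.
LABELS = ["purchase_intent", "site_requirement", "facility_requirement"]


def multi_class_target(label: str, labels: Sequence[str] = LABELS) -> int:
    """Multi-class: exactly one label per sample, encoded as a class index."""
    return list(labels).index(label)


def multi_hot_target(sample_labels: Sequence[str], labels: Sequence[str] = LABELS) -> List[int]:
    """Multi-label: one or more labels per sample, encoded as a 0/1 vector."""
    chosen = set(sample_labels)
    unknown = chosen - set(labels)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if name in chosen else 0 for name in labels]
```

For example, a user with both a purchase intention and a facility requirement is encoded as `[1, 0, 1]`, which a multi-class target cannot express.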
Illustratively, the model training results can be used to extract and analyze user features, and the analyzed data can be used by real estate sales staff in actual sales to provide targeted services to different users.
In one possible embodiment, the user data collected in the database includes the user's basic information, such as name, gender, and phone number, as well as the user's behavior information, for example the business operations the user performs in the APP, applet, or web page and the personal-intention options the user selects while browsing the interface. Various kinds of scene information are collected during the interaction, including the user's house-purchase intention, site selection requirements, and facility requirements. Each kind of scene information corresponds to one label, so a comprehensive profile of each user can be obtained and different services provided to different users, which simplifies the sales staff's workflow and increases trust between sales staff and customers.
Specifically, a multi-label classification task may be used, in which one piece of user data may carry one or more labels. As described above, scene information can be obtained for each user, and it comprises multiple labels such as house-purchase intention, site selection requirement, and facility requirement.
Specifically, before the label classification model is trained, the required label types must be specified. Labels can be constructed following the product manager's (PM's) logic, for example who the product serves and what problems the product needs to solve, and labels are selected accordingly. On this basis, labels fall into a category hierarchy and a fact hierarchy. The category hierarchy divides labels mainly by business type and is defined by the PM. This embodiment focuses on label division in the fact hierarchy.
Illustratively, fig. 6 is a schematic diagram of the fact-hierarchy label classification in this embodiment:
Specifically, the labels can be divided into scene tags, which serve as the factual basis for judging user behavior tags; interest tags, which serve as the basic tags for calculating user preferences; and extended tags, which cannot be extracted directly because the data is sparse or incomplete, such as the user's spending power and desire to buy a house.
Specifically, as shown in fig. 6, for scene tags, user data is obtained from the browsing channel the user chooses on the target terminal, including but not limited to an APP, applet, or web page, and the scene type, scene touch time, and scene dwell time are recorded during information browsing. A scene is the series of user data generated through interaction with the system while the user makes selections or performs operations at the current time. Interest tags reflect the information points the user focuses on: when browsing, the user searches for or lingers on such points more often, and they include but are not limited to transportation facilities, fitness facilities, schools, and parking lots. Extended tags are obtained from other information, because the data the system collects while the user operates it is sparse and cannot fully reflect the user's characteristics; they include but are not limited to asset attributes, income capability, and social circles.
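As an illustration of how interest tags might be derived from behavior data, the sketch below tags an information point when the user either stays on it long enough in total or returns to it repeatedly. The `(information point, dwell seconds)` record shape, the function name `interest_tags`, and both thresholds are assumptions for illustration; the application does not specify them.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Iterable, List, Tuple

Record = Tuple[str, float]  # hypothetical: (information point, dwell seconds)


def interest_tags(records: Iterable[Record], min_dwell: float = 30.0, min_visits: int = 2) -> List[str]:
    """Tag an information point when its total dwell time reaches min_dwell
    seconds or the user visits it at least min_visits times."""
    dwell: DefaultDict[str, float] = defaultdict(float)
    visits: Counter = Counter()
    for point, seconds in records:
        dwell[point] += seconds
        visits[point] += 1
    return sorted(p for p in visits if dwell[p] >= min_dwell or visits[p] >= min_visits)


browsing = [("school", 20.0), ("school", 15.0), ("parking_lot", 5.0),
            ("fitness", 40.0), ("transport", 3.0), ("transport", 4.0)]
tags = interest_tags(browsing)
```

Here `parking_lot` is the only point left untagged, since it was visited once for 5 seconds.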
Specifically, the user data set is first divided into a first data set and a second data set, and the first data set is manually annotated according to the determined event labels. The resulting standard data set is then used as input to the label classification model, the model is pre-trained with a supervised method, and the training result is saved as the first model. Supervised training means that the model's training data is labeled and the training goal is to assign correct labels to new data.
Further, the first model is trained without supervision on the second data set: its output is the label for each customer in the second data set, and the related parameters are tuned during training on the second data set to improve label classification accuracy. The output labels are stored in the customer label set of the database, and the trained model is saved for later reuse.
It can be seen that, in this embodiment, the user label classification model is trained on the collected user data, and the training result can be used in actual sales. This raises the utilization of the collected user data, enables more accurate service for users with different labels, improves sales staff efficiency and user business risk control, and effectively increases customers' trust in the sales staff.
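The two-stage procedure can be sketched as follows. The application names CNN, RNN, or Transformer models and does not specify how the second stage works; this sketch instead substitutes plain logistic regression and self-training (pseudo-labeling), a semi-supervised technique in which the pre-trained first model labels the samples of the second data set it is confident about and training then continues on them. All names, the two hypothetical features, and the 0.9 confidence threshold are illustrative assumptions.

```python
import math
from typing import List, Optional, Sequence, Tuple

# (feature vector, label) with label 1 = positive, 0 = negative purchase intention
Sample = Tuple[List[float], int]


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """Probability of the positive label; the last weight is the bias term."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + w[-1]
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(
    xs: Sequence[Sequence[float]],
    ys: Sequence[int],
    weights: Optional[Sequence[float]] = None,
    epochs: int = 300,
    lr: float = 0.5,
) -> List[float]:
    """Batch gradient descent for binary logistic regression.
    Passing existing weights continues training from them (fine-tuning)."""
    dim = len(xs[0])
    w = list(weights) if weights is not None else [0.0] * (dim + 1)
    for _ in range(epochs):
        grad = [0.0] * (dim + 1)
        for x, y in zip(xs, ys):
            err = predict_proba(w, x) - y
            for j in range(dim):
                grad[j] += err * x[j]
            grad[dim] += err  # bias gradient
        w = [wj - lr * g / len(xs) for wj, g in zip(w, grad)]
    return w


def two_stage_training(
    labeled: Sequence[Sample], unlabeled: Sequence[List[float]], confidence: float = 0.9
) -> Tuple[List[float], List[int]]:
    # Stage 1: supervised pre-training on the manually labeled first data set.
    xs = [x for x, _ in labeled]
    ys = [y for _, y in labeled]
    first_model = train_logreg(xs, ys)

    # Stage 2: pseudo-label the second-set samples the first model is confident about.
    px: List[List[float]] = []
    py: List[int] = []
    for x in unlabeled:
        p = predict_proba(first_model, x)
        if p >= confidence or p <= 1.0 - confidence:
            px.append(x)
            py.append(1 if p >= 0.5 else 0)

    # Continue training from the first model's weights on labeled + pseudo-labeled data.
    tuned = train_logreg(xs + px, ys + py, weights=first_model, epochs=100) if px else first_model

    # Predicted label for every user in the second data set.
    predicted = [1 if predict_proba(tuned, x) >= 0.5 else 0 for x in unlabeled]
    return tuned, predicted


# Usage with two hypothetical normalized features (e.g. dwell time, visit frequency).
labeled_set: List[Sample] = [([0.9, 0.8], 1), ([0.8, 0.9], 1), ([0.1, 0.2], 0), ([0.2, 0.1], 0)]
unlabeled_set = [[0.85, 0.85], [0.15, 0.15]]
model, predicted_labels = two_stage_training(labeled_set, unlabeled_set)
```

Only confident pseudo-labels are fed back, which limits the risk of the first model reinforcing its own mistakes; the returned weights correspond to the stored training result.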
Referring to fig. 7, consistent with the embodiment shown in fig. 2, fig. 7 is a schematic structural diagram of a data acquisition device according to an embodiment of the present application:
A data acquisition device, comprising:
701: a monitoring unit, configured to monitor events of a target terminal at data acquisition points, wherein an event indicates a business operation executed by the target terminal, and an acquisition point is a target trigger event expected to be captured from the business operation;
702: a data acquisition unit, connected to the monitoring unit and configured to collect user data of a target trigger event, wherein the user data comprises attribute information, behavior information and scene information, the attribute information describes static characteristics of a user, the behavior information indicates the user's operation records, and the target trigger event is a preset event indicating that data should be collected;
703: a data storage unit, connected to the data acquisition unit and configured to store the user data it collects, wherein the user data comprises first information and second information, the first information including user name, age, phone number, scene information and the like, and the second information including click behavior information, browsing behavior information, scene information and the like;
704: a data analysis unit, connected to the data storage unit and configured to train a label classification model on the user data in the data storage unit, wherein the label classification model extracts user features and can be used to analyze and control user business risk;
705: a data query unit, connected to the data storage unit and configured to support free-form queries of context data.
It can be seen that, in this embodiment, the business operation executed by the target terminal is monitored; if a target trigger event in the business operation is detected, user data of the target trigger event is collected, wherein the user data comprises attribute information and behavior information, the attribute information describes static characteristics of a user, the behavior information describes the user's operation records, and the target trigger event is a preset event indicating that data should be collected; and the user data is sent to the server's database. With the method of this embodiment, customer data collected without manual tracking points is reported to the server's database in real time, data can be processed with high throughput, and massive data storage with efficient aggregation and real-time analysis is supported.
Specifically, in this embodiment the data acquisition device may be divided into functional units according to the method example; for example, one functional unit may correspond to each function, or two or more functions may be integrated into one processing unit. The integrated unit can be implemented in hardware or as a software functional unit. Note that the division into units in this embodiment is schematic and merely a logical division of functions; other divisions are possible in an actual implementation.
Referring to fig. 8, consistent with the embodiment shown in fig. 2, fig. 8 is a schematic structural diagram of another data acquisition apparatus provided in an embodiment of the present application:
An electronic device, comprising:
one or more processors; one or more memories for storing programs; and one or more communication interfaces for wireless communication, the memories and the communication interfaces being interconnected and communicating with each other; the one or more memories and the program are configured to control the device, through the one or more processors, to perform some or all of the steps of any method described in the first aspect of the embodiments of this application.
The memory may be a volatile memory such as dynamic random access memory (DRAM), or a non-volatile memory such as a mechanical hard disk. The memory stores a set of executable program code, and the processor calls the executable program code stored in the memory to perform some or all of the steps of any data acquisition method described in the method embodiments.
The wireless communication may use any communication standard or protocol, including but not limited to GSM (Global System for Mobile Communications), GPRS (General Packet Radio Service), CDMA2000 (Code Division Multiple Access 2000), WCDMA (Wideband Code Division Multiple Access), TD-SCDMA (Time Division-Synchronous Code Division Multiple Access), FDD-LTE (Frequency Division Duplex-Long Term Evolution), and TDD-LTE (Time Division Duplex-Long Term Evolution).
An embodiment of this application provides a computer-readable storage medium storing a computer program for electronic data exchange, the computer program comprising instructions for performing some or all of the steps of any data acquisition method described in the method embodiments above, where the computer includes an electronic terminal device.
Embodiments of the present application provide a computer program product, wherein the computer program product includes a computer program operable to cause a computer to perform some or all of the steps of any one of the data collection methods as described in the above method embodiments, and the computer program product may be a software installation package.
It should be noted that, for simplicity of description, each of the foregoing data acquisition method embodiments is expressed as a series of combined actions. However, those skilled in the art should understand that the present application is not limited by the described order of actions, because according to the present application some steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are preferred embodiments, and the actions involved are not necessarily required by the present application.
The embodiments of this application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the data acquisition method and apparatus of this application, and the description of these embodiments is intended only to help understand the method and its core idea. Meanwhile, those skilled in the art may change the specific implementation and scope of application according to the idea of the data acquisition method and apparatus of this application. In summary, the content of this specification should not be construed as limiting this application.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, hardware products and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. The memory may include: flash drives, read-only memory (ROM), random access memory (RAM), magnetic disks or optical discs, and the like.
While the present application has been described in connection with various embodiments, other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed application, from a review of the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, and the word "a" or "an" does not exclude a plurality. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
Those skilled in the art will appreciate that all or part of the steps in the various methods of any of the above-described method embodiments of data acquisition may be performed by associated hardware as instructed by a program, which may be stored in a computer-readable memory, which may include: flash drives, read-only memory (ROM), random access memory (RAM), magnetic disks or optical discs, and the like.
It will be appreciated that all products controlled or configured to perform the processing methods of the flowcharts described in the method embodiments of data acquisition of the present application, such as the apparatus and computer program products of the flowcharts described above, are within the scope of the related products described herein.
It is apparent that those skilled in the art can make various changes and modifications to the method and apparatus for data acquisition provided herein without departing from the spirit and scope of the present application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A method of data acquisition, comprising:
monitoring the business operation executed by a target terminal;
if a target trigger event in the business operation is monitored, acquiring user data of the target trigger event, wherein the user data comprises attribute information, behavior information and scene information, the attribute information is used for describing static characteristics of a user, the behavior information is used for indicating an operation record of the user, and the target trigger event is a preset event used for indicating data acquisition;
and sending the user data to a database of a server.
2. The method of claim 1, wherein the collecting user data for the target trigger event comprises:
reading the user data from the target terminal in real time through Kafka, a high-throughput distributed publish-subscribe messaging system;
converting the data format and dimensions of the extracted user data according to pre-designed rules, so that the originally heterogeneous data formats are unified;
and importing the converted data into the database incrementally or in full, as planned.
3. The method of claim 1, wherein before monitoring the business operation executed by the target terminal, the method further comprises:
setting a target trigger event, wherein the target trigger event comprises: a click event, an exposure event, and page dwell time;
the click event means that data is recorded once each time the user clicks a page button of the target terminal;
the exposure event refers to data recorded when the user enters or refreshes a page, and no data is recorded when the user exits the page;
the page dwell time refers to how long the user stays on a page, and is calculated by recording the time the user enters the page and the time the user leaves it.
4. The method of any of claims 1-3, wherein after collecting the user data for the target trigger event, the method further comprises:
encrypting the user data based on an encryption mechanism combining the asymmetric encryption algorithm RSA with the Triple Data Encryption Algorithm (3DES), wherein 3DES is used to encrypt long content and RSA is used to encrypt the keys used by 3DES.
5. The method of claim 4, wherein prior to sending the user data to the database of the server, further comprising:
performing stream computing on the user data, wherein stream computing is an event-triggered computing mode, namely computation performed continuously on the user data of events collected in real time.
6. The method of claim 1, wherein before monitoring a target trigger event in the business operation and collecting user data of the target trigger event, the method further comprises:
loading predefined Software Development Kit (SDK) code into the business code to intercept user behavior events non-intrusively, wherein non-intrusive interception means using the SDK code to capture key user behavior events in real time at the target trigger event.
7. The method according to any one of claims 1-6, wherein the user data of the database is extracted for label classification model training, the method comprising:
splitting the user data into a first data set and a second data set, wherein the first data set is a standard data set labeled manually, the second data set is a training data set, the label indicates the user's house-purchase intention, and the label comprises a positive label and a negative label;
pre-training on the first data set to obtain a first model, wherein the first model is a machine learning model with an improvement function;
taking the second data set as the input of the first model, carrying out model parameter adjustment and optimization, and determining a prediction label of a user corresponding to the user data in the second data set according to the output result of the first model;
and storing the training result of the optimized label classification model, wherein the model can be used for user risk analysis and risk control.
8. An apparatus for data acquisition, comprising:
a monitoring unit, configured to monitor events of a target terminal at data acquisition points, wherein an event indicates a business operation executed by the target terminal, and an acquisition point is a target trigger event expected to be captured from the business operation;
a data acquisition unit, connected to the monitoring unit and configured to collect user data of a target trigger event, wherein the user data comprises attribute information, behavior information and scene information, the attribute information describes static characteristics of a user, the behavior information indicates the user's operation records, and the target trigger event is a preset event indicating that data should be collected;
a data storage unit, connected to the data acquisition unit and configured to store the user data it collects, wherein the user data comprises first information and second information, the first information including user name, age, phone number, scene information and the like, and the second information including click behavior information, browsing behavior information, scene information and the like;
a data analysis unit, connected to the data storage unit and configured to train a label classification model on the user data in the data storage unit, wherein the label classification model extracts user features and can be used to analyze and control user business risk;
and a data query unit, connected to the data storage unit and configured to support free-form queries of context data.
9. An electronic device, comprising:
one or more processors;
one or more memories for storing programs,
one or more communication interfaces for wireless communication, wherein the memory and the communication interfaces are connected with each other and perform communication work with each other;
the one or more memories and the program are configured to control the device, through the one or more processors, to perform the steps of the method of any one of claims 1-7.
10. A computer-readable storage medium, characterized in that a computer program for electronic data exchange is stored, wherein the computer program causes a computer to perform the method according to any one of claims 1-7.
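Claim 3 defines three target trigger events: clicks, exposures, and page dwell time. A minimal Python sketch of how a client-side tracker might record them follows. `PageTracker` and its methods are hypothetical, and keeping the first entry time when a page is refreshed (so that a refresh adds an exposure record without shortening the dwell time) is an assumption the claims do not settle.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PageTracker:
    """Hypothetical client-side recorder for the trigger events of claim 3."""

    events: List[Dict[str, object]] = field(default_factory=list)
    entered_at: Dict[str, float] = field(default_factory=dict)

    def enter(self, page: str, ts: float) -> None:
        # Exposure event: recorded on entering or refreshing a page.
        # Keep the first entry time so a refresh does not shorten the dwell time.
        self.entered_at.setdefault(page, ts)
        self.events.append({"type": "exposure", "page": page, "ts": ts})

    def click(self, page: str, button: str, ts: float) -> None:
        # Click event: one record per click on a page button.
        self.events.append({"type": "click", "page": page, "button": button, "ts": ts})

    def leave(self, page: str, ts: float) -> Optional[float]:
        # No exposure record on exit; dwell time = leave time - entry time.
        entered = self.entered_at.pop(page, None)
        if entered is None:
            return None
        dwell = ts - entered
        self.events.append({"type": "dwell", "page": page, "seconds": dwell})
        return dwell


tracker = PageTracker()
tracker.enter("house_detail", 100.0)
tracker.click("house_detail", "consult", 112.5)
tracker.enter("house_detail", 120.0)  # refresh: another exposure record
dwell_seconds = tracker.leave("house_detail", 160.0)
```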
CN202111195332.6A 2021-10-12 2021-10-12 Data acquisition method and related device Pending CN114003567A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111195332.6A CN114003567A (en) 2021-10-12 2021-10-12 Data acquisition method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111195332.6A CN114003567A (en) 2021-10-12 2021-10-12 Data acquisition method and related device

Publications (1)

Publication Number Publication Date
CN114003567A true CN114003567A (en) 2022-02-01

Family

ID=79922846

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111195332.6A Pending CN114003567A (en) 2021-10-12 2021-10-12 Data acquisition method and related device

Country Status (1)

Country Link
CN (1) CN114003567A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115422179A (en) * 2022-09-14 2022-12-02 冯秦海 AI training processing method based on big data cleaning and artificial intelligence training system

Similar Documents

Publication Publication Date Title
EP3985578A1 (en) Method and system for automatically training machine learning model
US20170109657A1 (en) Machine Learning-Based Model for Identifying Executions of a Business Process
US9171072B2 (en) System and method for real-time dynamic measurement of best-estimate quality levels while reviewing classified or enriched data
WO2017190610A1 (en) Target user orientation method and device, and computer storage medium
US20170109676A1 (en) Generation of Candidate Sequences Using Links Between Nonconsecutively Performed Steps of a Business Process
WO2019149145A1 (en) Compliant report class sorting method and apparatus
US20170109636A1 (en) Crowd-Based Model for Identifying Executions of a Business Process
US20220043879A1 (en) System and method for collection of a website in a past state and retroactive analysis thereof
US20170109639A1 (en) General Model for Linking Between Nonconsecutively Performed Steps in Business Processes
CN111340240A (en) Method and device for realizing automatic machine learning
CN108074033A (en) Processing method, system, electronic equipment and the storage medium of achievement data
US11836331B2 (en) Mathematical models of graphical user interfaces
CN110798567A (en) Short message classification display method and device, storage medium and electronic equipment
CN109711849B (en) Ether house address portrait generation method and device, electronic equipment and storage medium
CN113297287B (en) Automatic user policy deployment method and device and electronic equipment
CN114003567A (en) Data acquisition method and related device
US20200342471A1 (en) Computer system and method for electronic survey programming
US20170109637A1 (en) Crowd-Based Model for Identifying Nonconsecutive Executions of a Business Process
CN109146306B (en) Enterprise management system
CN110428342A (en) Data recovery method, server, customer side and storage medium
CN116304236A (en) User portrait generation method and device, electronic equipment and storage medium
US20170109670A1 (en) Crowd-Based Patterns for Identifying Executions of Business Processes
CN110163482B (en) Method for determining safety scheme data of activity scheme, terminal equipment and server
CN111127057B (en) Multi-dimensional user portrait recovery method
CN112200602A (en) Neural network model training method and device for advertisement recommendation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination