WO2022088632A1 - User data monitoring and analysis method, apparatus, device, and medium

Info

Publication number
WO2022088632A1
Authority
WO
WIPO (PCT)
Prior art keywords
data set
behavior data
behavior
dimensionality reduction
data
Prior art date
Application number
PCT/CN2021/090312
Other languages
French (fr)
Chinese (zh)
Inventor
谢展成
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2022088632A1 publication Critical patent/WO2022088632A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/30: Monitoring
    • G06F11/34: Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3438: Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment; monitoring of user actions

Definitions

  • the present application relates to the technical field of data monitoring, and in particular, to a method, device, electronic device, and computer-readable storage medium for monitoring and analyzing user data based on third-party software.
  • third-party software is defined relative to a first party and a second party.
  • the second party is the party that has the problem to be solved, namely the user; other software used to provide services to that party's own users is the third-party software.
  • a method for monitoring and analyzing user data based on third-party software provided by this application includes:
  • the application also provides a user data monitoring and analysis device based on third-party software, the device comprising:
  • a behavioral data acquisition module used to collect behavioral data sets of target users from third-party software
  • a data detection module is used to perform a dimensionality reduction operation on the behavior data set to obtain a dimensionality reduction behavior data set, and use a pre-built data anomaly detection model to detect the dimensionality reduction behavior data set to obtain a normal behavior data set and abnormal behavior data set;
  • a data reconstruction module configured to use a preset collaborative filtering algorithm to perform data reconstruction according to the normal behavior data set and the abnormal behavior data set to obtain a standard data set;
  • a visualization module configured to perform visualization processing on the standard data set to obtain a visual chart set, and transmit the visual chart set to a preset terminal.
  • the present application also provides an electronic device, the electronic device comprising:
  • a processor that executes the instructions stored in the memory to achieve the following steps:
  • the present application also provides a computer-readable storage medium, including a storage data area and a storage program area, where the storage data area stores created data and the storage program area stores a computer program; the computer program, when executed by a processor, implements the following steps:
  • FIG. 1 is a schematic flowchart of a third-party software-based user data monitoring and analysis method provided by an embodiment of the present application
  • FIG. 2 is a schematic flowchart of S2 in the third-party software-based user data monitoring and analysis method provided by an embodiment of the present application;
  • FIG. 3 is a schematic flowchart of S2 in the third-party software-based user data monitoring and analysis method provided by an embodiment of the present application;
  • FIG. 4 is a schematic block diagram of a third-party software-based user data monitoring and analysis device provided by an embodiment of the present application
  • FIG. 5 is a schematic diagram of the internal structure of an electronic device for implementing a third-party software-based user data monitoring and analysis method provided by an embodiment of the present application;
  • Embodiments of the present application provide a method for monitoring and analyzing user data based on third-party software.
  • the execution subject of the third-party software-based user data monitoring and analysis method includes, but is not limited to, at least one of electronic devices that can be configured to execute the method provided by the embodiments of the present application, such as a server and a terminal.
  • the third-party software-based user data monitoring and analysis method can be executed by software or hardware installed on a terminal device or a server device, and the software can be a blockchain platform.
  • the server includes but is not limited to: a single server, a server cluster, a cloud server or a cloud server cluster, and the like.
  • the present application provides a method for monitoring and analyzing user data based on third-party software.
  • FIG. 1 is a schematic flowchart of a method for monitoring and analyzing user data based on third-party software provided by an embodiment of the present application.
  • the method may be performed by an apparatus, which may be implemented in software and/or hardware.
  • the method for monitoring and analyzing user data based on third-party software includes:
  • the target user refers to a user of third-party software.
  • third-party software is defined relative to a first party and a second party: the first party is the target user, the second party is the user or platform connected to the target user, and the third-party software is the software used by the first party.
  • the behavior data set of the target user may be acquired from third-party software through a preconfigured monitoring script.
  • the configuration information of the monitoring script can be deployed on an internal server, and can be quickly configured by means of hot update, without deploying a version and without performing a grayscale test.
  • the monitoring script can be verified directly against the data that the production end reports to the data statistics system; if a mismatch is found, it can be corrected quickly by hot update, which the user does not perceive and which does not affect the user experience.
  • the configuration information includes: user visual area information, user location information, etc., which is convenient for monitoring the behavior data set generated by the user when using the third-party software.
  • the behavior data set includes: the duration of the user browsing the third-party software, the interface related to the user browsing the third-party software, the buttons the user clicks when browsing the third-party software, etc.
  • the third-party software generally consists of multiple interfaces. In an embodiment of this application, the specific interfaces the user operates in the third-party software and the duration of those operations are obtained to form the behavior data set.
  • the acquired behavior data set is first uploaded to a Redis cache and then uploaded to the database, in preparation for subsequent analysis and processing of the behavior data set.
  • the behavior data set may also be stored in a blockchain node.
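The cache-then-database flow described above can be sketched as follows. This is a minimal illustration, not the application's implementation: an in-memory list stands in for the Redis cache, SQLite stands in for the database, and the class name `BehaviorStore`, the table name `behavior`, and the record fields are all hypothetical.

```python
import json
import sqlite3


class BehaviorStore:
    """Buffers behavior records in a cache, then flushes them to a database."""

    def __init__(self, connection):
        # Stand-in for the Redis cache (assumption: a plain list of JSON strings).
        self.cache = []
        self.connection = connection
        self.connection.execute(
            "CREATE TABLE IF NOT EXISTS behavior ("
            "user_id TEXT, interface TEXT, button TEXT, duration_s REAL)"
        )

    def record(self, user_id, interface, button, duration_s):
        # Serialize the record the way it might be pushed to a cache.
        self.cache.append(json.dumps({
            "user_id": user_id,
            "interface": interface,
            "button": button,
            "duration_s": duration_s,
        }))

    def flush(self):
        # Move every cached record into the database, then empty the cache.
        rows = [json.loads(item) for item in self.cache]
        self.connection.executemany(
            "INSERT INTO behavior VALUES "
            "(:user_id, :interface, :button, :duration_s)",
            rows,
        )
        self.connection.commit()
        self.cache.clear()
        return len(rows)


store = BehaviorStore(sqlite3.connect(":memory:"))
store.record("u1", "home", "search", 12.5)
store.record("u1", "detail", "buy", 40.0)
flushed = store.flush()
total = store.connection.execute("SELECT COUNT(*) FROM behavior").fetchone()[0]
```

Buffering writes in a cache before persisting them keeps the collection path fast and lets the database be written in batches.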
  • the amount of data in the acquired behavior data set is relatively large, which hinders the computer's calculation and analysis of the user behavior data; a dimensionality reduction operation is therefore performed on the behavior data set to reduce the amount of data and facilitate computation.
  • a dimensionality reduction operation is performed on the behavior data set to obtain a dimensionality reduction behavior data set, including:
  • the word2vec method may be used to encode the behavior data set into a user behavior vector set.
  • the weight set is a weight preset by the user according to each user behavior in the user behavior vector set.
  • X_j represents the jth weight behavior vector in the weight behavior vector set
  • x_j represents the jth user behavior vector in the user behavior vector set
  • k is the data amount of the weight behavior vector set
  • w_j is the jth weight in the weight set.
  • in one application example, a behavior data set that includes the duration of the user's browsing of the third-party software and the interfaces the user browsed is encoded into a user behavior vector set, in which the browsing duration is x_1 and the browsed interface is x_2.
  • the corresponding calculation is then performed with the weight set to obtain a weight behavior vector set containing the weight behavior vectors X_1 and X_2.
  • S23. Perform dimensionality reduction processing on the weight behavior vector set to obtain the dimensionality reduction behavior data set.
  • Q_i represents the ith dimensionality reduction behavior data in the dimensionality reduction behavior data set
  • X_i represents the ith weight behavior vector in the weight behavior vector set
  • W_j represents the jth row vector of the weight matrix obtained from the weight set, and W_j^T represents the transpose of W_j.
  • a PCA algorithm may also be used to perform dimensionality reduction processing on the weight behavior vector set to obtain a dimensionality reduction behavior data set.
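The weighting and projection steps can be illustrated with a small sketch. The formulas themselves are not reproduced in this text, so the sketch rests on two labeled assumptions: each user behavior vector is simply scaled by its weight (X_j = w_j * x_j), and each reduced vector is the set of dot products of a weight behavior vector with the rows W_j of a weight matrix. The function names and sample values are hypothetical.

```python
def apply_weights(vectors, weights):
    """Scale the j-th user behavior vector x_j by its weight w_j.

    Assumption: X_j = w_j * x_j; the source formula is not available here.
    """
    if len(vectors) != len(weights):
        raise ValueError("one weight per behavior vector is required")
    return [[w * value for value in vec] for vec, w in zip(vectors, weights)]


def project(weighted, weight_matrix):
    """Map each weight behavior vector X_i to its dot products with the rows W_j."""
    return [
        [sum(x * w for x, w in zip(vec, row)) for row in weight_matrix]
        for vec in weighted
    ]


# Two 3-dimensional user behavior vectors, reduced to 2 dimensions.
user_vectors = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
weights = [0.5, 2.0]
weight_matrix = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]

weighted = apply_weights(user_vectors, weights)
reduced = project(weighted, weight_matrix)
```

With a weight matrix that has fewer rows than the vectors have components, the output has fewer dimensions than the input, which is the effect the dimensionality reduction step is after.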
  • the embodiment of the present application needs to detect the dimensionality reduction behavior data set to obtain a normal behavior data set and an abnormal behavior data set.
  • a support vector data description (SVDD for short) can be used to construct a data anomaly detection model.
  • SVDD is a data description method that encloses the target data set in a hypersphere and can be used for outlier detection or classification.
  • the S3 includes:
  • the hypersphere is constructed using the following formula:
  • α_i represents the first Lagrangian multiplier of the hypersphere
  • o represents the center of the hypersphere
  • C represents the penalty factor
  • q_i represents the dimensionality reduction behavior data set
  • ξ_i represents the slack (relaxation) variable
  • the radius of the hypersphere is calculated using the following formula:
  • R represents the radius of the hypersphere
  • α_j represents the second Lagrangian multiplier of the hypersphere
  • Q i , Q j represent any two dimensionality reduction behavior data in the dimensionality reduction behavior data set
  • K ( ) represents a Gaussian kernel function.
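The formulas themselves are missing from this text. For orientation only, the standard SVDD formulation from the literature is consistent with the symbol definitions above; it is given here as an assumption, not as the application's own equations. It is written with α for the Lagrangian multipliers, ξ for the slack variables, Q_s for any support vector on the sphere boundary, and a kernel width σ that the text does not name.

```latex
% Standard SVDD primal problem: smallest hypersphere (center o, radius R)
% enclosing the data, with penalty factor C on the slack variables.
\min_{R,\,o,\,\xi}\; R^{2} + C\sum_{i}\xi_{i}
\quad\text{s.t.}\quad \lVert q_{i} - o\rVert^{2} \le R^{2} + \xi_{i},\qquad \xi_{i}\ge 0

% Radius, evaluated at a boundary support vector Q_s (0 < \alpha_s < C):
R^{2} = K(Q_{s},Q_{s}) - 2\sum_{i}\alpha_{i}\,K(Q_{i},Q_{s})
      + \sum_{i}\sum_{j}\alpha_{i}\alpha_{j}\,K(Q_{i},Q_{j})

% Gaussian kernel with assumed width \sigma:
K(a,b) = \exp\!\left(-\frac{\lVert a-b\rVert^{2}}{2\sigma^{2}}\right)
```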
  • the following formula is used to calculate the distance from the data in the dimensionality reduction behavior data set to the center of the hypersphere:
  • D represents the distance from the data in the dimensionality reduction behavior data set to the center of the hypersphere.
  • the distance from each data item in the dimensionality reduction behavior data set to the center of the hypersphere is compared with the radius of the hypersphere. If the distance is smaller than the radius, the data item is considered normal, and the data whose distance is smaller than the radius are aggregated using SQL to obtain the normal behavior data set.
  • if the distance is greater than or equal to the radius, the data item is considered abnormal, and the data whose distance is greater than or equal to the radius are aggregated using SQL to obtain the abnormal behavior data set.
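The comparison of distance against radius can be sketched as below. The sketch uses the standard kernel-space SVDD distance, D(z)^2 = K(z,z) - 2 Σ α_i K(Q_i,z) + Σ Σ α_i α_j K(Q_i,Q_j), with a Gaussian kernel. Several things are assumptions made for illustration: the multipliers are set to uniform values instead of being obtained by solving the SVDD optimization problem, the radius is taken as the largest training distance, the kernel width defaults to 1.0, and plain Python lists replace the SQL aggregation mentioned in the text.

```python
import math


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian kernel K(a, b) = exp(-||a - b||^2 / (2 * sigma^2))."""
    squared = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-squared / (2.0 * sigma ** 2))


def distance_to_center(z, data, alphas, sigma=1.0):
    """Kernel-space distance from z to the hypersphere center."""
    cross = sum(a * gaussian_kernel(q, z, sigma) for a, q in zip(alphas, data))
    const = sum(
        ai * aj * gaussian_kernel(qi, qj, sigma)
        for ai, qi in zip(alphas, data)
        for aj, qj in zip(alphas, data)
    )
    squared = gaussian_kernel(z, z, sigma) - 2.0 * cross + const
    return math.sqrt(max(squared, 0.0))  # guard against rounding below zero


def split_by_radius(points, data, alphas, radius, sigma=1.0):
    """Distance < radius is normal; distance >= radius is abnormal."""
    normal, abnormal = [], []
    for point in points:
        d = distance_to_center(point, data, alphas, sigma)
        (normal if d < radius else abnormal).append(point)
    return normal, abnormal


training = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
alphas = [0.25, 0.25, 0.25, 0.25]  # assumed uniform multipliers, not a solved QP
radius = max(distance_to_center(q, training, alphas) for q in training)

candidates = [[0.15, 0.15], [5.0, 5.0]]
normal, abnormal = split_by_radius(candidates, training, alphas, radius)
```

A point near the middle of the training data falls inside the sphere and is kept as normal, while a point far from all training data lands outside and is flagged as abnormal.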
  • S4 includes: calculating the distance between each normal data item in the normal behavior data set and each abnormal data item in the abnormal behavior data set to obtain a distance value set; comparing each distance value in the distance value set with a preset threshold; selecting the normal data and abnormal data whose distance value is not greater than the threshold; and aggregating the selected normal data and abnormal data to obtain a standard data set.
  • the following formula is used to calculate the distance between the normal behavior data set and the abnormal behavior data set:
  • dist(x, y) represents the distance between the normal behavior data set and the abnormal behavior data set
  • x_i represents the data points in the normal behavior data set
  • y_i represents the data points in the abnormal behavior data set
  • n represents the data volume of the normal behavior data set or the abnormal behavior data set.
  • the preset threshold is 10, and if the distance between a normal data A and an abnormal data B is 5, both the normal data A and the abnormal data B can be divided into the standard data set.
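The threshold-based selection can be sketched as follows. The distance formula is not reproduced in this text; the sketch assumes a Euclidean distance, which fits the symbols dist(x, y), x_i, y_i, and n. The function names and sample points are hypothetical, chosen so that one normal/abnormal pair has distance 5 against a threshold of 10, as in the example above.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length points (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def build_standard_set(normal_set, abnormal_set, threshold):
    """Keep every normal/abnormal pair whose distance is not greater than threshold."""
    selected = []
    for x in normal_set:
        for y in abnormal_set:
            if euclidean(x, y) <= threshold:
                for item in (x, y):
                    if item not in selected:  # avoid adding the same point twice
                        selected.append(item)
    return selected


normal_set = [[0.0, 0.0], [100.0, 100.0]]
abnormal_set = [[3.0, 4.0], [50.0, 50.0]]
standard = build_standard_set(normal_set, abnormal_set, threshold=10)
```

Only the pair at distance 5 passes the threshold, so both of its members enter the standard data set and the remaining points are left out.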
  • the visualization refers to transforming the unclear and unorganized data into a clear and intuitive chart form through certain technical means, which is convenient for analysis and viewing of the data.
  • the JFreeChart chart-drawing class library is invoked from Java to process the standard data set and generate a set of clearly readable bar charts of the user behavior data.
  • JFreeChart is an open-source chart-drawing class library for the Java platform. It can render data as pie charts, bar charts, scatter charts, time series charts, Gantt charts, line charts, and other charts, can produce output in PNG and JPEG formats, and can be used together with PDF and Excel output.
  • This embodiment of the present application collects the behavior data set of the target user from third-party software and performs dimensionality reduction, data anomaly detection, and data reconstruction on it: the data dimension of the behavior data set is reduced, the set is divided into a normal behavior data set and an abnormal behavior data set, and data reconstruction then yields a standard data set.
  • the dimensionality reduction operation effectively reduces the data dimension and avoids wasting storage and computing resources, while data anomaly detection and data reconstruction improve the data and the accuracy of data monitoring. The third-party software-based user data monitoring and analysis method, device, and computer-readable storage medium proposed in this application can therefore solve the problem of consuming a large amount of computer memory during data monitoring.
  • FIG. 4 is a schematic block diagram of the user data monitoring and analysis device based on third-party software of the present application.
  • the apparatus 100 for monitoring and analyzing user data based on third-party software described in this application may be installed in an electronic device.
  • the third-party software-based user data monitoring and analysis device may include a behavior data acquisition module 101 , a data detection module 102 , a data reconstruction module 103 and a visualization module 104 .
  • the modules described in the present invention can also be called units, which refer to a series of computer program segments that can be executed by the electronic device processor and can perform fixed functions, and are stored in the memory of the electronic device.
  • each module/unit is as follows:
  • the behavior data acquisition module 101 is used to collect the behavior data set of the target user from third-party software
  • the data detection module 102 is configured to perform a dimensionality reduction operation on the behavior data set to obtain a dimensionality reduction behavior data set, and use a pre-built data anomaly detection model to detect the dimensionality reduction behavior data set to obtain a normal behavior data set and anomalous behavior datasets;
  • the data reconstruction module 103 is configured to use a preset collaborative filtering algorithm to perform data reconstruction according to the normal behavior data set and the abnormal behavior data set to obtain a standard data set;
  • the visualization module 104 is configured to perform visualization processing on the standard data set to obtain a visual chart set, and transmit the visual chart set to a preset terminal;
  • each module of the third-party software-based user data monitoring and analysis device is described in detail as follows:
  • the behavior data acquisition module 101 is used to collect behavior data sets of target users from third-party software.
  • the target user refers to a user of third-party software.
  • third-party software is defined relative to a first party and a second party: the first party is the target user, the second party is the user or platform connected to the target user, and the third-party software is the software used by the first party.
  • the behavior data set of the target user may be acquired from third-party software through a preconfigured monitoring script.
  • the configuration information of the monitoring script can be deployed on an internal server, and can be quickly configured by means of hot update, without deploying a version and without performing a grayscale test.
  • the monitoring script can be verified directly against the data that the production end reports to the data statistics system; if a mismatch is found, it can be corrected quickly by hot update, which the user does not perceive and which does not affect the user experience.
  • the configuration information includes: user visual area information, user location information, etc., which is convenient for monitoring the behavior data set generated by the user when using the third-party software.
  • the behavior data set includes: the duration of the user browsing the third-party software, the interface related to the user browsing the third-party software, the buttons clicked by the user when browsing the third-party software, and the like.
  • the acquired behavior data set is first uploaded to a Redis cache and then uploaded to the database, in preparation for subsequent analysis and processing of the behavior data set.
  • the behavior data set may also be stored in a blockchain node.
  • the data detection module 102 is configured to perform a dimensionality reduction operation on the behavior data set to obtain a dimensionality reduction behavior data set, and use a pre-built data anomaly detection model to detect the dimensionality reduction behavior data set to obtain a normal behavior data set and Anomalous behavior dataset.
  • the amount of data in the acquired behavior data set is relatively large, which hinders the computer's calculation and analysis of the user behavior data; a dimensionality reduction operation is therefore performed on the behavior data set to reduce the amount of data and facilitate computation.
  • performing a dimensionality reduction operation on the behavior data set to obtain a dimensionality reduction behavior data set includes: performing an encoding operation on the behavior data set to obtain a user behavior vector set; calculating the weight behavior vector set of the user behavior vector set using the pre-built weight set; and performing dimensionality reduction processing on the weight behavior vector set to obtain the dimensionality reduction behavior data set.
  • the word2vec method may be used to encode the behavior data set into a user behavior vector set.
  • the weight set is a weight preset by the user according to each user behavior in the user behavior vector set.
  • X_j represents the jth weight behavior vector in the weight behavior vector set
  • x_j represents the jth user behavior vector in the user behavior vector set
  • k is the data amount of the weight behavior vector set
  • w_j is the jth weight in the weight set.
  • in one application example, a behavior data set that includes the duration of the user's browsing of the third-party software and the interfaces the user browsed is encoded into a user behavior vector set, in which the browsing duration is x_1 and the browsed interface is x_2.
  • the corresponding calculation is then performed with the weight set to obtain a weight behavior vector set containing the weight behavior vectors X_1 and X_2.
  • Q_i represents the ith dimensionality reduction behavior data in the dimensionality reduction behavior data set
  • X_i represents the ith weight behavior vector in the weight behavior vector set
  • W_j represents the jth row vector of the weight matrix obtained from the weight set, and W_j^T represents the transpose of W_j.
  • a PCA algorithm may also be used to perform dimensionality reduction processing on the weight behavior vector set to obtain a dimensionality reduction behavior data set.
  • the embodiment of the present application needs to detect the dimensionality reduction behavior data set to obtain a normal behavior data set and an abnormal behavior data set.
  • a support vector data description (SVDD for short) can be used to construct a data anomaly detection model.
  • SVDD is a data description method that encloses the target data set in a hypersphere and can be used for outlier detection or classification.
  • using a pre-built data anomaly detection model to detect the dimensionality reduction behavior data set to obtain a normal behavior data set and an abnormal behavior data set includes: constructing a hypersphere according to the dimensionality reduction behavior data set; calculating the distance from the data in the dimensionality reduction behavior data set to the center of the hypersphere; aggregating the data whose distance is less than the radius to obtain the normal behavior data set; and aggregating the data whose distance is greater than or equal to the radius to obtain the abnormal behavior data set.
  • the hypersphere is constructed using the following formula:
  • α_i represents the first Lagrangian multiplier of the hypersphere
  • o represents the center of the hypersphere
  • C represents the penalty factor
  • q_i represents the dimensionality reduction behavior data set
  • ξ_i represents the slack (relaxation) variable
  • the radius of the hypersphere is calculated using the following formula:
  • R represents the radius of the hypersphere
  • α_j represents the second Lagrangian multiplier of the hypersphere
  • Q i , Q j represent any two dimensionality reduction behavior data in the dimensionality reduction behavior data set
  • K ( ) represents a Gaussian kernel function.
  • the following formula is used to calculate the distance from the data in the dimensionality reduction behavior data set to the center of the hypersphere:
  • D represents the distance from the data in the dimensionality reduction behavior data set to the center of the hypersphere.
  • the distance from each data item in the dimensionality reduction behavior data set to the center of the hypersphere is compared with the radius of the hypersphere. If the distance is smaller than the radius, the data item is considered normal, and the data whose distance is smaller than the radius are aggregated using SQL to obtain the normal behavior data set.
  • if the distance is greater than or equal to the radius, the data item is considered abnormal, and the data whose distance is greater than or equal to the radius are aggregated using SQL to obtain the abnormal behavior data set.
  • the data reconstruction module 103 is configured to use a preset collaborative filtering algorithm to perform data reconstruction according to the normal behavior data set and the abnormal behavior data set to obtain a standard data set.
  • using a preset collaborative filtering algorithm to perform data reconstruction according to the normal behavior data set and the abnormal behavior data set to obtain a standard data set includes: calculating the distance between each normal data item in the normal behavior data set and each abnormal data item in the abnormal behavior data set to obtain a distance value set; comparing each distance value in the distance value set with a preset threshold; selecting the normal data and abnormal data whose distance value is not greater than the threshold; and aggregating the selected normal data and abnormal data to obtain a standard data set.
  • the following formula is used to calculate the distance between the normal behavior data set and the abnormal behavior data set:
  • dist(x, y) represents the distance between the normal behavior data set and the abnormal behavior data set
  • x_i represents the data points in the normal behavior data set
  • y_i represents the data points in the abnormal behavior data set
  • n represents the data volume of the normal behavior data set or the abnormal behavior data set.
  • the preset threshold is 10, and if the distance between a normal data A and an abnormal data B is 5, both the normal data A and the abnormal data B can be divided into the standard data set.
  • the visualization module 104 is configured to perform visualization processing on the standard data set to obtain a visual chart set, and transmit the visual chart set to a preset terminal.
  • the visualization refers to transforming the unclear and unorganized data into a clear and intuitive chart form through certain technical means, which is convenient for analysis and viewing of the data.
  • the JFreeChart chart-drawing class library is invoked from Java to process the standard data set and generate a set of clearly readable bar charts of the user behavior data.
  • JFreeChart is an open-source chart-drawing class library for the Java platform. It can render data as pie charts, bar charts, scatter charts, time series charts, Gantt charts, line charts, and other charts, can produce output in PNG and JPEG formats, and can be used together with PDF and Excel output.
  • FIG. 5 is a schematic structural diagram of an electronic device implementing a third-party software-based user data monitoring and analysis method in the present application.
  • the electronic device 1 may include a processor 10, a memory 11, and a bus, and may also include a computer program stored in the memory 11 and executable on the processor 10, such as a user data monitoring and analysis program 12 based on third-party software.
  • the memory 11 includes at least one type of readable storage medium, and the readable storage medium includes flash memory, mobile hard disk, multimedia card, card-type memory (for example: SD or DX memory, etc.), magnetic memory, magnetic disk, CD etc.
  • the memory 11 may be an internal storage unit of the electronic device 1 , such as a mobile hard disk of the electronic device 1 .
  • the memory 11 may also be an external storage device of the electronic device 1, such as a pluggable mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash memory card (Flash Card) equipped on the electronic device 1.
  • the memory 11 may also include both an internal storage unit of the electronic device 1 and an external storage device.
  • the memory 11 can be used not only to store application software installed in the electronic device 1 and various data, such as the code of the user data monitoring and analysis program 12 based on third-party software, but also to temporarily store data that has been output or will be output.
  • the processor 10 may be composed of integrated circuits, for example a single packaged integrated circuit, or multiple packaged integrated circuits with the same or different functions, including one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips.
  • the processor 10 is the control core (Control Unit) of the electronic device. It connects the components of the entire electronic device through various interfaces and lines, and executes the functions of the electronic device 1 and processes data by running or executing the programs or modules stored in the memory 11 (for example, the user data monitoring and analysis program based on third-party software) and calling the data stored in the memory 11.
  • the bus may be a peripheral component interconnect (PCI for short) bus or an extended industry standard architecture (Extended industry standard architecture, EISA for short) bus or the like.
  • the bus can be divided into address bus, data bus, control bus and so on.
  • the bus is configured to implement connection communication between the memory 11 and at least one processor 10 and the like.
  • FIG. 5 shows only an electronic device with certain components. Those skilled in the art will understand that the structure shown in FIG. 5 does not limit the electronic device 1, which may include fewer or more components than shown, a combination of certain components, or a different arrangement of components.
  • the electronic device 1 may also include a power supply (such as a battery) for powering the various components. Preferably, the power supply is logically connected to the at least one processor 10 through a power management device, so that the power management device implements functions such as charge management, discharge management, and power consumption management.
  • the power source may also include one or more DC or AC power sources, recharging devices, power failure detection circuits, power converters or inverters, power status indicators, and any other components.
  • the electronic device 1 may further include various sensors, Bluetooth modules, Wi-Fi modules, etc., which will not be repeated here.
  • the electronic device 1 may also include a network interface. Optionally, the network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi interface or a Bluetooth interface), and is usually used to establish a communication connection between the electronic device 1 and other electronic devices.
  • the electronic device 1 may further include a user interface, and the user interface may be a display (Display), an input unit (eg, a keyboard (Keyboard)), optionally, the user interface may also be a standard wired interface or a wireless interface.
  • the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode, organic light-emitting diode) touch device, and the like.
  • the display may also be appropriately called a display screen or a display unit, which is used for displaying information processed in the electronic device 1 and for displaying a visualized user interface.
  • the third-party software-based user data monitoring and analysis program 12 stored in the memory 11 of the electronic device 1 is a combination of multiple instructions, and when running in the processor 10, can achieve:
  • if the modules/units integrated in the electronic device 1 are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium, which can be non-volatile or volatile.
  • the computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, computer memory, and read-only memory (ROM).
  • the computer usable storage medium may mainly include a stored program area and a stored data area, wherein the stored program area may store an operating system, an application program required for at least one function, and the like; the stored data area may store the created data, and the like.
  • modules described as separate components may or may not be physically separated, and components shown as modules may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional module in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware, or can be implemented in the form of hardware plus software function modules.
  • the blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • a blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of that information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

A third-party software-based user data monitoring and analysis method, related to data monitoring techniques, comprising: collecting a behavioral data set of a target user from third-party software (S1); reducing the dimensionality of the behavioral data set to produce a dimensionality-reduced behavioral data set (S2); utilizing a data anomaly detection model to test the dimensionality-reduced behavioral data set to produce a normal behavioral data set and an abnormal behavioral data set (S3); utilizing a collaborative filtering algorithm to execute data reconstruction on the basis of the normal behavioral data set and of the abnormal behavioral data set to produce a standard data set (S4); and executing visualization processing with respect to the standard data set to produce a visualized chart set (S5). The method solves the problem of a large amount of computer memory being consumed during a data monitoring process.

Description

User data monitoring and analysis method, apparatus, device, and medium
This application claims priority to Chinese patent application No. CN202011204209.1, entitled "User Data Monitoring and Analysis Method, Apparatus, Device and Medium", filed with the China Patent Office on November 2, 2020, the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the technical field of data monitoring, and in particular to a third-party software-based user data monitoring and analysis method, apparatus, electronic device, and computer-readable storage medium.
Background
At present, many software products or platforms integrate third-party software in order to provide users with richer products and services. Third-party software is defined relative to a first party and a second party: the first party is oneself, and the second party is the problem one needs to solve, namely the user. Using other software to provide services to one's own users is what is meant by third-party software.
To understand the effect of third-party software on users, the user behavior data generated by the third-party software usually needs to be analyzed. The inventor realized that traditional analysis methods mostly rely on support vector machine (SVM) modeling. However, the space consumption of an SVM lies mainly in storing the training samples and the kernel matrix. An SVM solves for the support vectors by quadratic programming, and solving the quadratic program involves computing a matrix of order m, where m is the number of samples. When m is large, storing and computing this matrix consumes a large amount of computer memory and degrades the read/write speed of the computer disk.
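As a rough illustration of the memory claim above, the following sketch estimates the size of a dense matrix of order m. The 8-byte entry size (double precision) and the sample counts are assumptions chosen for illustration and do not come from the application.

```python
def kernel_matrix_bytes(m, bytes_per_entry=8):
    """Bytes needed to hold a dense m x m matrix.

    bytes_per_entry=8 assumes double-precision floats (an assumption).
    """
    return m * m * bytes_per_entry


if __name__ == "__main__":
    # Hypothetical sample counts, chosen only to show the quadratic growth.
    for m in (1_000, 100_000):
        gib = kernel_matrix_bytes(m) / 2**30
        print(f"m = {m}: {gib:.2f} GiB")
```

Under these assumptions, 100,000 samples already require 8 × 10^10 bytes for the matrix alone, which is why the storage cost grows quickly with the number of samples.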
Summary of the invention
The present application provides a third-party software-based user data monitoring and analysis method, comprising:
collecting a behavior data set of a target user from third-party software;
performing a dimensionality reduction operation on the behavior data set to obtain a dimensionality-reduced behavior data set;
detecting the dimensionality-reduced behavior data set by using a pre-built data anomaly detection model to obtain a normal behavior data set and an abnormal behavior data set;
performing data reconstruction according to the normal behavior data set and the abnormal behavior data set by using a preset collaborative filtering algorithm to obtain a standard data set;
performing visualization processing on the standard data set to obtain a visual chart set, and transmitting the visual chart set to a preset terminal.
The present application also provides a third-party software-based user data monitoring and analysis apparatus, the apparatus comprising:
a behavior data acquisition module, configured to collect a behavior data set of a target user from third-party software;
a data detection module, configured to perform a dimensionality reduction operation on the behavior data set to obtain a dimensionality-reduced behavior data set, and to detect the dimensionality-reduced behavior data set by using a pre-built data anomaly detection model to obtain a normal behavior data set and an abnormal behavior data set;
a data reconstruction module, configured to perform data reconstruction according to the normal behavior data set and the abnormal behavior data set by using a preset collaborative filtering algorithm to obtain a standard data set;
a visualization module, configured to perform visualization processing on the standard data set to obtain a visual chart set, and to transmit the visual chart set to a preset terminal.
The present application also provides an electronic device, the electronic device comprising:
a memory storing at least one instruction; and
a processor that executes the instructions stored in the memory to implement the following steps:
collecting a behavior data set of a target user from third-party software;
performing a dimensionality reduction operation on the behavior data set to obtain a dimensionality-reduced behavior data set;
detecting the dimensionality-reduced behavior data set by using a pre-built data anomaly detection model to obtain a normal behavior data set and an abnormal behavior data set;
performing data reconstruction according to the normal behavior data set and the abnormal behavior data set by using a preset collaborative filtering algorithm to obtain a standard data set;
performing visualization processing on the standard data set to obtain a visual chart set, and transmitting the visual chart set to a preset terminal.
The present application also provides a computer-readable storage medium comprising a stored data area and a stored program area, wherein the stored data area stores created data and the stored program area stores a computer program, and wherein the computer program, when executed by a processor, implements the following steps:
collecting a behavior data set of a target user from third-party software;
performing a dimensionality reduction operation on the behavior data set to obtain a dimensionality-reduced behavior data set;
detecting the dimensionality-reduced behavior data set by using a pre-built data anomaly detection model to obtain a normal behavior data set and an abnormal behavior data set;
performing data reconstruction according to the normal behavior data set and the abnormal behavior data set by using a preset collaborative filtering algorithm to obtain a standard data set;
performing visualization processing on the standard data set to obtain a visual chart set, and transmitting the visual chart set to a preset terminal.
Description of drawings
FIG. 1 is a schematic flowchart of the third-party software-based user data monitoring and analysis method provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of S2 in the third-party software-based user data monitoring and analysis method provided by an embodiment of the present application;
FIG. 3 is a schematic flowchart of S3 in the third-party software-based user data monitoring and analysis method provided by an embodiment of the present application;
FIG. 4 is a schematic block diagram of the third-party software-based user data monitoring and analysis apparatus provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of the internal structure of an electronic device implementing the third-party software-based user data monitoring and analysis method provided by an embodiment of the present application.
The realization of the purpose, the functional features, and the advantages of the present application will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed description
It should be understood that the specific embodiments described herein are only used to explain the present application and are not intended to limit it.
An embodiment of the present application provides a third-party software-based user data monitoring and analysis method. The executing entity of the method includes, but is not limited to, at least one of the electronic devices, such as a server or a terminal, that can be configured to execute the method provided by the embodiments of the present application. In other words, the method can be executed by software or hardware installed on a terminal device or a server device, and the software may be a blockchain platform. The server includes, but is not limited to, a single server, a server cluster, a cloud server, or a cloud server cluster.
The present application provides a third-party software-based user data monitoring and analysis method. FIG. 1 is a schematic flowchart of the method provided by an embodiment of the present application. The method may be performed by an apparatus, which may be implemented in software and/or hardware.
In this embodiment, the third-party software-based user data monitoring and analysis method includes:
S1. Collect a behavior data set of a target user from third-party software.
In a preferred embodiment of the present application, the target user is a user of the third-party software. Third-party software is defined relative to a first party and a second party: the first party is the target user, the second party is the user or platform that interfaces with the target user, and the third-party software is the software used by the first party.
In this embodiment, the behavior data set of the target user may be acquired from the third-party software through a preconfigured monitoring script. The configuration information of the monitoring script can be deployed on an internal server and configured quickly by hot update, without deploying a new version and without grayscale testing.
In this embodiment, the monitoring script can be verified directly against the data that the production side reports to the data statistics system. If a misconfiguration is found, it can be corrected quickly by hot update; this is imperceptible to users and does not affect their experience. The configuration information includes user visual area information, user location information, and the like, which facilitates monitoring the behavior data set generated while the user uses the third-party software. The behavior data set includes the length of time the user browses the third-party software, the interfaces of the third-party software that the user browses, the buttons the user clicks while browsing, and the like. Third-party software generally consists of multiple interfaces; in this embodiment, the specific interfaces the user operates in the third-party software and the corresponding durations are acquired to obtain the behavior data set.
Further, in this embodiment, the acquired behavior data set is first uploaded to a Redis cache and then uploaded to a database, in preparation for subsequent analysis and processing of the behavior data set.
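The cache-then-database flow described above can be sketched as follows. This is only an illustration under stated assumptions: a plain Python dict stands in for the Redis cache, an in-memory SQLite database stands in for the database, and the function name, key format, table name, and record fields are all hypothetical.

```python
import json
import sqlite3


def stage_then_persist(records, cache, conn):
    """Stage behavior records in a cache, then flush them to a database.

    `cache` is a dict standing in for a Redis cache (assumption);
    `conn` is a sqlite3 connection standing in for the database (assumption).
    Returns the number of rows stored in the database.
    """
    # Step 1: write every record to the cache under a hypothetical key scheme.
    for index, record in enumerate(records):
        cache[f"behavior:{index}"] = json.dumps(record)

    # Step 2: copy the cached entries into the database.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS behavior (key TEXT PRIMARY KEY, payload TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO behavior VALUES (?, ?)", list(cache.items())
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM behavior").fetchone()[0]


# Hypothetical behavior records: interface browsed, seconds spent, button clicked.
sample_records = [
    {"interface": "home", "seconds": 42, "button": "search"},
    {"interface": "detail", "seconds": 310, "button": "buy"},
]
```

Staging in a cache first keeps collection fast while the slower database write happens afterwards; a real deployment would replace the dict with a Redis client.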
In another embodiment of the present application, the behavior data set may also be stored in a blockchain node.
S2. Perform a dimensionality reduction operation on the behavior data set to obtain a dimensionality-reduced behavior data set.
In a preferred embodiment of the present application, the amount of data in the acquired behavior data set is relatively large, which hinders the computation and analysis of the user behavior data. A dimensionality reduction operation is therefore performed on the behavior data set to reduce the amount of data and ease computation.
In detail, referring to FIG. 2, in a preferred embodiment of the present application, performing the dimensionality reduction operation on the behavior data set to obtain the dimensionality-reduced behavior data set includes:
S21. Perform an encoding operation on the behavior data set to obtain a user behavior vector set;
In this embodiment, the word2vec method may be used to encode the behavior data set into the user behavior vector set.
S22. Use a pre-built weight set to calculate the weighted behavior vector set of the user behavior vector set;
In a preferred embodiment of the present application, the weight set consists of weights preset by the user for each user behavior in the user behavior vector set.
An optional embodiment of the present application may use the following formula to calculate the weighted behavior vector set of the behavior data set:
[Formula image: PCTCN2021090312-appb-000001]
where $X_j$ denotes the j-th weighted behavior vector in the weighted behavior vector set, $x_j$ denotes the j-th user behavior vector in the user behavior vector set, $k$ is the data amount of the weighted behavior vector set, and $w_j$ is the j-th weight in the weight set.
For example, one application example of the present application encodes a behavior data set that includes the length of time the user browses the third-party software and the interfaces of the third-party software that the user browses. This yields a user behavior vector set in which the browsing duration is $x_1$ and the browsed interface is $x_2$. The corresponding calculation with the weight set then yields a weighted behavior vector set containing the weighted behavior vectors $X_1$ and $X_2$.
S23. Perform dimensionality reduction processing on the weighted behavior vector set to obtain the dimensionality-reduced behavior data set.
An optional embodiment of the present application may use the following formula to perform dimensionality reduction processing on the weighted behavior vector set:
$$Q_i = (X_i - X_i W_j W_j^{T})(X_i - X_i W_j W_j^{T})^{T}$$
where $Q_i$ denotes the i-th dimensionality-reduced behavior datum in the dimensionality-reduced behavior data set, $X_i$ denotes the i-th weighted behavior vector in the weighted behavior vector set, $W_j$ denotes the j-th row vector of the weight matrix obtained from the weight set, and $W_j^{T}$ denotes the transpose of $W_j$.
In another optional embodiment of the present application, the PCA algorithm may also be used to perform dimensionality reduction processing on the weighted behavior vector set to obtain the dimensionality-reduced behavior data set.
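The formula for $Q_i$ above can be read as the squared residual left after projecting a row vector onto a subspace and mapping it back. The sketch below works through that reading in plain Python. It assumes a d × m projection matrix `w` with orthonormal columns, which is an interpretation for illustration and not a shape the application states; the function name and the sample values are hypothetical.

```python
def reconstruction_residual(x, w):
    """Compute (x - x W W^T)(x - x W W^T)^T for a row vector x.

    x: list of d numbers (one weighted behavior vector).
    w: d x m list of lists, assumed to have orthonormal columns (assumption).
    """
    d, m = len(w), len(w[0])
    # Project x onto the m-dimensional subspace: z = x W (length m).
    z = [sum(x[r] * w[r][c] for r in range(d)) for c in range(m)]
    # Map the projection back to the original space: x_hat = z W^T (length d).
    x_hat = [sum(z[c] * w[r][c] for c in range(m)) for r in range(d)]
    # A row vector times its own transpose is the squared Euclidean norm.
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


# Hypothetical projection that keeps the first two of three coordinates.
keep_first_two = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
```

With this projection, a vector such as `[3.0, 4.0, 5.0]` loses only its third coordinate, so the residual is the square of that coordinate, while a vector already lying in the subspace has zero residual.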
S3. Detect the dimensionality-reduced behavior data set by using a pre-built data anomaly detection model to obtain a normal behavior data set and an abnormal behavior data set.
In a preferred embodiment of the present application, the dimensionality-reduced behavior data set contains many abnormal data points related to user behavior. For example, a browsing time of a few minutes to a few hours is generally considered normal data, whereas a browsing time of more than ten or even more than twenty hours is considered abnormal data. Therefore, this embodiment detects the dimensionality-reduced behavior data set to obtain a normal behavior data set and an abnormal behavior data set.
A preferred embodiment of the present application may use support vector data description (SVDD) to build the data anomaly detection model. SVDD is a data description method that describes the target data set with a hypersphere and can be used for outlier detection or classification. In detail, referring to FIG. 3, S3 includes:
S31. Construct a hypersphere according to the dimensionality-reduced behavior data set;
In an optional embodiment, the hypersphere is constructed using the following formulas:
[Formula image: PCTCN2021090312-appb-000002]
[Formula image: PCTCN2021090312-appb-000003]
$$C - \alpha_i - \gamma_i = 0$$
where $\alpha_i$ denotes the first Lagrange multiplier of the hypersphere, $o$ denotes the center of the hypersphere, $C$ denotes the penalty factor, $q_i$ denotes the dimensionality-reduced behavior data set, and $\gamma_i$ denotes the slack variable.
S32. Calculate the radius of the hypersphere;
In an optional embodiment, the radius of the hypersphere is calculated using the following formula:
[Formula image: PCTCN2021090312-appb-000004]
where $R$ denotes the radius of the hypersphere, $\alpha_j$ denotes the second Lagrange multiplier of the hypersphere, $Q_i$ and $Q_j$ denote any two dimensionality-reduced behavior data in the dimensionality-reduced behavior data set, and $K(\cdot)$ denotes the Gaussian kernel function.
S33. Calculate the distance from the data in the dimensionality-reduced behavior data set to the center of the hypersphere;
In an optional embodiment, the distance from the data in the dimensionality-reduced behavior data set to the center of the hypersphere is calculated using the following formula:
[Formula image: PCTCN2021090312-appb-000005]
where $D$ denotes the distance from the data in the dimensionality-reduced behavior data set to the center of the hypersphere, and $\|\cdot\|$ denotes the norm.
S34. Aggregate the data whose distance is less than the radius to obtain the normal behavior data set;
In this embodiment, the distance from each datum in the dimensionality-reduced behavior data set to the center of the hypersphere is compared with the radius of the hypersphere. If the distance is less than the radius, the datum is considered normal. The data whose distance is less than the radius are aggregated using SQL to obtain the normal behavior data set.
S35. Aggregate the data whose distance is greater than or equal to the radius to obtain the abnormal behavior data set.
Further, in this embodiment, if the distance is greater than or equal to the radius of the hypersphere, the datum is considered abnormal. The data whose distance is greater than or equal to the radius are aggregated using SQL to obtain the abnormal behavior data set.
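Steps S33 to S35 can be sketched as below. Because the application's distance and radius formulas survive only as images, the sketch uses the textbook kernel form of the SVDD distance with a Gaussian kernel, which may differ from the application's exact expressions. The multipliers `alphas`, the `radius`, the kernel width `sigma`, and all sample values are assumed inputs; solving the quadratic program that produces the multipliers is outside the sketch.

```python
import math


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian kernel K(a, b) = exp(-||a - b||^2 / (2 sigma^2))."""
    squared = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-squared / (2 * sigma ** 2))


def kernel_distance(z, support, alphas, sigma=1.0):
    """Distance from z to the hypersphere center in kernel space.

    Textbook form (assumption): D^2 = K(z, z) - 2 * sum_i a_i K(z, Q_i)
                                      + sum_i sum_j a_i a_j K(Q_i, Q_j).
    """
    cross = sum(a * gaussian_kernel(z, q, sigma) for a, q in zip(alphas, support))
    center = sum(
        ai * aj * gaussian_kernel(qi, qj, sigma)
        for ai, qi in zip(alphas, support)
        for aj, qj in zip(alphas, support)
    )
    # Clamp at zero to guard against tiny negative rounding errors.
    return math.sqrt(max(0.0, gaussian_kernel(z, z, sigma) - 2 * cross + center))


def split_by_radius(data, support, alphas, radius, sigma=1.0):
    """S34/S35: distance < radius is normal, otherwise abnormal."""
    normal, abnormal = [], []
    for z in data:
        target = normal if kernel_distance(z, support, alphas, sigma) < radius else abnormal
        target.append(z)
    return normal, abnormal


# Hypothetical inputs: three support points with equal multipliers.
support_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
multipliers = [1 / 3, 1 / 3, 1 / 3]
```

A point near the support points falls inside the hypersphere and is kept as normal, while a point far from all of them lands in the abnormal set.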
S4. Perform data reconstruction according to the normal behavior data set and the abnormal behavior data set by using a preset collaborative filtering algorithm to obtain a standard data set.
In a preferred embodiment of the present application, S4 includes: S41, calculating the distance between each normal datum in the normal behavior data set and each abnormal datum in the abnormal behavior data set to obtain a distance value set; S42, comparing each distance value in the distance value set with a preset threshold, selecting the normal data and abnormal data corresponding to the distance values not greater than the threshold, and aggregating the selected normal data and abnormal data to obtain the standard data set.
In an optional embodiment, the distance between the normal behavior data set and the abnormal behavior data set is calculated using the following formula:
[Formula image: PCTCN2021090312-appb-000006]
where $dist(x, y)$ denotes the distance between the normal behavior data set and the abnormal behavior data set, $x_i$ denotes a data point in the normal behavior data set, $y_i$ denotes a data point in the abnormal behavior data set, and $n$ denotes the data amount of the normal behavior data set or the abnormal behavior data set.
For example, if the preset threshold is 10 and the distance between a normal datum A and an abnormal datum B is 5, then both A and B can be placed in the standard data set.
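The threshold comparison in this example can be sketched as follows. The application's distance formula survives only as an image, so the sketch assumes the Euclidean distance; the function names and the sample points are hypothetical.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length points (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def rebuild_standard_set(normal, abnormal, threshold):
    """Keep every normal/abnormal pair whose distance is at most the threshold.

    Returns the selected points in first-seen order without duplicates.
    """
    selected = []
    for x in normal:
        for y in abnormal:
            if euclidean(x, y) <= threshold:
                for point in (x, y):
                    if point not in selected:
                        selected.append(point)
    return selected


# Hypothetical data: A = (0, 0) is normal, B = (3, 4) is abnormal, distance 5.
normal_points = [(0, 0), (100, 100)]
abnormal_points = [(3, 4), (50, 50)]
```

With a threshold of 10, the pair A and B (distance 5) is kept, while the points that are far from every point of the other set are left out.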
S5. Visualize the standard data set to obtain a visual chart set, and return the visual chart set to a preset terminal.
Visualization refers to converting unclear, unorganized data into clear and intuitive charts by technical means, so that the data is easy to analyze and view. For example, the time users spend browsing the third-party software exists in the standard data set only as numbers, from which changes in browsing time cannot be analyzed intuitively; if these data are converted into a line chart, the changes in browsing time can be seen at a glance. Likewise, the number of times users click buttons while browsing the third-party software exists in the standard data set as numbers; if these data are converted into a bar chart, the click counts of the buttons can be seen directly, as can which buttons users click most and which they click least.
Further, in a preferred embodiment of the present application, the JFreeChart chart-drawing class library is invoked from Java to process the standard data set and generate a clear set of bar charts of user behavior data.
JFreeChart is an open chart-drawing class library for the Java platform. It can draw data as pie charts, bar charts, scatter plots, time series charts, Gantt charts, line charts, and other chart types, can produce output in PNG and JPEG formats, and can also be used together with PDF and Excel.
In this embodiment, the behavior data set of the target user is collected from third-party software, and the dimensionality reduction operation, data anomaly detection, and data reconstruction are performed on it in turn. This lowers the data dimension of the behavior data set, splits it into a normal behavior data set and an abnormal behavior data set, and then reconstructs the data to obtain a standard data set. Compared with traditional analysis methods such as support vector machines, the dimensionality reduction operation in this embodiment effectively reduces the data dimension and avoids wasting storage and computing resources, while data anomaly detection and data reconstruction refine the data and improve the accuracy of data monitoring. The third-party software-based user data monitoring and analysis method, apparatus, and computer-readable storage medium proposed in this application can therefore solve the problem of consuming a large amount of computer memory during data monitoring.
如图4所示,是本申请基于第三方软件的用户数据监控分析装置的模块示意图。As shown in FIG. 4 , it is a schematic block diagram of the user data monitoring and analysis device based on the third-party software of the present application.
本申请所述基于第三方软件的用户数据监控分析装置100可以安装于电子设备中。根据实现的功能,所述基于第三方软件的用户数据监控分析装置可以包括行为数据获取模块101、数据检测模块102、数据重构模块103及可视化模块104。本发所述模块也可以称之为单元,是指一种能够被电子设备处理器所执行,并且能够完成固定功能的一系列计算机程序段,其存储在电子设备的存储器中。The apparatus 100 for monitoring and analyzing user data based on third-party software described in this application may be installed in an electronic device. According to the realized functions, the third-party software-based user data monitoring and analysis device may include a behavior data acquisition module 101 , a data detection module 102 , a data reconstruction module 103 and a visualization module 104 . The modules described in the present invention can also be called units, which refer to a series of computer program segments that can be executed by the electronic device processor and can perform fixed functions, and are stored in the memory of the electronic device.
在本实施例中,关于各模块/单元的功能如下:In this embodiment, the functions of each module/unit are as follows:
所述行为数据获取模块101,用于从第三方软件中收集目标用户的行为数据集;The behavior data acquisition module 101 is used to collect the behavior data set of the target user from third-party software;
The data detection module 102 is configured to perform a dimensionality reduction operation on the behavior data set to obtain a dimensionality reduction behavior data set, and to detect the dimensionality reduction behavior data set with a pre-built data anomaly detection model to obtain a normal behavior data set and an abnormal behavior data set;
The data reconstruction module 103 is configured to perform data reconstruction on the normal behavior data set and the abnormal behavior data set with a preset collaborative filtering algorithm to obtain a standard data set;
The visualization module 104 is configured to perform visualization processing on the standard data set to obtain a visual chart set, and to transmit the visual chart set to a preset terminal.
In detail, the modules of the third-party-software-based user data monitoring and analysis apparatus are implemented as follows:
The behavior data acquisition module 101 is configured to collect a behavior data set of a target user from third-party software.
In a preferred embodiment of the present application, the target user is a user of the third-party software. The third-party software is defined relative to a first party and a second party: the first party is the target user, the second party is a user or platform that interfaces with the target user, and the third-party software is the software used by the first party.
In this embodiment of the present application, the behavior data set of the target user may be obtained from the third-party software through a preconfigured monitoring script. The configuration information of the monitoring script can be deployed on an internal server and changed quickly by hot update, with no need to release a new version or run a grayscale test.
This embodiment can verify the monitoring script directly against the data that the production side reports to the data statistics system. If a misconfiguration is found, it can be corrected quickly by hot update; users do not notice the change, so their experience is unaffected. The configuration information includes user visual area information, user location information, and the like, which makes it easier to monitor the behavior data set generated while the user uses the third-party software. The behavior data set includes the length of time the user browses the third-party software, the interfaces of the third-party software that the user browses, the buttons the user clicks while browsing the third-party software, and the like.
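As an illustration only, one item of such a behavior data set could be modeled as follows. The class name `BehaviorRecord`, its field names, and the helper `to_row` are hypothetical; the application does not specify a record layout.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BehaviorRecord:
    """One observation of a target user's activity (field names are illustrative)."""
    user_id: str
    browse_seconds: float            # how long the user browsed the software
    screen: str                      # interface the user was viewing
    clicked_buttons: List[str] = field(default_factory=list)


def to_row(record: BehaviorRecord) -> dict:
    """Flatten a record into a plain dict, ready to be cached or stored."""
    return {
        "user_id": record.user_id,
        "browse_seconds": record.browse_seconds,
        "screen": record.screen,
        "click_count": len(record.clicked_buttons),
    }


record = BehaviorRecord("u1", 312.0, "home", ["search", "buy"])
row = to_row(record)
```

A flat row of this kind is one plausible input for the later encoding step; the actual encoding used by the application is described separately.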
Further, in this embodiment of the present application, the acquired behavior data set is first uploaded to a Redis cache and then uploaded to a database, in preparation for subsequent analysis and processing of the behavior data set.
In another embodiment of the present application, the behavior data set may also be stored in a blockchain node.
The data detection module 102 is configured to perform a dimensionality reduction operation on the behavior data set to obtain a dimensionality reduction behavior data set, and to detect the dimensionality reduction behavior data set with a pre-built data anomaly detection model to obtain a normal behavior data set and an abnormal behavior data set.
In a preferred embodiment of the present application, the acquired behavior data set contains a relatively large amount of data, which hinders the computer's calculation and analysis of the user behavior data. A dimensionality reduction operation is therefore performed on the behavior data set to reduce the amount of data the computer has to process.
In detail, in a preferred embodiment of the present application, performing the dimensionality reduction operation on the behavior data set to obtain the dimensionality reduction behavior data set includes: performing an encoding operation on the behavior data set to obtain a user behavior vector set; calculating a weight behavior vector set of the user behavior vector set with a pre-built weight set; and performing dimensionality reduction processing on the weight behavior vector set to obtain the dimensionality reduction behavior data set.
In this embodiment of the present application, the word2vec method may be used to encode the behavior data set into the user behavior vector set.
In a preferred embodiment of the present application, the weight set consists of weights preset by a user for each user behavior in the user behavior vector set.
An optional embodiment of the present application may use the following formula to calculate the weight behavior vector set of the behavior data set:
Figure PCTCN2021090312-appb-000007
where X_j denotes the j-th weight behavior vector in the weight behavior vector set, x_j denotes the j-th user behavior vector in the user behavior vector set, k is the number of items in the weight behavior vector set, and w_j is the j-th weight in the weight set.
For example, in one application example, behavior data including the length of time the user browses the third-party software and the interfaces of the third-party software that the user browses are encoded, giving a user behavior vector set in which x_1 is the browsing duration and x_2 is the browsed interface. The corresponding calculation with the weight set then yields a weight behavior vector set comprising the weight behavior vectors X_1 and X_2.
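The weighting formula itself is published only as an image and is not reproduced in this text, so the following is a minimal sketch under a stated assumption: each user behavior vector x_j is simply scaled by its preset weight w_j. The actual formula may differ, for example by normalizing over the k vectors. The function name `weight_vectors` is hypothetical.

```python
from typing import List, Sequence


def weight_vectors(vectors: Sequence[Sequence[float]],
                   weights: Sequence[float]) -> List[List[float]]:
    """Scale the j-th behavior vector by the j-th weight.

    ASSUMPTION: X_j = w_j * x_j. The published formula is an image that is
    not available here, so this is only the simplest plausible reading.
    """
    if len(vectors) != len(weights):
        raise ValueError("one weight is required per behavior vector")
    return [[w * value for value in vec] for vec, w in zip(vectors, weights)]


x = [[1.0, 2.0], [3.0, 4.0]]   # encoded user behavior vectors x_1, x_2
w = [0.5, 2.0]                 # preset weights w_1, w_2
X = weight_vectors(x, w)       # weight behavior vectors X_1, X_2
```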
An optional embodiment of the present application may use the following formula to perform dimensionality reduction processing on the weight behavior vector set:
$Q_i = (X_i - X_i W_j W_j^T)(X_i - X_i W_j W_j^T)^T$
where Q_i denotes the i-th item of dimensionality reduction behavior data in the dimensionality reduction behavior data set, X_i denotes the i-th weight behavior vector in the weight behavior vector set, W_j denotes the j-th row vector of the weight matrix obtained from the weight set, and W_j^T denotes the transpose of W_j.
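The formula above can be read as the squared residual left after projecting X_i onto the direction W_j. The sketch below evaluates it in plain Python under two assumptions that the text does not state: X_i is treated as a 1×d row vector, and W_j is treated as a d×1 column so that the products are defined and Q_i is a scalar. The function names are hypothetical.

```python
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def reconstruction_error(x: Sequence[float], w: Sequence[float]) -> float:
    """Evaluate Q = (x - x w w^T)(x - x w w^T)^T for one vector x.

    ASSUMPTION: x is a 1 x d row vector and w a d x 1 column vector.
    """
    s = dot(x, w)                                       # x w (a scalar)
    residual = [xi - s * wi for xi, wi in zip(x, w)]    # x - x w w^T
    return dot(residual, residual)                      # residual times its transpose


q = reconstruction_error([3.0, 4.0], [1.0, 0.0])
```

With the unit direction (1, 0), the component 3.0 along that direction is removed and the remaining component 4.0 contributes 16.0 to Q.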
In another optional embodiment of the present application, the PCA (principal component analysis) algorithm may instead be used to perform dimensionality reduction processing on the weight behavior vector set to obtain the dimensionality reduction behavior data set.
In a preferred embodiment of the present application, the dimensionality reduction behavior data set may contain a considerable amount of abnormal data about user behavior. For example, a browsing time of a few minutes to a few hours in the third-party software is generally regarded as normal data, whereas a browsing time of more than ten or even more than twenty hours is regarded as abnormal data. This embodiment therefore detects the dimensionality reduction behavior data set to obtain a normal behavior data set and an abnormal behavior data set.
In a preferred embodiment of the present application, support vector data description (SVDD) may be used to build the data anomaly detection model. SVDD is a data description method that encloses a target data set in a hypersphere and can be used for outlier detection or classification. In detail, detecting the dimensionality reduction behavior data set with the pre-built data anomaly detection model to obtain the normal behavior data set and the abnormal behavior data set includes: constructing a hypersphere from the dimensionality reduction behavior data set; calculating the radius of the hypersphere; calculating the distance from each item of data in the dimensionality reduction behavior data set to the center of the hypersphere; collecting the data whose distance is less than the radius to obtain the normal behavior data set; and collecting the data whose distance is greater than or equal to the radius to obtain the abnormal behavior data set.
In an optional embodiment, the hypersphere is constructed with the following formulas:
Figure PCTCN2021090312-appb-000008
Figure PCTCN2021090312-appb-000009
C - α_i [symbol lost in extraction]_i = 0
where α_i denotes the first Lagrange multiplier of the hypersphere, o denotes the center of the hypersphere, C denotes the penalty factor, q_i denotes the data in the dimensionality reduction behavior data set, and γ_i denotes the slack variable.
In an optional embodiment, the radius of the hypersphere is calculated with the following formula:
Figure PCTCN2021090312-appb-000010
where R denotes the radius of the hypersphere, α_j denotes the second Lagrange multiplier of the hypersphere, Q_i and Q_j denote any two items of dimensionality reduction behavior data in the dimensionality reduction behavior data set, and K() denotes the Gaussian kernel function.
In an optional embodiment, the distance from the data in the dimensionality reduction behavior data set to the center of the hypersphere is calculated with the following formula:
Figure PCTCN2021090312-appb-000011
where D denotes the distance from the data in the dimensionality reduction behavior data set to the center of the hypersphere, and || denotes the norm.
This embodiment compares the distance from each item of data in the dimensionality reduction behavior data set to the center of the hypersphere with the radius of the hypersphere. If the distance is less than the radius, the data is regarded as normal data, and SQL is used to collect the data whose distance is less than the radius, giving the normal behavior data set.
Further, in this embodiment, if the distance is greater than or equal to the radius of the hypersphere, the data is regarded as abnormal data, and SQL is used to collect the data whose distance is greater than or equal to the radius, giving the abnormal behavior data set.
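The final comparison step can be sketched as below. This is not an SVDD solver: it assumes the center o and radius R have already been obtained, and it measures plain Euclidean distance in the input space rather than the kernel-space distance given by the application's formula. The function name `split_by_radius` is hypothetical, and the application performs the collection step with SQL rather than in Python.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def split_by_radius(points: Sequence[Point], center: Point,
                    radius: float) -> Tuple[List[Point], List[Point]]:
    """Split points into (normal, abnormal) by distance to the center.

    distance <  radius  -> normal
    distance >= radius  -> abnormal
    ASSUMPTION: Euclidean distance stands in for the kernel-space distance.
    """
    normal: List[Point] = []
    abnormal: List[Point] = []
    for point in points:
        distance = math.dist(point, center)
        if distance < radius:
            normal.append(point)
        else:
            abnormal.append(point)
    return normal, abnormal


points = [(0.0, 0.0), (1.0, 1.0), (6.0, 8.0)]
normal, abnormal = split_by_radius(points, center=(0.0, 0.0), radius=5.0)
```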
The data reconstruction module 103 is configured to perform data reconstruction on the normal behavior data set and the abnormal behavior data set with a preset collaborative filtering algorithm to obtain a standard data set.
In a preferred embodiment of the present application, performing data reconstruction on the normal behavior data set and the abnormal behavior data set with the preset collaborative filtering algorithm to obtain the standard data set includes: calculating the distance between each item of normal data in the normal behavior data set and each item of abnormal data in the abnormal behavior data set to obtain a distance value set; comparing each distance value in the distance value set with a preset threshold; selecting the normal data and abnormal data that correspond to distance values not greater than the threshold; and combining the selected normal data and abnormal data to obtain the standard data set.
In an optional embodiment, the distance between the normal behavior data set and the abnormal behavior data set is calculated with the following formula:
Figure PCTCN2021090312-appb-000012
where dist(x, y) denotes the distance between the normal behavior data set and the abnormal behavior data set, x_i denotes a data point in the normal behavior data set, y_i denotes a data point in the abnormal behavior data set, and n denotes the number of items in the normal behavior data set or the abnormal behavior data set.
For example, if the preset threshold is 10 and the distance between an item of normal data A and an item of abnormal data B is 5, both normal data A and abnormal data B are placed in the standard data set.
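The selection rule in this example can be sketched as follows. The distance formula is published only as an image, so Euclidean distance is assumed here; the function name `rebuild_standard_set` and the sample points are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def rebuild_standard_set(normal: Sequence[Point], abnormal: Sequence[Point],
                         threshold: float) -> List[Point]:
    """Keep every normal/abnormal pair whose distance is not greater than the threshold.

    ASSUMPTION: dist(x, y) is the Euclidean distance.
    """
    selected: List[Point] = []
    for a in normal:
        for b in abnormal:
            if math.dist(a, b) <= threshold:
                # Add both members of the pair, skipping items already selected.
                for item in (a, b):
                    if item not in selected:
                        selected.append(item)
    return selected


normal = [(0.0, 0.0), (100.0, 100.0)]
abnormal = [(3.0, 4.0), (50.0, 50.0)]
standard = rebuild_standard_set(normal, abnormal, threshold=10.0)
```

Here only the pair (0, 0) and (3, 4) is within the threshold (distance 5), mirroring the A and B example above.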
The visualization module 104 is configured to perform visualization processing on the standard data set to obtain a visual chart set, and to transmit the visual chart set to a preset terminal.
Visualization refers to using technical means to convert unclear, unorganized data into clear and intuitive charts that are easy to analyze and inspect. For example, the times at which users browse the third-party software are stored in the standard data set as numbers, from which changes in browsing time cannot be read directly; once the data is converted into a line chart, the changes become visible at a glance. Likewise, the number of times users click each button while browsing the third-party software is stored in the standard data set as numbers; once the data is converted into a bar chart, the click count of each button is visible at a glance, as are the buttons users click most and the buttons they click least.
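The aggregation behind such a bar chart can be sketched with the Python standard library. This is a stand-in for illustration only, not the charting library used by the application; the function names and the sample click list are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def click_counts(clicks: Sequence[str]) -> List[Tuple[str, int]]:
    """Count clicks per button, most-clicked first."""
    return Counter(clicks).most_common()


def text_bar_chart(counts: Sequence[Tuple[str, int]]) -> str:
    """Render the counts as a simple text bar chart, one button per line."""
    return "\n".join(f"{name:<8}{'#' * n} {n}" for name, n in counts)


counts = click_counts(["buy", "search", "buy", "help", "buy", "search"])
chart = text_bar_chart(counts)
print(chart)
```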
Further, in a preferred embodiment of the present application, the JFreeChart chart-drawing class library is invoked from Java to process the standard data set and generate a set of clear bar charts of the user behavior data.
JFreeChart is an open chart-drawing class library for the Java platform. It can draw data as pie charts, bar charts, scatter plots, time series charts, Gantt charts, line charts, and other chart types, can produce output in PNG and JPEG formats, and can be used together with PDF and Excel.
FIG. 5 is a schematic structural diagram of an electronic device that implements the third-party-software-based user data monitoring and analysis method of the present application.
The electronic device 1 may include a processor 10, a memory 11, and a bus, and may further include a computer program stored in the memory 11 and executable on the processor 10, such as a third-party-software-based user data monitoring and analysis program 12.
The memory 11 includes at least one type of readable storage medium, such as flash memory, a removable hard disk, a multimedia card, card-type memory (for example, SD or DX memory), magnetic memory, a magnetic disk, or an optical disc. In some embodiments, the memory 11 may be an internal storage unit of the electronic device 1, such as a removable hard disk of the electronic device 1. In other embodiments, the memory 11 may be an external storage device of the electronic device 1, such as a plug-in removable hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the electronic device 1. Further, the memory 11 may include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 can be used not only to store application software installed in the electronic device 1 and various kinds of data, such as the code of the third-party-software-based user data monitoring and analysis program 12, but also to temporarily store data that has been output or is to be output.
In some embodiments, the processor 10 may be composed of integrated circuits, for example a single packaged integrated circuit or multiple packaged integrated circuits with the same or different functions, including one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips. The processor 10 is the control unit of the electronic device. It connects the components of the entire electronic device through various interfaces and lines, and performs the functions of the electronic device 1 and processes data by running or executing the programs or modules stored in the memory 11 (for example, the third-party-software-based user data monitoring and analysis program) and by calling the data stored in the memory 11.
The bus may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. The bus is configured to provide connection and communication between the memory 11, the at least one processor 10, and other components.
FIG. 5 shows only an electronic device with certain components. Those skilled in the art will understand that the structure shown in FIG. 5 does not limit the electronic device 1, which may include fewer or more components than shown, combine certain components, or arrange the components differently.
For example, although not shown, the electronic device 1 may further include a power supply (such as a battery) that powers the components. Preferably, the power supply is logically connected to the at least one processor 10 through a power management device, so that charge management, discharge management, and power consumption management are implemented through the power management device. The power supply may also include one or more DC or AC power sources, a recharging device, a power failure detection circuit, a power converter or inverter, a power status indicator, and any other such components. The electronic device 1 may further include various sensors, a Bluetooth module, a Wi-Fi module, and the like, which are not described further here.
Further, the electronic device 1 may include a network interface. Optionally, the network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi interface or a Bluetooth interface), which is typically used to establish a communication connection between the electronic device 1 and other electronic devices.
Optionally, the electronic device 1 may further include a user interface. The user interface may be a display or an input unit (such as a keyboard); optionally, it may also be a standard wired or wireless interface. In some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (organic light-emitting diode) touch device, or the like. The display, which may also be called a display screen or display unit, is used to display the information processed in the electronic device 1 and to display a visual user interface.
It should be understood that the embodiments are for illustration only and that the scope of the patent application is not limited by this structure.
The third-party-software-based user data monitoring and analysis program 12 stored in the memory 11 of the electronic device 1 is a combination of multiple instructions which, when run on the processor 10, can implement the following:
collecting a behavior data set of a target user from third-party software;
performing a dimensionality reduction operation on the behavior data set to obtain a dimensionality reduction behavior data set;
detecting the dimensionality reduction behavior data set with a pre-built data anomaly detection model to obtain a normal behavior data set and an abnormal behavior data set;
performing data reconstruction on the normal behavior data set and the abnormal behavior data set with a preset collaborative filtering algorithm to obtain a standard data set;
performing visualization processing on the standard data set to obtain a visual chart set, and transmitting the visual chart set to a preset terminal.
Further, if the modules/units integrated in the electronic device 1 are implemented as software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium, which may be non-volatile or volatile. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, or a read-only memory (ROM).
Further, the computer-usable storage medium may mainly include a program storage area and a data storage area. The program storage area may store an operating system, an application program required for at least one function, and the like; the data storage area may store data created through use of the blockchain node, and the like.
In the several embodiments provided in this application, it should be understood that the disclosed device, apparatus, and method may be implemented in other ways. For example, the apparatus embodiments described above are only illustrative: the division into modules is only a division by logical function, and other divisions are possible in an actual implementation.
The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware, or in hardware plus software functional modules.
It will be apparent to those skilled in the art that the present application is not limited to the details of the exemplary embodiments above, and that the present application can be implemented in other specific forms without departing from its spirit or essential characteristics.
The embodiments are therefore to be regarded in all respects as illustrative and not restrictive. The scope of the present application is defined by the appended claims rather than by the description above, and all changes that fall within the meaning and range of equivalents of the claims are intended to be embraced in the present application. Any reference signs in the claims shall not be construed as limiting the claims concerned.
The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked to one another by cryptographic methods, in which each data block contains the information of a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and so on.
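The idea of data blocks linked by cryptographic methods can be illustrated with a minimal hash chain. This sketch covers only the linking and validity check; it has no consensus mechanism or network layer and is not the storage scheme of the application. The function names and the block layout are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, List


def make_block(data: Dict[str, Any], prev_hash: str) -> Dict[str, Any]:
    """Create a block whose hash covers its data and the previous block's hash."""
    payload = json.dumps({"data": data, "prev_hash": prev_hash}, sort_keys=True)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return {"data": data, "prev_hash": prev_hash, "hash": digest}


def is_valid(chain: List[Dict[str, Any]]) -> bool:
    """Check each block's own hash and its link to the preceding block."""
    for index, block in enumerate(chain):
        expected = make_block(block["data"], block["prev_hash"])["hash"]
        if block["hash"] != expected:
            return False
        if index > 0 and block["prev_hash"] != chain[index - 1]["hash"]:
            return False
    return True


genesis = make_block({"user": "u1", "browse_seconds": 312}, prev_hash="0" * 64)
second = make_block({"user": "u1", "browse_seconds": 95}, prev_hash=genesis["hash"])
chain = [genesis, second]
```

Changing the data of any stored block changes its recomputed hash, so `is_valid` reports the tampering.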
Furthermore, the word "comprising" clearly does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or apparatuses recited in the system claims may also be implemented by a single unit or apparatus through software or hardware. Terms such as "second" are used to denote names and do not denote any particular order.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of the present application may be modified or equivalently replaced without departing from the spirit and scope of those technical solutions.

Claims (22)

  1. A method for monitoring and analyzing user data based on third-party software, wherein the method comprises:
    collecting a behavior data set of a target user from third-party software;
    performing a dimensionality reduction operation on the behavior data set to obtain a dimensionality reduction behavior data set;
    detecting the dimensionality reduction behavior data set with a pre-built data anomaly detection model to obtain a normal behavior data set and an abnormal behavior data set;
    performing data reconstruction on the normal behavior data set and the abnormal behavior data set with a preset collaborative filtering algorithm to obtain a standard data set;
    performing visualization processing on the standard data set to obtain a visual chart set, and transmitting the visual chart set to a preset terminal.
  2. The method for monitoring and analyzing user data based on third-party software according to claim 1, wherein the behavior data set comprises the length of time the user browses the third-party software, the interfaces of the third-party software that the user browses, and the buttons the user clicks while browsing the third-party software.
  3. The method for monitoring and analyzing user data based on third-party software according to claim 1, wherein performing the dimensionality reduction operation on the behavior data set to obtain the dimensionality reduction behavior data set comprises:
    对所述行为数据集执行编码操作,得到用户行为向量集;Perform an encoding operation on the behavior data set to obtain a user behavior vector set;
    利用预构建的权重集,计算得到所述用户行为向量集的权重行为向量集;Using the pre-built weight set, calculate the weight behavior vector set of the user behavior vector set;
    对所述权重行为向量集执行降维处理,得到所述降维行为数据集。Perform dimensionality reduction processing on the weight behavior vector set to obtain the dimensionality reduction behavior data set.
  4. The method for monitoring and analyzing user data based on third-party software according to claim 3, wherein performing the dimensionality reduction processing on the weighted behavior vector set to obtain the dimensionality-reduced behavior data set comprises:
    performing the dimensionality reduction processing on the weighted behavior vector set using the following formula:
    $Q_i = (X_i - X_i W_j W_j^T)(X_i - X_i W_j W_j^T)^T$
    where $Q_i$ denotes the i-th dimensionality-reduced behavior data item in the dimensionality-reduced behavior data set, $X_i$ denotes the i-th weighted behavior vector in the weighted behavior vector set, $W_j$ denotes the j-th row vector of the weight matrix obtained from the weight set, and $W_j^T$ denotes the transpose of $W_j$.
  5. The method for monitoring and analyzing user data based on third-party software according to claim 1, wherein detecting the dimensionality-reduced behavior data set using the pre-built data anomaly detection model to obtain the normal behavior data set and the abnormal behavior data set comprises:
    constructing a hypersphere according to the dimensionality-reduced behavior data set, and calculating the radius of the hypersphere;
    calculating the distance from each data item in the dimensionality-reduced behavior data set to the center of the hypersphere;
    aggregating the data items whose distance is less than the radius to obtain the normal behavior data set;
    aggregating the data items whose distance is greater than or equal to the radius to obtain the abnormal behavior data set.
  6. The method for monitoring and analyzing user data based on third-party software according to claim 5, wherein calculating the radius of the hypersphere comprises:
    calculating the radius of the hypersphere using the following formula:
    [Formula: Figure PCTCN2021090312-appb-100001]
    where $R$ denotes the radius of the hypersphere, $\alpha_i$ denotes a first Lagrange multiplier of the hypersphere, $\alpha_j$ denotes a second Lagrange multiplier of the hypersphere, $Q_i$ and $Q_j$ denote any two dimensionality-reduced behavior data items in the dimensionality-reduced behavior data set, and $K(\cdot)$ denotes a Gaussian kernel function.
  7. The method for monitoring and analyzing user data based on third-party software according to any one of claims 1 to 6, wherein performing the data reconstruction according to the normal behavior data set and the abnormal behavior data set using the preset collaborative filtering algorithm to obtain the standard data set comprises:
    calculating the distance between each normal data item in the normal behavior data set and each abnormal data item in the abnormal behavior data set to obtain a distance value set;
    comparing each distance value in the distance value set with a preset threshold, selecting the normal data items and abnormal data items corresponding to the distance values not greater than the threshold, and aggregating the selected normal data items and abnormal data items to obtain the standard data set.
  8. An apparatus for monitoring and analyzing user data based on third-party software, wherein the apparatus comprises:
    a behavior data acquisition module, configured to collect a behavior data set of a target user from third-party software;
    a data detection module, configured to perform a dimensionality reduction operation on the behavior data set to obtain a dimensionality-reduced behavior data set, and to detect the dimensionality-reduced behavior data set using a pre-built data anomaly detection model to obtain a normal behavior data set and an abnormal behavior data set;
    a data reconstruction module, configured to perform data reconstruction according to the normal behavior data set and the abnormal behavior data set using a preset collaborative filtering algorithm to obtain a standard data set;
    a visualization module, configured to perform visualization processing on the standard data set to obtain a visual chart set, and to transmit the visual chart set to a preset terminal.
  9. An electronic device, wherein the electronic device comprises:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the following steps:
    collecting a behavior data set of a target user from third-party software;
    performing a dimensionality reduction operation on the behavior data set to obtain a dimensionality-reduced behavior data set;
    detecting the dimensionality-reduced behavior data set using a pre-built data anomaly detection model to obtain a normal behavior data set and an abnormal behavior data set;
    performing data reconstruction according to the normal behavior data set and the abnormal behavior data set using a preset collaborative filtering algorithm to obtain a standard data set;
    performing visualization processing on the standard data set to obtain a visual chart set, and transmitting the visual chart set to a preset terminal.
  10. The electronic device according to claim 9, wherein the behavior data set includes the length of time the user browses the third-party software, the interfaces of the third-party software that the user browses, and the buttons the user clicks while browsing the third-party software.
  11. The electronic device according to claim 9, wherein performing the dimensionality reduction operation on the behavior data set to obtain the dimensionality-reduced behavior data set comprises:
    performing an encoding operation on the behavior data set to obtain a user behavior vector set;
    calculating a weighted behavior vector set of the user behavior vector set using a pre-built weight set;
    performing dimensionality reduction processing on the weighted behavior vector set to obtain the dimensionality-reduced behavior data set.
  12. The electronic device according to claim 11, wherein performing the dimensionality reduction processing on the weighted behavior vector set to obtain the dimensionality-reduced behavior data set comprises:
    performing the dimensionality reduction processing on the weighted behavior vector set using the following formula:
    $Q_i = (X_i - X_i W_j W_j^T)(X_i - X_i W_j W_j^T)^T$
    where $Q_i$ denotes the i-th dimensionality-reduced behavior data item in the dimensionality-reduced behavior data set, $X_i$ denotes the i-th weighted behavior vector in the weighted behavior vector set, $W_j$ denotes the j-th row vector of the weight matrix obtained from the weight set, and $W_j^T$ denotes the transpose of $W_j$.
  13. The electronic device according to claim 9, wherein detecting the dimensionality-reduced behavior data set using the pre-built data anomaly detection model to obtain the normal behavior data set and the abnormal behavior data set comprises:
    constructing a hypersphere according to the dimensionality-reduced behavior data set, and calculating the radius of the hypersphere;
    calculating the distance from each data item in the dimensionality-reduced behavior data set to the center of the hypersphere;
    aggregating the data items whose distance is less than the radius to obtain the normal behavior data set;
    aggregating the data items whose distance is greater than or equal to the radius to obtain the abnormal behavior data set.
  14. The electronic device according to claim 13, wherein calculating the radius of the hypersphere comprises:
    calculating the radius of the hypersphere using the following formula:
    [Formula: Figure PCTCN2021090312-appb-100002]
    where $R$ denotes the radius of the hypersphere, $\alpha_i$ denotes a first Lagrange multiplier of the hypersphere, $\alpha_j$ denotes a second Lagrange multiplier of the hypersphere, $Q_i$ and $Q_j$ denote any two dimensionality-reduced behavior data items in the dimensionality-reduced behavior data set, and $K(\cdot)$ denotes a Gaussian kernel function.
  15. The electronic device according to any one of claims 9 to 14, wherein performing the data reconstruction according to the normal behavior data set and the abnormal behavior data set using the preset collaborative filtering algorithm to obtain the standard data set comprises:
    calculating the distance between each normal data item in the normal behavior data set and each abnormal data item in the abnormal behavior data set to obtain a distance value set;
    comparing each distance value in the distance value set with a preset threshold, selecting the normal data items and abnormal data items corresponding to the distance values not greater than the threshold, and aggregating the selected normal data items and abnormal data items to obtain the standard data set.
  16. A computer-readable storage medium comprising a data storage area and a program storage area, the data storage area storing created data and the program storage area storing a computer program, wherein the computer program, when executed by a processor, implements the following steps:
    collecting a behavior data set of a target user from third-party software;
    performing a dimensionality reduction operation on the behavior data set to obtain a dimensionality-reduced behavior data set;
    detecting the dimensionality-reduced behavior data set using a pre-built data anomaly detection model to obtain a normal behavior data set and an abnormal behavior data set;
    performing data reconstruction according to the normal behavior data set and the abnormal behavior data set using a preset collaborative filtering algorithm to obtain a standard data set;
    performing visualization processing on the standard data set to obtain a visual chart set, and transmitting the visual chart set to a preset terminal.
  17. The computer-readable storage medium according to claim 16, wherein the behavior data set includes the length of time the user browses the third-party software, the interfaces of the third-party software that the user browses, and the buttons the user clicks while browsing the third-party software.
  18. The computer-readable storage medium according to claim 16, wherein performing the dimensionality reduction operation on the behavior data set to obtain the dimensionality-reduced behavior data set comprises:
    performing an encoding operation on the behavior data set to obtain a user behavior vector set;
    calculating a weighted behavior vector set of the user behavior vector set using a pre-built weight set;
    performing dimensionality reduction processing on the weighted behavior vector set to obtain the dimensionality-reduced behavior data set.
  19. The computer-readable storage medium according to claim 18, wherein performing the dimensionality reduction processing on the weighted behavior vector set to obtain the dimensionality-reduced behavior data set comprises:
    performing the dimensionality reduction processing on the weighted behavior vector set using the following formula:
    $Q_i = (X_i - X_i W_j W_j^T)(X_i - X_i W_j W_j^T)^T$
    where $Q_i$ denotes the i-th dimensionality-reduced behavior data item in the dimensionality-reduced behavior data set, $X_i$ denotes the i-th weighted behavior vector in the weighted behavior vector set, $W_j$ denotes the j-th row vector of the weight matrix obtained from the weight set, and $W_j^T$ denotes the transpose of $W_j$.
  20. The computer-readable storage medium according to claim 16, wherein detecting the dimensionality-reduced behavior data set using the pre-built data anomaly detection model to obtain the normal behavior data set and the abnormal behavior data set comprises:
    constructing a hypersphere according to the dimensionality-reduced behavior data set, and calculating the radius of the hypersphere;
    calculating the distance from each data item in the dimensionality-reduced behavior data set to the center of the hypersphere;
    aggregating the data items whose distance is less than the radius to obtain the normal behavior data set;
    aggregating the data items whose distance is greater than or equal to the radius to obtain the abnormal behavior data set.
  21. The computer-readable storage medium according to claim 20, wherein calculating the radius of the hypersphere comprises:
    calculating the radius of the hypersphere using the following formula:
    [Formula: Figure PCTCN2021090312-appb-100003]
    where $R$ denotes the radius of the hypersphere, $\alpha_i$ denotes a first Lagrange multiplier of the hypersphere, $\alpha_j$ denotes a second Lagrange multiplier of the hypersphere, $Q_i$ and $Q_j$ denote any two dimensionality-reduced behavior data items in the dimensionality-reduced behavior data set, and $K(\cdot)$ denotes a Gaussian kernel function.
  22. The computer-readable storage medium according to any one of claims 16 to 21, wherein performing the data reconstruction according to the normal behavior data set and the abnormal behavior data set using the preset collaborative filtering algorithm to obtain the standard data set comprises:
    calculating the distance between each normal data item in the normal behavior data set and each abnormal data item in the abnormal behavior data set to obtain a distance value set;
    comparing each distance value in the distance value set with a preset threshold, selecting the normal data items and abnormal data items corresponding to the distance values not greater than the threshold, and aggregating the selected normal data items and abnormal data items to obtain the standard data set.
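
The hypersphere split recited in claims 5, 13 and 20 and the distance-threshold selection recited in claims 7, 15 and 22 can be illustrated with a minimal Python sketch. This is not the applicant's implementation. It assumes Euclidean distance (the claims say only "distance"), takes the hypersphere center and radius as given inputs rather than computing the radius from the kernel formula (which appears only as an image), and reads "selecting the normal and abnormal data corresponding to distance values not greater than the threshold" as keeping every item that takes part in at least one qualifying pair. The names `euclidean`, `split_by_hypersphere` and `reconstruct` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    # Assumed metric: the claims do not name a specific distance.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def split_by_hypersphere(
    data: List[Point], center: Point, radius: float
) -> Tuple[List[Point], List[Point]]:
    """Distance < radius -> normal; distance >= radius -> abnormal."""
    normal: List[Point] = []
    abnormal: List[Point] = []
    for point in data:
        if euclidean(point, center) < radius:
            normal.append(point)
        else:
            abnormal.append(point)
    return normal, abnormal


def reconstruct(
    normal: List[Point], abnormal: List[Point], threshold: float
) -> List[Point]:
    """Keep items from normal/abnormal pairs whose distance is <= threshold."""
    keep_normal = set()
    keep_abnormal = set()
    for i, n in enumerate(normal):
        for j, a in enumerate(abnormal):
            if euclidean(n, a) <= threshold:
                keep_normal.add(i)
                keep_abnormal.add(j)
    # Selected normal items first, then selected abnormal items.
    return [normal[i] for i in sorted(keep_normal)] + [
        abnormal[j] for j in sorted(keep_abnormal)
    ]


# Toy data: four 2-D points, with a hand-picked center and radius.
data = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
normal, abnormal = split_by_hypersphere(data, center=(0.0, 0.0), radius=2.0)
standard = reconstruct(normal, abnormal, threshold=2.5)
print(normal, abnormal, standard)
```

In this toy run, the point (3.0, 0.0) falls outside the hypersphere but lies within the threshold of the normal point (1.0, 0.0), so both enter the standard data set, while the distant point (10.0, 0.0) is dropped.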
PCT/CN2021/090312 2020-11-02 2021-04-27 User data monitoring and analysis method, apparatus, device, and medium WO2022088632A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011204209.1A CN112306835B (en) 2020-11-02 2020-11-02 User data monitoring and analyzing method, device, equipment and medium
CN202011204209.1 2020-11-02

Publications (1)

Publication Number Publication Date
WO2022088632A1 true WO2022088632A1 (en) 2022-05-05

Family

ID=74333679

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/090312 WO2022088632A1 (en) 2020-11-02 2021-04-27 User data monitoring and analysis method, apparatus, device, and medium

Country Status (2)

Country Link
CN (1) CN112306835B (en)
WO (1) WO2022088632A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112306835B (en) * 2020-11-02 2024-05-28 平安科技(深圳)有限公司 User data monitoring and analyzing method, device, equipment and medium
CN113806738A (en) * 2021-09-01 2021-12-17 浪潮卓数大数据产业发展有限公司 Block chain-based user behavior tracking method and system
CN118195641A (en) * 2024-05-17 2024-06-14 智联信通科技股份有限公司 Block chain-based intelligent manufacturing field production whole-flow traceability method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105224872A (en) * 2015-09-30 2016-01-06 河南科技大学 A kind of user's anomaly detection method based on neural network clustering
CN105843936A (en) * 2016-03-31 2016-08-10 乐视控股(北京)有限公司 Service data report form method and system
CN107222472A (en) * 2017-05-26 2017-09-29 电子科技大学 A kind of user behavior method for detecting abnormality under Hadoop clusters
US10147049B2 (en) * 2015-08-31 2018-12-04 International Business Machines Corporation Automatic generation of training data for anomaly detection using other user's data samples
CN111027594A (en) * 2019-11-18 2020-04-17 西北工业大学 Step-by-step anomaly detection method based on dictionary representation
CN112306835A (en) * 2020-11-02 2021-02-02 平安科技(深圳)有限公司 User data monitoring and analyzing method, device, equipment and medium

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108287782A (en) * 2017-06-05 2018-07-17 中兴通讯股份有限公司 A kind of multidimensional data method for detecting abnormality and device
CN107426177A (en) * 2017-06-13 2017-12-01 努比亚技术有限公司 A kind of user behavior clustering method and terminal, computer-readable recording medium
US10701094B2 (en) * 2017-06-22 2020-06-30 Oracle International Corporation Techniques for monitoring privileged users and detecting anomalous activities in a computing environment
CN109783481A (en) * 2018-12-19 2019-05-21 新华三大数据技术有限公司 Data processing method and device
CN109710663B (en) * 2018-12-29 2020-12-04 北京神舟航天软件技术有限公司 Data statistical chart generation method
CN110413681A (en) * 2019-08-01 2019-11-05 上海胜泰信息技术有限公司 A Web end group is in the visualized data processing method of big data technology
CN111369339A (en) * 2020-03-02 2020-07-03 深圳索信达数据技术有限公司 Over-sampling improved svdd-based bank client transaction behavior abnormity identification method
CN111507470A (en) * 2020-03-02 2020-08-07 上海金仕达软件科技有限公司 Abnormal account identification method and device
CN111813618A (en) * 2020-05-28 2020-10-23 平安科技(深圳)有限公司 Data anomaly detection method, device, equipment and storage medium
CN111488363B (en) * 2020-06-28 2020-10-02 平安国际智慧城市科技股份有限公司 Data processing method, device, electronic equipment and medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115471215A (en) * 2022-10-31 2022-12-13 江西省煤田地质局普查综合大队 Business process processing method and device
CN116540790A (en) * 2023-07-05 2023-08-04 深圳市保凌影像科技有限公司 Tripod head stability control method and device, electronic equipment and storage medium
CN116540790B (en) * 2023-07-05 2023-09-08 深圳市保凌影像科技有限公司 Tripod head stability control method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN112306835B (en) 2024-05-28
CN112306835A (en) 2021-02-02

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 21884375; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 Ep: PCT application non-entry in European phase. Ref document number: 21884375; Country of ref document: EP; Kind code of ref document: A1