CN111258874B - User operation track analysis method and device based on web data - Google Patents

User operation track analysis method and device based on web data

Info

Publication number
CN111258874B
Authority
CN
China
Legal status
Active
Application number
CN201811453609.9A
Other languages
Chinese (zh)
Other versions
CN111258874A (en)
Inventor
乔柏林
叶晓龙
任赣
竺士杰
蒋通通
胡林熙
邱佳
孟震
Current Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Zhejiang Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Zhejiang Co Ltd
Application filed by China Mobile Communications Group Co Ltd and China Mobile Group Zhejiang Co Ltd
Priority to CN201811453609.9A
Publication of CN111258874A
Application granted
Publication of CN111258874B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Prevention of errors by analysis, debugging or testing of software
    • G06F11/362 Debugging of software
    • G06F11/366 Debugging of software using diagnostics
    • G06F11/3668 Testing of software
    • G06F11/3672 Test management
    • G06F11/3692 Test management for test results analysis
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present invention provide a method and device for analyzing user operation trajectories based on web data. The method includes: acquiring a user operation trajectory in real time, the user operation trajectory including at least a business type; comparing, according to a trajectory model obtained in advance by a clustering algorithm, the user operation trajectory with all default trajectories of the same business type, wherein the trajectory model includes at least one default trajectory corresponding to each business type; and, if the user operation trajectory differs from the default trajectories, marking the user operation trajectory as an abnormal trajectory. By comparing each collected user operation trajectory with the default trajectories of the corresponding business type and flagging it as abnormal when they differ, the embodiments allow user operation trajectories to be analyzed accurately in a simpler and more efficient way.

Description

User operation trajectory analysis method and device based on web data

Technical field

Embodiments of the present invention relate to the field of computer technology, and in particular to a method and device for analyzing user operation trajectories based on web data.

Background art

With the spread of cloud computing and container clouds, a large number of IT application systems are being deployed in virtualized and containerized environments. The continual enrichment of business scenarios and the explosive growth of business volume pose major challenges to the maintainability of these systems and applications. This is especially true in the telecommunications industry, where operators have built a great many application systems to provide consumers with a variety of specialized services, and some system functions involve sub-functions of several business systems that must cooperate to work properly. Architectural evolution further increases the complexity of such business systems and raises the bar for analyzing user operation behavior.

In view of these problems, the prior art mainly adopts the following solutions.

Solution 1: manually compiled operation trajectories. Maintenance staff who want to understand the operation trajectory of an entire business process must have the business operation flow documented during the project phase and handed over to subsequent maintenance staff as a maintenance manual. Later business changes and additions are recorded only through the diligence of developers and maintenance staff. This approach suits small application systems with a low rate of change, where its results are relatively stable.

Solution 2: operation trajectory output embedded in the code. During development, the output of the information needed for operation trajectories is implemented in the code in advance. Once in production, each step of a user's operation is output to a trajectory analysis center, which reconstructs the operation path of each business transaction from dimensions such as IP address, user ID, serial number and time. When new business code is developed later, coding according to the agreed development specification ensures that the new business is also covered by the operation trajectory center.

Solution 3: probe-based operation trajectory acquisition. Probe packages are introduced into the middleware to obtain method-level call records from the deployed middleware. A software development kit (SDK) that automatically instruments code and collects data injects the operation trajectory code automatically, so developers need to modify little or no code. The relationship between the called methods and business operations is then mapped to complete the analysis of user operation trajectories.

However, each of these solutions has serious shortcomings. As system clusters keep growing, manual compilation has become an arduous task, all the more so because agile development makes application code change daily, causing a surge in operation types and steps. Rapidly growing business-operation knowledge cannot be compiled quickly and accurately, and existing knowledge manuals grow increasingly inaccurate; in short, Solution 1 is unsuitable for medium and large systems. Solution 2 requires developers to design the trajectory output scheme for the whole project in advance, but production systems are often developed jointly by several projects from different vendors using different technical frameworks, and legacy, intermediate and new generations of systems may coexist in one system. These practical problems make a one-off retrofit impossible, and some systems cannot be retrofitted with embedded output at all; if only some systems output data, the benefit is limited, so this solution is hard to roll out in practice. Solution 3 collects operation trajectories with middleware probes; although the code barely needs to change, the existing technology still falls short of practical needs in system stability, rapid deployment and scalability of data collection, and cannot yet meet actual production requirements. In summary, the prior art is overly complex and inefficient at data analysis.

Summary of the invention

Embodiments of the present invention provide a method and device for analyzing user operation trajectories based on web data, to solve the problem that the prior art is overly complex and inefficient at data analysis.

In a first aspect, an embodiment of the present invention provides a method for analyzing user operation trajectories based on web data, comprising:

acquiring a user operation trajectory in real time, the user operation trajectory including at least a business type;

comparing, according to a trajectory model obtained in advance by a clustering algorithm, the user operation trajectory with all default trajectories of the same business type, wherein the trajectory model includes at least one default trajectory corresponding to each business type; and

if the user operation trajectory differs from the default trajectories, marking the user operation trajectory as an abnormal trajectory.

In a second aspect, an embodiment of the present invention provides a device for analyzing user operation trajectories based on web data, comprising:

a traffic collection unit, configured to acquire a user operation trajectory in real time, the user operation trajectory including at least a business type;

a trajectory analysis unit, configured to compare, according to a trajectory model obtained in advance by a clustering algorithm, the user operation trajectory with all default trajectories of the same business type, wherein the trajectory model includes at least one default trajectory corresponding to each business type; and

a cross identification unit, configured to mark the user operation trajectory as an abnormal trajectory if the user operation trajectory differs from the default trajectories.

In a third aspect, an embodiment of the present invention further provides an electronic device, comprising:

a processor, a memory, a communication interface and a communication bus, wherein

the processor, the memory and the communication interface communicate with one another through the communication bus;

the communication interface is used for information transmission between communication devices of the electronic device; and

the memory stores computer program instructions executable by the processor, and the processor invokes the program instructions to perform the following method:

acquiring a user operation trajectory in real time, the user operation trajectory including at least a business type;

comparing, according to a trajectory model obtained in advance by a clustering algorithm, the user operation trajectory with all default trajectories of the same business type, wherein the trajectory model includes at least one default trajectory corresponding to each business type; and

if the user operation trajectory differs from the default trajectories, marking the user operation trajectory as an abnormal trajectory.

In a fourth aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the following method:

acquiring a user operation trajectory in real time, the user operation trajectory including at least a business type;

comparing, according to a trajectory model obtained in advance by a clustering algorithm, the user operation trajectory with all default trajectories of the same business type, wherein the trajectory model includes at least one default trajectory corresponding to each business type; and

if the user operation trajectory differs from the default trajectories, marking the user operation trajectory as an abnormal trajectory.

The web-data-based user operation trajectory analysis method and device provided by the embodiments of the present invention compare each collected user operation trajectory with the default trajectories of the corresponding business type and, if they differ, determine that the user operation trajectory is abnormal, so that user operation trajectories can be analyzed accurately in a simpler and more efficient way.

Brief description of the drawings

Fig. 1 is a flowchart of a method for analyzing user operation trajectories based on web data according to an embodiment of the present invention;

Fig. 2 is a flowchart of another method for analyzing user operation trajectories based on web data according to an embodiment of the present invention;

Fig. 3 is a flowchart of yet another method for analyzing user operation trajectories based on web data according to an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of a device for analyzing user operation trajectories based on web data according to an embodiment of the present invention;

Fig. 5 is a schematic structural diagram of another device for analyzing user operation trajectories based on web data according to an embodiment of the present invention;

Fig. 6 is a schematic structural diagram of yet another device for analyzing user operation trajectories based on web data according to an embodiment of the present invention;

Fig. 7 is a schematic diagram of the physical structure of an electronic device.

Detailed description of the embodiments

To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

Fig. 1 is a flowchart of a method for analyzing user operation trajectories based on web data according to an embodiment of the present invention. As shown in Fig. 1, the method includes:

Step S01: acquire a user operation trajectory in real time, the user operation trajectory including at least a business type.

By collecting web data from the network, the trajectory of operations a user performs while handling a business transaction can be obtained. The user operation trajectory contains multidimensional data about the user's operations, including at least the business type, job number, address, time and duration.

In the actual collection process, traditional network mirroring collects traffic through switch mirror ports or optical splitters, which lacks the deployment flexibility needed for today's agile releases and cloud-based, containerized deployments. This embodiment uses a converged collection technique that supports traditional physical switches as well as flexible deployment on virtual machines and containers, and thus fully supports current technical architectures:

1) For systems deployed on physical machines, a traffic mirror is taken from the existing traffic switch; a collection program deployed on a physical collection machine collects the traffic data and outputs it to a traffic aggregation machine for subsequent analysis.

2) For systems deployed on virtual machines such as VMware, a virtual switch (VSwitch) mirroring scheme is used: a virtual-machine traffic collection program deployed in the VM cluster automatically decapsulates VMware network packets and outputs the decapsulated traffic data to the traffic aggregation machine for subsequent analysis.

3) For systems deployed with container technology, the traffic of individual containers cannot be pinned down because containers are deployed dynamically; traffic is therefore collected at the network load layer and output to the traffic aggregation machine for analysis.

The collected web data is processed by a preset automatic business-information processing function, which extracts the job number, IP, business type, address, time, duration and returned result. Custom data transformations can be applied, and the processed data is output in a specified format as the user operation trajectory.
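The assembly of parsed web records into user operation trajectories can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the `WebRecord` schema, the hypothetical `session_id` field that ties the requests of one transaction together, and the grouping key (job number, IP, session) are all assumptions chosen for the example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WebRecord:
    """One parsed web request (hypothetical schema, not defined by the patent)."""
    job_number: str      # operator job number
    ip: str
    business_type: str
    session_id: str      # hypothetical key tying the requests of one transaction together
    operation: str       # operation name, e.g. derived from the request URL
    timestamp: float     # seconds since the epoch
    duration_ms: int = 0
    result: str = ""


@dataclass
class UserTrajectory:
    job_number: str
    ip: str
    business_type: str
    operations: list = field(default_factory=list)


def build_trajectories(records):
    """Group records per (job number, IP, session) and order each group by time."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec.job_number, rec.ip, rec.session_id)].append(rec)

    trajectories = []
    for (job_number, ip, _session), recs in groups.items():
        recs.sort(key=lambda r: r.timestamp)
        trajectories.append(UserTrajectory(
            job_number=job_number,
            ip=ip,
            business_type=recs[0].business_type,  # assumes one business type per session
            operations=[r.operation for r in recs],
        ))
    return trajectories
```

Sorting by timestamp matters because mirrored traffic from several collectors may reach the aggregation machine out of order.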

Step S02: according to a trajectory model obtained in advance by a clustering algorithm, compare the user operation trajectory with all default trajectories of the same business type, wherein the trajectory model includes at least one default trajectory corresponding to each business type.

The default trajectory obtained in advance for each business type corresponds to the typical operation trajectory of a user handling that business. Default trajectories can be obtained in two ways: at least one preset default trajectory can be taken directly from the classification of each business type and its configuration in the business center, or a preset algorithm such as a clustering algorithm can be applied to actual historical data, that is, the trajectory model is derived statistically from the operation trajectories users actually produce. The trajectory model includes at least one default trajectory for each business type; for example, the default trajectory for changing a service plan consists of operations a1, a2, a3, a4, a5, and the default trajectory for ordering a data package consists of operations b1, b2, b3, b4.
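One possible in-memory shape for the trajectory model is sketched below, assuming a trajectory is simply an ordered sequence of operation names. The class and method names are hypothetical, and the business-type labels `change_plan` and `order_data` are illustrative stand-ins for the plan-change and data-ordering examples above.

```python
from collections import defaultdict


class TrajectoryModel:
    """Maps each business type to the set of its default trajectories.

    Trajectories are stored as tuples of operation names so they are hashable;
    membership in a set then compares a trajectory against all defaults at once.
    """

    def __init__(self):
        self._defaults = defaultdict(set)

    def add_default(self, business_type, operations):
        self._defaults[business_type].add(tuple(operations))

    def defaults_for(self, business_type):
        # .get() avoids creating empty entries for unknown business types
        return frozenset(self._defaults.get(business_type, ()))

    def matches_default(self, business_type, operations):
        return tuple(operations) in self._defaults.get(business_type, ())


model = TrajectoryModel()
model.add_default("change_plan", ["a1", "a2", "a3", "a4", "a5"])
model.add_default("order_data", ["b1", "b2", "b3", "b4"])
```

Because a business type may have several default trajectories, a set per type keeps the "compare with all default trajectories" step to a single lookup.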

Step S03: if the user operation trajectory differs from the default trajectories, mark the user operation trajectory as an abnormal trajectory.

If, after the comparison, the user operation trajectory is identical to a default trajectory, no further action is taken. Otherwise, the user operation trajectory is marked as an abnormal trajectory, which can either trigger an alert or simply be recorded. For example, if a user's trajectory while changing a service plan is a1, a2, a3, a6, a7, a5, it differs from the default trajectory a1, a2, a3, a4, a5, so it is recorded as an abnormal trajectory.
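Steps S02 and S03 together can be sketched as a single check, assuming the defaults are held in a plain dict from business type to a set of operation tuples. The function name, the list used as the record-only store and the optional `alert` callback are hypothetical choices for illustration.

```python
def mark_trajectory(defaults, business_type, operations, abnormal_log, alert=None):
    """Return True if the trajectory is abnormal.

    defaults: dict mapping business_type -> set of default trajectories (tuples)
    abnormal_log: list that abnormal trajectories are appended to (record-only mode)
    alert: optional callable invoked for abnormal trajectories (alerting mode)
    """
    ops = tuple(operations)
    if ops in defaults.get(business_type, set()):
        return False  # identical to a default trajectory: no follow-up action
    abnormal_log.append((business_type, ops))
    if alert is not None:
        alert(business_type, ops)
    return True


abnormal_log = []
defaults = {"change_plan": {("a1", "a2", "a3", "a4", "a5")}}
mark_trajectory(defaults, "change_plan", ["a1", "a2", "a3", "a6", "a7", "a5"], abnormal_log)
# abnormal_log now holds ("change_plan", ("a1", "a2", "a3", "a6", "a7", "a5"))
```

Note that, under this sketch, a business type with no default trajectory at all makes every trajectory of that type abnormal; a real deployment might prefer to skip such types until the model covers them.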

By comparing each collected user operation trajectory with the default trajectories of the corresponding business type and determining that it is abnormal when they differ, this embodiment allows user operation trajectories to be analyzed accurately in a simpler and more efficient way.

Fig. 2 is a flowchart of another method for analyzing user operation trajectories based on web data according to an embodiment of the present invention. As shown in Fig. 2, the method further includes:

Step S10: periodically acquire all user operation trajectories within a preset historical time range.

To obtain the default trajectory for each business type, all user operation trajectories derived from web data within a preset historical time range (for example, the half year or one year before the current moment) are acquired in advance.
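Selecting the historical window can be as simple as a time filter. The sketch below assumes each stored trajectory is a dict carrying a `start_time` datetime; the field name and the roughly half-year default window are illustrative assumptions.

```python
from datetime import datetime, timedelta


def trajectories_in_window(trajectories, now, window=timedelta(days=182)):
    """Keep the trajectories whose start time lies in [now - window, now].

    Each trajectory is assumed to be a dict with a datetime under "start_time";
    the default window of roughly half a year is only an example value.
    """
    start = now - window
    return [t for t in trajectories if start <= t["start_time"] <= now]
```

Running this on a schedule (for example monthly) and feeding the result into the clustering step gives the periodic acquisition described in Step S10.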

Step S11: apply a clustering algorithm to all the user operation trajectories to obtain at least one cluster.

The clustering algorithm groups all user operation trajectories according to conditions on different dimensions, producing, depending on how the algorithm is configured, a cluster set containing at least one cluster.

Further, the clustering algorithm is the K-Means clustering algorithm.

Many clustering algorithms exist, such as K-Means, K-Medians, mean-shift and agglomerative hierarchical clustering. K-Means is used here only as an example.

K-Means partitions similar user operation trajectories using a preset K and an initial centroid for each cluster, then iteratively recomputes the cluster means to optimize the clustering. The sum of squared errors (SSE) serves as the clustering objective: of two cluster sets produced by two runs of K-Means, the one with the smaller SSE has the higher intra-cluster similarity, so the cluster set with the smallest SSE is taken as the final result. K can be set according to the number of business types. The final cluster set is then validated and adjusted as needed.
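A self-contained K-Means sketch with restarts is shown below. The patent does not say how a trajectory becomes a vector, so this example assumes an encoding: counts of each operation plus counts of each consecutive operation pair, which keeps some ordering information. Each run starts from centroids sampled with a different seed, and the run with the smallest SSE = Σ_i Σ_{x∈C_i} ‖x − μ_i‖² is kept, mirroring the selection rule above.

```python
import random


def featurize(trajectories):
    """Encode each trajectory as counts of its operations and consecutive operation pairs."""
    vocab, rows = {}, []
    for ops in trajectories:
        feats = {}
        grams = [(o,) for o in ops] + list(zip(ops, ops[1:]))
        for g in grams:
            idx = vocab.setdefault(g, len(vocab))
            feats[idx] = feats.get(idx, 0) + 1
        rows.append(feats)
    return [[row.get(i, 0) for i in range(len(vocab))] for row in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c])) for p in points]


def kmeans(points, k, seed, max_iter=100):
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # initial centroids
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # mean of the members; an empty cluster keeps its old centroid
            new.append([sum(col) / len(members) for col in zip(*members)] if members else centroids[c])
        if new == centroids:
            break  # converged: assignments no longer move the means
        centroids = new
    labels = assign(points, centroids)
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, sse


def best_kmeans(points, k, runs=10):
    """Run K-Means from several random starts and keep the result with the smallest SSE."""
    return min((kmeans(points, k, seed) for seed in range(runs)), key=lambda r: r[2])
```

In practice a library implementation with vectorized distance computations would replace this pure-Python loop; the sketch only makes the objective and the restart-and-compare rule concrete.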

Step S12: analyze the user operation trajectories contained in each cluster to obtain the trajectory model.

Analyzing the user operation trajectories in each cluster yields the main trajectory of that cluster, which can also be regarded as the trajectory corresponding to the cluster center; this trajectory is used as the default operation trajectory of the corresponding business type. Aggregating these results gives the trajectory model.
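Turning clustering output into a trajectory model could look like the sketch below. It assumes (the patent leaves this open) that a cluster is attributed to its majority business type and that the most frequent exact trajectory of that type within the cluster serves as the "main" trajectory standing in for the cluster center.

```python
from collections import Counter, defaultdict


def build_model(trajectories, labels):
    """Derive default trajectories from clustering output.

    trajectories: list of (business_type, operations) pairs
    labels: the cluster index assigned to each trajectory
    Returns a dict: business_type -> set of default trajectories (tuples).
    """
    clusters = defaultdict(list)
    for (business_type, ops), label in zip(trajectories, labels):
        clusters[label].append((business_type, tuple(ops)))

    model = defaultdict(set)
    for members in clusters.values():
        # The cluster is attributed to its majority business type ...
        btype = Counter(b for b, _ in members).most_common(1)[0][0]
        # ... and its most frequent exact trajectory of that type becomes the default.
        default_ops = Counter(o for b, o in members if b == btype).most_common(1)[0][0]
        model[btype].add(default_ops)
    return dict(model)
```

Several clusters may map to the same business type, which naturally yields more than one default trajectory for that type.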

The trajectory model can be recomputed periodically as needed, for example monthly or every six months, by adding the new user operation trajectories collected during that period to the historical data to obtain a new trajectory model.

This embodiment derives the trajectory model by clustering all user operation trajectories within the historical time range, then compares each user operation trajectory collected in real time with the default trajectories of the corresponding business type and, if they differ, determines that it is abnormal, so that user operation trajectories can be analyzed accurately in a simpler and more efficient way.

Fig. 3 is a flowchart of yet another method for analyzing user operation trajectories based on web data according to an embodiment of the present invention. As shown in Fig. 3, the method further includes:

Step S20: count each kind of abnormal trajectory.

All abnormal trajectories obtained are tallied separately by business type, which reveals the kinds of abnormal trajectory produced for each business type, and each kind is counted. For example, if the abnormal trajectories observed while users change their service plan include a1, a2, a3, a6, a7, a5 and a1, a2, a3, a8, a5, then two kinds of abnormal trajectory exist for the plan-change business, and each time an abnormal trajectory is received, the count for its kind is incremented.
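Counting abnormal trajectories by kind maps directly onto a counter keyed by (business type, exact operation sequence). The function names below are hypothetical, and treating each distinct sequence as one "kind" is an assumption consistent with the example above.

```python
from collections import Counter


def count_abnormal(abnormal_trajectories):
    """Count each kind of abnormal trajectory.

    A "kind" is the pair (business type, exact operation sequence).
    """
    return Counter((b, tuple(ops)) for b, ops in abnormal_trajectories)


def kinds_per_business_type(counts):
    """Number of distinct abnormal-trajectory kinds seen for each business type."""
    return Counter(b for b, _ in counts)
```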

Step S21: if a count exceeds a preset count threshold, issue warning information.

A count threshold is set in advance. If the count for one kind of abnormal trajectory exceeds the threshold, or exceeds it within a preset time range, a corresponding warning can be issued to indicate that the default trajectory of that business type may have changed or that a new default trajectory has emerged.
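The threshold check, including the variant that only counts occurrences within a preset time range, can be sketched with a sliding window per abnormal-trajectory kind. The class name, the message text and the assumption that timestamps arrive in non-decreasing order are all illustrative.

```python
from collections import defaultdict, deque


class AbnormalTrajectoryAlarm:
    """Warn when one kind of abnormal trajectory is seen more than `threshold` times.

    If window_seconds is given, only occurrences within that trailing window count;
    otherwise occurrences accumulate indefinitely. Timestamps are assumed non-decreasing.
    """

    def __init__(self, threshold, window_seconds=None):
        self.threshold = threshold
        self.window = window_seconds
        self.events = defaultdict(deque)  # kind -> timestamps of its occurrences

    def record(self, business_type, operations, timestamp):
        key = (business_type, tuple(operations))
        q = self.events[key]
        q.append(timestamp)
        if self.window is not None:
            # drop occurrences that fell out of the trailing window
            while q and q[0] <= timestamp - self.window:
                q.popleft()
        if len(q) > self.threshold:
            return (f"{business_type}: abnormal trajectory {' -> '.join(key[1])} "
                    f"seen {len(q)} times; the default trajectory may have changed")
        return None
```

Without a window, storing every timestamp is wasteful; a plain counter per kind would suffice in that mode, and the deque is kept here only so both variants share one code path.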

By keeping statistics on the abnormal trajectories of each business type and issuing a warning when the count of one kind exceeds the preset threshold, this embodiment further helps analyze user operation trajectories accurately in a simpler and more efficient way.

Fig. 4 is a schematic structural diagram of a device for analyzing user operation trajectories based on web data according to an embodiment of the present invention. As shown in Fig. 4, the device includes a traffic collection unit 10, a trajectory analysis unit 11 and a cross identification unit 12, wherein:

the traffic collection unit 10 is configured to acquire a user operation trajectory in real time, the user operation trajectory including at least a business type; the trajectory analysis unit 11 is configured to compare, according to a trajectory model obtained in advance by a clustering algorithm, the user operation trajectory with all default trajectories of the same business type, wherein the trajectory model includes at least one default trajectory corresponding to each business type; and the cross identification unit 12 is configured to mark the user operation trajectory as an abnormal trajectory if it differs from the default trajectories. Specifically:

The traffic collection unit 10 collects web data from the network to obtain the trajectory of operations a user performs while handling business. The user operation trajectory contains multidimensional data about the user's operations, including at least the business type, job number, address, time and duration.

The trajectory analysis unit 11 obtains in advance the default trajectory corresponding to each business type, that is, the typical user operation trajectory followed when a user handles that business. The default trajectories can be obtained in two ways. At least one preset default trajectory can be specified directly, based on the classification of each business type and its configuration in the overall business center. Alternatively, a trajectory model can be derived by applying a preset algorithm to historical data, i.e. by compiling statistics on the user operation trajectories recorded during actual use. The trajectory model includes at least one default trajectory for each business type.

After the comparison, if the user operation trajectory is the same as a default trajectory, no further action is taken. Otherwise, the cross identification unit 12 marks the user operation trajectory as an abnormal trajectory, which can then trigger an alarm or simply be logged.
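
The comparison step could be sketched as a set lookup, assuming that each trajectory is reduced to an ordered tuple of steps and that "same" means an exact match. Both are assumptions; the patent does not fix the comparison rule. The function name `is_abnormal` and the dictionary layout of the model are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Steps = Tuple[str, ...]


def is_abnormal(business_type: str, steps: Iterable[str],
                model: Dict[str, Set[Steps]]) -> bool:
    """Return True if the trajectory matches none of the default trajectories
    registered for its business type (exact step-sequence match assumed).

    A business type with no registered defaults is treated as abnormal; that
    policy is an assumption, since the model is meant to cover every type.
    """
    defaults = model.get(business_type, set())
    return tuple(steps) not in defaults
```

Because a business type may have several default trajectories, a trajectory is flagged only when it differs from all of them.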

The device provided by this embodiment of the present invention is used to execute the method described above; for its functions, refer to the method embodiments above. The specific method flow is not repeated here.

In this embodiment of the present invention, the trajectory analysis unit 11 compares the user operation trajectory collected by the traffic collection unit 10 with the default trajectories of the corresponding business type. If they differ, the cross identification unit 12 determines that the user operation trajectory is abnormal. User operation trajectories can thus be analyzed accurately in a simpler and more efficient way.

FIG. 5 is a schematic structural diagram of another device for analyzing user operation trajectories based on web data according to an embodiment of the present invention. As shown in FIG. 5, the device includes a traffic collection unit 10, a trajectory analysis unit 11, a cross identification unit 12, a data warehouse unit 13, an association calculation unit 14 and a modeling unit 15, wherein:

The data warehouse unit 13 is configured to periodically acquire all user operation trajectories within a preset historical time range. The association calculation unit 14 is configured to apply a clustering algorithm to all of these user operation trajectories to obtain at least one cluster. The modeling unit 15 is configured to analyze the user operation trajectories contained in each cluster to obtain the trajectory model.

To obtain the default trajectory for each business type, the data warehouse unit 13 first acquires all user operation trajectories extracted from web data within a preset historical time range, for example the half year or the full year preceding the current moment.
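
Selecting the historical time range amounts to a timestamp filter. The sketch below assumes records are tuples of (timestamp, business type, steps) and approximates "half a year" as 182 days; the function name and record layout are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple

# Hypothetical record layout: (timestamp, business type, ordered steps).
Record = Tuple[datetime, str, Tuple[str, ...]]


def trajectories_in_window(records: Sequence[Record], now: datetime,
                           window: timedelta = timedelta(days=182)) -> List[Record]:
    """Return the records whose timestamp lies in [now - window, now].

    The default window of 182 days approximates "the half year before now".
    """
    start = now - window
    return [r for r in records if start <= r[0] <= now]
```

Passing `window=timedelta(days=365)` gives the one-year variant mentioned above.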

The association calculation unit 14 uses a clustering algorithm to classify and aggregate all user operation trajectories in the data warehouse unit 13 according to different dimensional conditions. Depending on how the clustering algorithm is configured, this yields a cluster set containing at least one cluster.

Further, the clustering algorithm is the K-Means clustering algorithm.

Many clustering algorithms exist, such as K-Means clustering, K-Medians clustering, mean-shift clustering and agglomerative hierarchical clustering. K-Means is used here only as an example.

The K-Means algorithm partitions similar user operation trajectories using a preset value of K and an initial centroid for each category, then iteratively refines the partition by recomputing the mean of each cluster. The sum of squared errors (SSE) is used as the clustering objective function. When K-Means is run twice and produces two different cluster sets, the one with the smaller SSE has the higher within-cluster similarity, so the cluster set with the minimum SSE is taken as the final result. The value of K can be set according to the number of business types. The final cluster set is then validated and adjusted as needed.
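
The procedure above (random initial centroids, alternating assignment and mean updates, then keeping the run with the lowest SSE) can be sketched in plain Python. The bag-of-steps encoding, which turns each trajectory into a count vector over a fixed step vocabulary, is an assumption made here for illustration; the patent does not say how trajectories are vectorized. All function names are hypothetical, and generalizing "two runs" to `n_runs` seeds is likewise an illustrative choice.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def bag_of_steps(trajectories: Sequence[Sequence[str]], vocabulary: Sequence[str]) -> List[Vector]:
    """Encode each trajectory as a count vector over a fixed step vocabulary (assumed encoding)."""
    index = {step: i for i, step in enumerate(vocabulary)}
    vectors = []
    for steps in trajectories:
        v = [0.0] * len(vocabulary)
        for step in steps:
            v[index[step]] += 1.0
        vectors.append(v)
    return vectors


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Vector], k: int, seed: int,
           max_iter: int = 100) -> Tuple[List[int], List[Vector], float]:
    """One run of Lloyd's K-Means; returns (labels, centroids, SSE)."""
    rng = random.Random(seed)
    # Initial centroids: k input points sampled without replacement.
    centroids = [list(p) for p in rng.sample(list(points), k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, sse


def best_of_runs(points: Sequence[Vector], k: int,
                 n_runs: int = 10) -> Tuple[List[int], List[Vector], float]:
    """Run K-Means with n_runs different seeds and keep the cluster set with the smallest SSE."""
    return min((kmeans(points, k, seed) for seed in range(n_runs)), key=lambda run: run[2])
```

Each run of K-Means only reaches a local optimum that depends on its initial centroids. Comparing several runs by SSE, as the text describes, reduces the chance of keeping a poor partition.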

By analyzing the user operation trajectories contained in each cluster obtained by the association calculation unit 14, the modeling unit 15 identifies the dominant user operation trajectory in each cluster. This can also be regarded as the trajectory corresponding to the cluster center, and it is used as the default trajectory for the corresponding business type. The resulting trajectory model is then sent to the trajectory analysis unit 11.
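
One way to read "the trajectory corresponding to the cluster center" is to take the member closest to each centroid as that cluster's default trajectory and file it under its own business type. The sketch below assumes that reading; choosing the most frequent member would be an equally plausible alternative. The inputs (vectors, labels, centroids) can come from any K-Means run, and the function name is hypothetical.

```python
from typing import Dict, Sequence, Set, Tuple

Steps = Tuple[str, ...]


def build_trajectory_model(trajectories: Sequence[Tuple[str, Steps]],
                           vectors: Sequence[Sequence[float]],
                           labels: Sequence[int],
                           centroids: Sequence[Sequence[float]]) -> Dict[str, Set[Steps]]:
    """For each cluster, take the member trajectory closest to the centroid as a
    default trajectory and register it under that trajectory's business type."""
    model: Dict[str, Set[Steps]] = {}
    for c, centroid in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if not members:
            continue  # skip empty clusters
        # Member with the smallest squared distance to the centroid (first one wins ties).
        best = min(members, key=lambda i: sum((x - y) ** 2 for x, y in zip(vectors[i], centroid)))
        business_type, steps = trajectories[best]
        model.setdefault(business_type, set()).add(steps)
    return model
```

The result maps each business type to a set of default trajectories, matching the statement that the model holds at least one default trajectory per business type.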

The trajectory model can be recomputed periodically as actual needs dictate: the traffic collection unit 10 adds the new user operation trajectories obtained during that period to the data warehouse unit 13, from which a new trajectory model is obtained.

The device provided by this embodiment of the present invention is used to execute the method described above; for its functions, refer to the method embodiments above. The specific method flow is not repeated here.

In this embodiment of the present invention, the association calculation unit 14 applies a clustering algorithm to all user operation trajectories within the historical time range stored in the data warehouse unit 13, and the modeling unit 15 derives the trajectory model from the result. The trajectory analysis unit 11 then compares the user operation trajectories collected in real time by the traffic collection unit 10 with the default trajectories of the corresponding business type. If they differ, the cross identification unit 12 determines that the user operation trajectory is abnormal. User operation trajectories can thus be analyzed accurately in a simpler and more efficient way.

FIG. 6 is a schematic structural diagram of yet another device for analyzing user operation trajectories based on web data according to an embodiment of the present invention. As shown in FIG. 6, the device includes:

a traffic collection unit 10, a trajectory analysis unit 11, a cross identification unit 12, a data warehouse unit 13, an association calculation unit 14, a modeling unit 15 and a quantization unit 16, wherein:

The quantization unit 16 is configured to count each kind of abnormal trajectory, and is further configured to issue an early-warning message if a count exceeds a preset count threshold. Specifically:

The quantization unit 16 compiles statistics on all abnormal trajectories obtained, grouped by business type. This yields the kinds of abnormal trajectory produced for each business type, and each kind is counted.
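
Grouping by business type and counting each kind of abnormal trajectory maps naturally onto `collections.Counter`. This batch sketch assumes abnormal trajectories arrive as (business type, step tuple) pairs; the function names are hypothetical, and "exceeds" is again read as strictly greater than the threshold.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Steps = Tuple[str, ...]


def count_abnormal_by_type(abnormal: Iterable[Tuple[str, Steps]]) -> Dict[str, Counter]:
    """Group abnormal trajectories by business type and count each distinct kind."""
    result: Dict[str, Counter] = {}
    for business_type, steps in abnormal:
        result.setdefault(business_type, Counter())[tuple(steps)] += 1
    return result


def over_threshold(counts: Dict[str, Counter], threshold: int) -> List[Tuple[str, Steps, int]]:
    """List (business type, abnormal trajectory, count) for every kind whose count exceeds the threshold."""
    return [(bt, steps, n) for bt, counter in counts.items()
            for steps, n in counter.items() if n > threshold]
```

Every entry returned by `over_threshold` would correspond to one early-warning message for its business type.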

The quantization unit 16 sets a count threshold in advance. If the count of any one kind of abnormal trajectory exceeds the count threshold, or exceeds a preset count threshold within a preset time range, a corresponding early-warning message can be issued. The message indicates that the default trajectory of the corresponding business type may have changed, or that a new default trajectory may have appeared.

The device provided by this embodiment of the present invention is used to execute the method described above; for its functions, refer to the method embodiments above. The specific method flow is not repeated here.

In this embodiment of the present invention, the quantization unit 16 compiles statistics on the abnormal trajectories of each business type. It issues an early-warning message when the count of any one kind of abnormal trajectory exceeds the preset count threshold. This helps analyze user operation trajectories accurately in a simpler and more efficient way.

FIG. 7 illustrates the physical structure of an electronic device. As shown in FIG. 7, the server may include a processor 810, a communications interface 820, a memory 830 and a communication bus 840. The processor 810, the communications interface 820 and the memory 830 communicate with one another through the communication bus 840. The processor 810 can call logic instructions in the memory 830 to perform the following method: acquiring user operation trajectories in real time, each including at least a business type; comparing the user operation trajectory with all default trajectories of the same business type according to a trajectory model obtained in advance through a clustering algorithm, the trajectory model including at least one default trajectory for each business type; and, if the user operation trajectory differs from the default trajectory, marking it as an abnormal trajectory.

Further, an embodiment of the present invention discloses a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium. The computer program includes program instructions that, when executed by a computer, enable the computer to perform the methods provided by the method embodiments above. For example: acquiring user operation trajectories in real time, each including at least a business type; comparing the user operation trajectory with all default trajectories of the same business type according to a trajectory model obtained in advance through a clustering algorithm, the trajectory model including at least one default trajectory for each business type; and, if the user operation trajectory differs from the default trajectory, marking it as an abnormal trajectory.

Further, an embodiment of the present invention provides a non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the methods provided by the method embodiments above. For example: acquiring user operation trajectories in real time, each including at least a business type; comparing the user operation trajectory with all default trajectories of the same business type according to a trajectory model obtained in advance through a clustering algorithm, the trajectory model including at least one default trajectory for each business type; and, if the user operation trajectory differs from the default trajectory, marking it as an abnormal trajectory.

Those of ordinary skill in the art will understand that the logic instructions in the memory 830 described above can be implemented as software functional units. When sold or used as an independent product, they can be stored in a computer-readable storage medium. On this understanding, the technical solution of the present invention can be embodied as a software product, whether in essence, in the part that contributes to the prior art, or in part. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or some of the steps of the methods described in the embodiments of the present invention. The storage media include USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical discs and other media that can store program code.

The embodiments of the electronic device and the like described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network elements. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.

From the description of the embodiments above, those skilled in the art will clearly understand that each embodiment can be implemented by software plus the necessary general-purpose hardware platform, or, of course, by hardware. On this understanding, the technical solution above can be embodied as a software product, whether in essence or in the part that contributes to the prior art. The computer software product can be stored in a computer-readable storage medium such as ROM/RAM, a magnetic disk or an optical disc. It includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the embodiments or in parts of them.

Finally, it should be noted that the embodiments above are only intended to illustrate, not to limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand the following. The technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (8)

1. A user operation trajectory analysis method based on web data, characterized by comprising:
acquiring a user operation trajectory in real time, the user operation trajectory including at least a business type;
comparing the user operation trajectory with all default trajectories of the same business type according to a trajectory model obtained in advance through a clustering algorithm, wherein the trajectory model includes at least one default trajectory corresponding to each business type;
marking the user operation trajectory as an abnormal trajectory if the user operation trajectory differs from the default trajectory;
classifying and aggregating all user operation trajectories with a clustering algorithm according to different dimensional conditions, and obtaining, according to the specific settings of the clustering algorithm, a cluster set including at least one cluster;
wherein the clustering algorithm is the K-Means clustering algorithm, which partitions similar user operation trajectories using a preset value of K and an initial centroid for each category, and obtains the optimal clustering result by iteratively optimizing the means of the partitions;
using the sum of squared errors as the objective function of the clustering, wherein, of two different cluster sets produced by running K-Means twice, the one with the smaller sum of squared errors has the higher similarity, so that the cluster set with the minimum sum of squared errors is taken as the final result; wherein the value of K is set according to the number of business types, and the final cluster set is validated once obtained and adjusted as needed; and
presetting a count threshold and, if the count of any one kind of abnormal trajectory exceeds the count threshold, or exceeds a preset count threshold within a preset time range, issuing a corresponding early-warning message to indicate that the default trajectory of the corresponding business type has changed or that a new default trajectory has appeared.

2. The method according to claim 1, characterized in that the method further comprises:
periodically acquiring all user operation trajectories within a preset historical time range;
applying a clustering algorithm to all of the user operation trajectories to obtain at least one cluster; and
analyzing the user operation trajectories contained in each cluster to obtain the trajectory model.

3. The method according to claim 1, characterized in that the method further comprises:
counting each kind of abnormal trajectory; and
issuing an early-warning message if the count exceeds a preset count threshold.

4. A user operation trajectory analysis device based on web data, characterized by comprising:
a traffic collection unit, configured to acquire a user operation trajectory in real time, the user operation trajectory including at least a business type;
a trajectory analysis unit, configured to compare the user operation trajectory with all default trajectories of the same business type according to a trajectory model obtained in advance through a clustering algorithm, wherein the trajectory model includes at least one default trajectory corresponding to each business type;
a cross identification unit, configured to mark the user operation trajectory as an abnormal trajectory if the user operation trajectory differs from the default trajectory;
wherein all user operation trajectories are classified and aggregated with a clustering algorithm according to different dimensional conditions, and a cluster set including at least one cluster is obtained according to the specific settings of the clustering algorithm;
wherein the clustering algorithm is the K-Means clustering algorithm, which partitions similar user operation trajectories using a preset value of K and an initial centroid for each category, and obtains the optimal clustering result by iteratively optimizing the means of the partitions;
wherein the sum of squared errors is used as the objective function of the clustering, and, of two different cluster sets produced by running K-Means twice, the one with the smaller sum of squared errors has the higher similarity, so that the cluster set with the minimum sum of squared errors is taken as the final result; wherein the value of K is set according to the number of business types, and the final cluster set is validated once obtained and adjusted as needed; and
wherein a count threshold is preset and, if the count of any one kind of abnormal trajectory exceeds the count threshold, or exceeds a preset count threshold within a preset time range, a corresponding early-warning message is issued to indicate that the default trajectory of the corresponding business type has changed or that a new default trajectory has appeared.

5. The device according to claim 4, characterized in that the device further comprises:
a data warehouse unit, configured to periodically acquire all user operation trajectories within a preset historical time range;
an association calculation unit, configured to apply a clustering algorithm to all of the user operation trajectories to obtain at least one cluster; and
a modeling unit, configured to analyze the user operation trajectories contained in each cluster to obtain the trajectory model.

6. The device according to claim 4, characterized in that the device further comprises:
a quantization unit, configured to count each kind of abnormal trajectory;
the quantization unit being further configured to issue an early-warning message if the count exceeds a preset count threshold.
7. An electronic device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the user operation trajectory analysis method according to any one of claims 1 to 3.

8. A non-transitory computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the user operation trajectory analysis method according to any one of claims 1 to 3.
CN201811453609.9A 2018-11-30 2018-11-30 User operation track analysis method and device based on web data Active CN111258874B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811453609.9A CN111258874B (en) 2018-11-30 2018-11-30 User operation track analysis method and device based on web data

Publications (2)

Publication Number Publication Date
CN111258874A CN111258874A (en) 2020-06-09
CN111258874B true CN111258874B (en) 2023-09-05

Family

ID=70948489

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811453609.9A Active CN111258874B (en) 2018-11-30 2018-11-30 User operation track analysis method and device based on web data

Country Status (1)

Country Link
CN (1) CN111258874B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111782876A (en) * 2020-06-30 2020-10-16 杭州海康机器人技术有限公司 Data processing method, device and system and storage medium
CN112667277B (en) * 2020-12-25 2023-07-25 中国平安人寿保险股份有限公司 Information pushing method and device based on small program and computer equipment

Citations (5)

Publication number Priority date Publication date Assignee Title
CN102946317A (en) * 2012-08-07 2013-02-27 甘利俭 User behavior analysis system
KR20140073345A (en) * 2012-12-06 2014-06-16 한국과학기술원 Method for task list recommanation associated with user interation and mobile device using the same
CN107306252A (en) * 2016-04-21 2017-10-31 中国移动通信集团河北有限公司 A kind of data analysing method and system
CN107426177A (en) * 2017-06-13 2017-12-01 努比亚技术有限公司 A kind of user behavior clustering method and terminal, computer-readable recording medium
CN108512806A (en) * 2017-02-24 2018-09-07 中国移动通信集团公司 A kind of operation behavior analysis method and server based on virtual environment

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
US8321251B2 (en) * 2010-03-04 2012-11-27 Accenture Global Services Limited Evolutionary process system
US8855361B2 (en) * 2010-12-30 2014-10-07 Pelco, Inc. Scene activity analysis using statistical and semantic features learnt from object trajectory data
US8660368B2 (en) * 2011-03-16 2014-02-25 International Business Machines Corporation Anomalous pattern discovery
US10423892B2 (en) * 2016-04-05 2019-09-24 Omni Ai, Inc. Trajectory cluster model for learning trajectory patterns in video data

Non-Patent Citations (1)

Title
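成静 (Cheng Jing); 朱怡安 (Zhu Yi'an); 张涛 (Zhang Tao); 杨艳丽 (Yang Yanli). A usability evaluation method for mobile applications based on an operation trajectory model (一种基于操作轨迹模型的移动应用易用性评估方法). Journal of Northwestern Polytechnical University (西北工业大学学报), 2016, (04), full text. *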

Also Published As

Publication number Publication date
CN111258874A (en) 2020-06-09

Similar Documents

Publication Publication Date Title
CN110351150B (en) Fault source determination method and device, electronic equipment and readable storage medium
CN109961204B (en) A business quality analysis method and system under a microservice architecture
US11138058B2 (en) Hierarchical fault determination in an application performance management system
WO2021120186A1 (en) Distributed product defect analysis system and method, and computer-readable storage medium
CN111181799B (en) Network traffic monitoring method and equipment
US10942801B2 (en) Application performance management system with collective learning
CN112181767A (en) Software system abnormality determination method, device and storage medium
US11138060B2 (en) Application performance management system with dynamic discovery and extension
CN109325193A (en) WAF normal traffic modeling method and device based on machine learning
CN111294233A (en) Network alarm statistical analysis method, system and computer-readable storage medium
CN113505048A (en) Unified monitoring platform based on application system portrait and implementation method
CN111258874B (en) User operation track analysis method and device based on web data
KR102070913B1 (en) Method and apparatus for processing wafer data
CN112306700A (en) Abnormal RPC request diagnosis method and device
CN109542737A (en) Platform alert processing method, device, electronic device and storage medium
CN111669281A (en) Alarm analysis method, device, equipment and storage medium
CN111274084A (en) Fault diagnosis method, device, equipment and computer readable storage medium
CN116415206A (en) Operator multiple data fusion method, system, electronic equipment and computer storage medium
US20160366033A1 (en) Compacted messaging for application performance management system
CN111737371B (en) Data flow detection classification method and device capable of dynamically predicting
US10305983B2 (en) Computer device for distributed processing
CN101515864B (en) Alarm information allocation system and allocation method thereof
CN108334524A (en) A kind of storm daily records error analysis methodology and device
CN112541447A (en) Machine model updating method, device, medium and equipment
US11227288B1 (en) Systems and methods for integration of disparate data feeds for unified data monitoring

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant