CN112667709B

CN112667709B - Campus card leasing behavior detection method and system based on Spark

Info

Publication number: CN112667709B
Application number: CN202011553092.8A
Authority: CN
Inventors: 于磊磊; 李永在; 乔禹
Original assignee: Shandong University
Current assignee: Shandong University
Priority date: 2020-12-24
Filing date: 2020-12-24
Publication date: 2022-05-03
Anticipated expiration: 2040-12-24
Also published as: CN112667709A

Abstract

The invention discloses a Spark-based method and system for detecting campus card leasing behavior. Usage data of campus card users is acquired and taken as the data to be detected; manually screened usage data of users marked as leasing their campus cards is acquired and taken as the calibration data. The data to be detected is converted into a behavior data set to be detected, and the calibration data is converted into a calibration behavior data set. The categorical features in both data sets are quantized, and all features are then standardized. The weight of each feature in the calibration behavior data set is calculated in parallel using Spark, and the weighted distances between the behavior data to be detected and all data in the calibration behavior data set are calculated in parallel. The calibration behavior data are sorted by their distance to the behavior data to be detected in ascending order, and the first K calibration behavior data take part in a Gaussian-weighted vote that yields the category of the behavior data to be detected.

Description

Campus card leasing behavior detection method and system based on Spark

Technical Field

The application relates to the technical field of abnormal behavior data detection, in particular to a campus card leasing behavior detection method and system based on Spark.

Background

The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.

In existing campus card management, campus cards are sometimes leased to other people. To discover and stop this behavior in time, a school's campus card management department needs a method for detecting campus card leasing. Existing detection, however, relies on manual experience to screen and label cards, so false detections and missed detections occur easily. As a result, campus cards are frequently misused, and the rights and interests of legitimate campus card users are harmed.

Disclosure of Invention

In order to overcome the defects of the prior art, the application provides a campus card leasing behavior detection method and system based on Spark;

in a first aspect, the application provides a campus card leasing behavior detection method based on Spark;

a campus card leasing behavior detection method based on Spark comprises the following steps:

acquiring use data of a campus card by a user, and taking the acquired data as data to be detected;

acquiring manually screened use data of users marked as leases on the campus card, and taking the acquired data as calibration data; converting the data to be detected into a behavior data set to be detected, and converting the calibration data into a calibration behavior data set;

respectively carrying out quantitative processing on category characteristics in the behavior data set to be detected and the calibration behavior data set; respectively carrying out standardized processing on all characteristics in the behavior data set to be detected and the calibration behavior data set;

a Spark engine is adopted to calculate the weight of each characteristic in the calibration behavior data set in parallel;

adopting a Spark engine to calculate the distance between the behavior data to be detected and all data in the calibration behavior data set in a parallel weighting manner;

and sequencing the data according to the distance between the behavior data to be detected and the calibration behavior data from small to large, and selecting the first K calibration behavior data to perform Gaussian weight weighted voting to obtain the category of the behavior data to be detected.

In a second aspect, the application provides a campus card leasing behavior detection system based on Spark;

campus card lease behavior detection system based on Spark includes:

a data acquisition module configured to: acquiring use data of a campus card by a user, and taking the acquired data as data to be detected; acquiring manually screened use data of users marked as leases on the campus card, and taking the acquired data as calibration data; converting the data to be detected into a behavior data set to be detected, and converting the calibration data into a calibration behavior data set;

a data pre-processing module configured to: respectively carrying out quantitative processing on category characteristics in the behavior data set to be detected and the calibration behavior data set; respectively carrying out standardized processing on all characteristics in the behavior data set to be detected and the calibration behavior data set;

a weight calculation module configured to: calculating the weight of each characteristic in the calibration behavior data set in parallel by adopting a Spark engine;

a distance calculation module configured to: adopting a Spark engine to calculate the distance between the behavior data to be detected and all data in the calibration behavior data set in a parallel weighting manner;

a voting module configured to: and sequencing the data according to the distance between the behavior data to be detected and the calibration behavior data from small to large, and selecting the first K calibration behavior data to perform Gaussian weight weighted voting to obtain the category of the behavior data to be detected.

In a third aspect, the present application further provides an electronic device, including: one or more processors, one or more memories, and one or more computer programs; wherein a processor is connected to the memory, the one or more computer programs are stored in the memory, and when the electronic device is running, the processor executes the one or more computer programs stored in the memory, so as to make the electronic device execute the method according to the first aspect.

In a fourth aspect, the present application also provides a computer-readable storage medium for storing computer instructions which, when executed by a processor, perform the method of the first aspect.

In a fifth aspect, the present application also provides a computer program (product) comprising a computer program for implementing the method of any of the preceding first aspects when run on one or more processors.

Compared with the prior art, the beneficial effects of this application are:

A novel, fast, and efficient Spark-based campus card leasing behavior detection method and system are provided. Batch analysis and mining of data replaces the existing case-by-case screening, which relies mainly on empirical judgment and evidence. This significantly improves the efficiency and accuracy of leasing behavior detection, allows non-explicit leasing behavior to be detected effectively, and effectively assists the overall management of the campus card system.

Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application.

FIG. 1 is a flow chart of the method of the first embodiment.

Detailed Description

It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise, and it should be understood that the terms "comprises" and "comprising", and any variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

The embodiments and features of the embodiments of the invention may be combined with each other without conflict.

Example one

The embodiment provides a campus card leasing behavior detection method based on Spark;

s101: acquiring use data of a campus card by a user, and taking the acquired data as data to be detected;

s102: respectively carrying out quantitative processing on category characteristics in the behavior data set to be detected and the calibration behavior data set; respectively carrying out standardized processing on all characteristics in the behavior data set to be detected and the calibration behavior data set;

s103: calculating the weight of each characteristic in the calibration behavior data set in parallel by adopting a Spark engine;

s104: adopting a Spark engine to calculate the distance between the behavior data to be detected and all data in the calibration behavior data set in a parallel weighting manner;

s105: and sequencing the data according to the distance between the behavior data to be detected and the calibration behavior data from small to large, and selecting the first K calibration behavior data to perform Gaussian weight weighted voting to obtain the category of the behavior data to be detected.

As one or more embodiments, the S101: converting the data to be detected into a behavior data set to be detected; the method comprises the following specific steps:

the data to be detected includes: account number, student number, name, gender, college, identity type, transaction amount, transaction merchant, and transaction time;

performing feature extraction on the data to be detected to obtain its features; the features of the data to be detected comprise: gender, identity, whether the user is in a graduating class, total consumption amount, total number of consumptions, catering consumption amount, bathing consumption proportion, fitness consumption proportion, whether learning-related records exist, and whether medical-related records exist;

and storing the characteristics of the data to be detected according to the user number to obtain a behavior data set to be detected.

As one or more embodiments, the S101: converting the calibration data into a calibration behavior data set; the method comprises the following specific steps:

the calibration data includes: account number, student number, name, gender, college, identity type, transaction amount, transaction merchant, transaction time, and whether leasing behavior is present;

extracting features from the calibration data to obtain the calibration data features; the calibration data features include: gender, identity, whether the user is in a graduating class, total consumption amount, total number of consumptions, catering consumption amount, bathing consumption proportion, fitness consumption proportion, whether learning-related records exist, whether medical-related records exist, and a label indicating whether leasing behavior exists;

and storing the calibration data characteristics according to the numbers to obtain a calibration behavior data set.

Both conversions, of the data to be detected into the behavior data set to be detected and of the calibration data into the calibration behavior data set, are realized through a consumption behavior data model. The model is designed on the basis of data statistics and empirical judgment. Through data aggregation, it converts a huge volume of transaction records without significant behavior characteristics into a moderate volume of consumption behavior data with significant behavior characteristics: the transaction records of each campus card for each week are merged into one consumption behavior data item, and the 11 features in 6 categories shown in Table 1 are defined.

Table 1 behavioral characteristics definition table

Further, the consumption behavior data model defines six categories. The identity category mainly characterizes whether the person's need for the card is rigid, and includes: gender, identity factor, and whether the user is in a graduating class; the identity factor takes the values undergraduate, master's student, doctoral student, faculty and staff, alumni, and temporary personnel. The overall consumption category characterizes whether the consumption behavior is continuous and stable, and includes: total consumption amount and total number of consumptions. The living and dining category characterizes the typical traits of the on-campus student population, and includes: catering consumption amount, bathing consumption amount, and bathing consumption proportion. The exercise and fitness category characterizes whether a leased card is used for on-campus exercise and fitness, and includes: fitness consumption proportion. The study and work category characterizes whether there are records of personal academic activities such as self-service printing and book borrowing, and includes: whether learning-related records exist. The medical care category characterizes whether there are hospital medical records, and includes: whether medical-related records exist.
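The weekly aggregation described above can be sketched as follows. This is a minimal illustration, not the patented model: the field names (`account`, `time`, `amount`, `category`), the time format, and the assumption that each transaction merchant has already been mapped to a category such as `"catering"` or `"bath"` are all hypothetical, and only a subset of the features is computed.

```python
from collections import defaultdict
from datetime import datetime


def weekly_behavior(transactions):
    """Merge raw transactions into one record per (account, ISO year, ISO week)."""
    groups = defaultdict(list)
    for t in transactions:
        ts = datetime.strptime(t["time"], "%Y-%m-%d %H:%M")
        iso = ts.isocalendar()  # (ISO year, ISO week, ISO weekday)
        groups[(t["account"], iso[0], iso[1])].append(t)

    records = {}
    for key, rows in groups.items():
        total = sum(r["amount"] for r in rows)

        def amount_of(category):
            return sum(r["amount"] for r in rows if r["category"] == category)

        records[key] = {
            "total_amount": total,
            "total_count": len(rows),
            "catering_amount": amount_of("catering"),
            "bath_ratio": amount_of("bath") / total if total else 0.0,
            "fitness_ratio": amount_of("fitness") / total if total else 0.0,
            "has_study_record": any(r["category"] == "study" for r in rows),
            "has_medical_record": any(r["category"] == "medical" for r in rows),
        }
    return records
```

Grouping by ISO week is one possible reading of "every week"; a deployment could equally use a fixed seven-day window starting from the semester start.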

As one or more embodiments, the S102: respectively carrying out quantitative processing on category characteristics in the behavior data set to be detected and the calibration behavior data set; respectively carrying out standardized processing on all characteristics in the behavior data set to be detected and the calibration behavior data set; the method comprises the following specific steps:

quantizing the categorical features in the behavior data set to be detected and in the calibration behavior data set by one-hot encoding;

standardizing all features in the behavior data set to be detected and in the calibration behavior data set by the Z-score standardization method.

One-hot encoding is a common method for quantizing categorical features. It uses an N-bit state register to encode N categories, so that each value of a categorical feature is mapped to a point in Euclidean space, which makes distances between categorical values meaningful.
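A small sketch of one-hot encoding illustrates why it makes distances between categories reasonable: every pair of distinct categories ends up equally far apart. The category labels in `IDENTITY_TYPES` are illustrative English names chosen for this example, not identifiers from the original.

```python
def one_hot(value, categories):
    """Return an N-bit vector with a single 1 at the position of `value`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if c == value else 0 for c in categories]


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


# Hypothetical labels for the six identity values.
IDENTITY_TYPES = ["undergraduate", "master", "doctoral", "staff", "alumni", "temporary"]

encoded = one_hot("doctoral", IDENTITY_TYPES)
```

With an integer code such as 1 to 6, "undergraduate" would appear five times farther from "temporary" than from "master"; with one-hot vectors the squared distance between any two different identities is always 2.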

Further, the one-hot encoding of the categorical features in Table 1 is shown in Table 2. The encoding also expands the features: the number of features grows from 11 to 20.

Table 2 behavior feature definition table

The Z-score standardization method is a common method for standardizing feature data. It brings the feature values to the same order of magnitude by transforming every feature into a new distribution with a mean of 0 and a standard deviation of 1:

$$x' = \frac{x - \mu}{\sigma}$$

wherein μ and σ are respectively the mean and the standard deviation of the feature data, and x' is the standardized new feature data.
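As a brief worked example of Z-score standardization, the sketch below standardizes one feature column. It assumes the population standard deviation (dividing by the number of samples); the original does not say whether the population or the sample form is used, and the function name is illustrative.

```python
import math


def z_score(values):
    """Standardize one feature column to mean 0 and standard deviation 1."""
    mu = sum(values) / len(values)
    # Population standard deviation (an assumption of this sketch).
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


column = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # mean 5, standard deviation 2
standardized = z_score(column)
```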

As one or more embodiments, the S103: calculating the weight of each characteristic in the calibration behavior data set in parallel by adopting a Spark engine; the method comprises the following specific steps:

the Driver of the Spark computing engine stores the calibration behavior data set on the Hadoop Distributed File System (HDFS) and then converts it into a resilient distributed dataset (RDD);

each Worker node is divided into a plurality of Executors according to its resources, and the improved Relief method is run in parallel on each Executor to calculate the weight of each feature in the calibration behavior data set, yielding the feature weight values on that Executor;

the Driver averages the feature weights obtained from the Executors and takes the averages as the final feature weight values.
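The split, compute, and average pattern of S103 can be sketched without a Spark dependency as follows. This is a local simulation under stated assumptions: `split_into_partitions` stands in for Spark's own partitioning, `weight_fn` is a hypothetical placeholder for the per-Executor weight calculation, and the toy `column_means` function is used only to make the example checkable. On a real cluster the per-partition call would typically be expressed with something like PySpark's `RDD.mapPartitions`, with the averaging done on the Driver.

```python
def split_into_partitions(rows, n_parts):
    """Round-robin split that stands in for Spark's partitioning."""
    return [rows[i::n_parts] for i in range(n_parts)]


def average_weights(per_partition_weights):
    """Driver-side step: average the weight vectors returned by the partitions."""
    n = len(per_partition_weights)
    n_features = len(per_partition_weights[0])
    return [sum(w[j] for w in per_partition_weights) / n for j in range(n_features)]


def parallel_feature_weights(rows, n_parts, weight_fn):
    """Apply weight_fn to every non-empty partition, then average the results."""
    partitions = split_into_partitions(rows, n_parts)
    partial = [weight_fn(p) for p in partitions if p]  # one call per partition
    return average_weights(partial)


def column_means(rows):
    """Toy stand-in for the per-Executor weight calculation."""
    return [sum(col) / len(rows) for col in zip(*rows)]


rows = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
weights = parallel_feature_weights(rows, 2, column_means)
```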

The Relief method is an existing prior-art method for calculating feature weights. It comprises the following steps:

randomly extracting a sample R from the training sample set each time, finding the nearest neighbor sample H of R among the samples of the same class, and finding the nearest neighbor sample M of R among the samples of a different class;

when updating the weights, if the distance between R and H on a certain feature is smaller than the distance between R and M, the weight of that feature is increased; otherwise, the weight of that feature is decreased. The above process is repeated m times, and the weight of each feature is finally obtained.
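The basic Relief loop described above can be sketched as follows. This is the textbook form of Relief, in which the per-feature difference is the absolute difference divided by the feature's value range; that definition of `diff`, the brute-force neighbor search, and all names are assumptions of the sketch, not the improved method of this application. The sketch requires at least two classes and at least two samples per class.

```python
import random


def relief(samples, labels, m, seed=0):
    """Textbook Relief: reward features that separate R from its nearest miss."""
    rng = random.Random(seed)
    n_features = len(samples[0])
    spans = []
    for a in range(n_features):
        col = [s[a] for s in samples]
        spans.append((max(col) - min(col)) or 1.0)  # avoid division by zero

    def diff(a, r1, r2):
        return abs(r1[a] - r2[a]) / spans[a]

    def dist(r1, r2):
        return sum(diff(a, r1, r2) for a in range(n_features))

    weights = [0.0] * n_features
    for _ in range(m):
        i = rng.randrange(len(samples))
        r, cls = samples[i], labels[i]
        hits = [s for j, s in enumerate(samples) if j != i and labels[j] == cls]
        misses = [s for j, s in enumerate(samples) if labels[j] != cls]
        hit = min(hits, key=lambda s: dist(r, s))     # nearest same-class sample (H)
        miss = min(misses, key=lambda s: dist(r, s))  # nearest other-class sample (M)
        for a in range(n_features):
            weights[a] += (diff(a, r, miss) - diff(a, r, hit)) / m
    return weights


# Feature 0 separates the two classes; feature 1 does not.
samples = [[0.0, 0.0], [0.1, 1.0], [0.0, 0.5], [1.0, 0.0], [0.9, 1.0], [1.0, 0.5]]
labels = [0, 0, 0, 1, 1, 1]
w = relief(samples, labels, m=20)
```

On this toy data the weight of feature 0 comes out positive and the weight of feature 1 negative, which also shows why raw Relief weights may need a non-negative adjustment before they are used in a distance.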

Further, the improved Relief method comprises:

randomly extracting a sample R from a segmentation node sample set of the K-D tree each time;

based on a multi-class nearest neighbor set fast acquisition algorithm of the K-D tree, finding out a nearest neighbor sample H of R from a segmented node sample set of the K-D tree of the same class;

based on a multi-class nearest neighbor set fast acquisition algorithm of the K-D tree, finding out nearest neighbor samples M from segmentation node sample sets of the K-D trees of different classes of the R;

when updating the weight, if the distance between R and H on a certain feature is smaller than the distance between R and M, increasing the weight of the feature, otherwise, reducing the weight of the feature, repeating the above process m times, and finally obtaining the weight of each feature;

and optimizing the obtained weight of each feature.

It should be understood that the extraction range of the sample R in the Relief algorithm is changed from the training sample set to the segmentation node sample set of the K-D tree, so that the distribution of the extracted R is more reasonable; at the same time, the same-class and different-class nearest neighbor samples H and M of R are searched for using the segmentation node sample set of the K-D tree.

It should be understood that the K-D tree, which is a fast-indexing binary tree data structure, is a partition of the K-dimensional space, and has the advantage of fast indexing data.

It should be understood that the multi-class nearest neighbor set fast acquisition algorithm based on the K-D tree optimizes the space-time overhead of the acquisition of the nearest neighbor sample set;

further, the multi-class nearest neighbor set fast acquiring algorithm based on the K-D tree includes:

first, establishing the input variables: R, the behavior data to be detected; K-DTree, a K-D tree; C, the category set; l, the number of nearest neighbors; D, the backtracking threshold; and establishing the output variable S_C_i, the set of nearest neighbor samples of R that belong to class C_i;

second, taking the set of K-DTree segmentation sample nodes as the set S_DN; then randomly extracting a sample node R* from S_DN and backtracking upward layer by layer along the parent path until a parent sample node R** is found such that the number of nodes in the subtree rooted at R** is greater than or equal to C.count × l (i.e., the product of the number of classes and the number of nearest neighbors); R is set to R**;

third, for each class C_i, selecting from the left and right subtrees of R the sample nodes that belong to class C_i and adding them to S_C_i; then continuing to backtrack upward from R, retrieving nodes that belong to class C_i and are closer to R, and using them to replace members of S_C_i, until one of the following conditions is satisfied: (1) the root node ROOT is reached; (2) the backtracking depth reaches the backtracking threshold D; (3) the set S_C_i no longer changes;

fourth, ending the algorithm and returning each set S_C_i.
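To make the output of this algorithm concrete, the sketch below computes the same kind of result, one set S_C_i of the l nearest samples per class, by brute force. It deliberately replaces the K-D tree backtracking with an exhaustive scan, so it illustrates only what is returned, not how the application obtains it quickly; the function and variable names are hypothetical.

```python
import math


def per_class_neighbors(r, samples, labels, l):
    """Return {class: the l samples of that class nearest to r}, excluding r itself."""
    result = {}
    for cls in sorted(set(labels)):
        members = [s for s, c in zip(samples, labels) if c == cls and s is not r]
        members.sort(key=lambda s: math.dist(r, s))  # exhaustive scan, no K-D tree
        result[cls] = members[:l]
    return result


samples = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [10.0, 0.0], [11.0, 0.0]]
labels = [0, 0, 0, 1, 1]
neighbor_sets = per_class_neighbors(samples[0], samples, labels, l=2)
```

The exhaustive scan costs one distance computation per calibration sample for every query, which is the overhead the tree-based search with a backtracking threshold is meant to reduce.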

Further, the optimizing the obtained weight of each feature includes:

the weight calculation mode of the feature A in the Relief algorithm is as follows:

w′(A)＝w(A)-Δw_N(A)+Δw_P(A)

wherein w (A) is the weight of the feature A before updating, and w' (A) is the weight of the feature A after updating;

Δw_N(A) is the negative weight increment:

Δw_P(A) is the positive weight increment:

wherein diff(A, R₁, R₂) is:

Because w(A) may be negative and therefore cannot be used in the weighted Mahalanobis distance calculation, w(A) is made non-negative after the m rounds of calculation are completed, as follows:

$$w(A) = z + w(A) + \varepsilon$$

wherein

$$z = \left| \min \{\, w(X) : X \in S \,\} \right|$$

wherein S is the set of features, z is the absolute value of the minimum feature weight produced by the calculation, and ε is a weight offset compensation, for which a constant is used.
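The non-negative adjustment is a simple shift, sketched below. The default value of `eps` is an arbitrary choice for illustration; the original only states that a constant is used.

```python
def shift_non_negative(weights, eps=1e-3):
    """Shift every weight by z = |min(weights)| plus a small constant eps."""
    z = abs(min(weights))
    return [z + w + eps for w in weights]


shifted = shift_non_negative([-0.5, 0.25, 1.0], eps=0.25)
```

Because the same offset is added to every feature, the ordering of the weights and the differences between them are preserved, while the smallest weight ends up at least ε.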

At the same time, to prevent an excessively large value in the set from making the other values indistinguishable, diff(A, R₁, R₂) is modified to a z-score normalized model:

As one or more embodiments, the S104: adopting a Spark engine to calculate, in a parallel weighted manner, the distance between the behavior data to be detected and all data in the calibration behavior data set; the method comprises the following specific steps:

the Executors of each Worker node calculate in parallel the weighted Mahalanobis distance between the behavior data to be detected and all the data in the calibration behavior data set;

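The Mahalanobis distance model is a distance model between vectors. For the vectors $\vec{x}$ and $\vec{y}$ corresponding to the samples X and Y, the Mahalanobis distance is:

$$d(\vec{x}, \vec{y}) = \sqrt{(\vec{x} - \vec{y})^{T} S^{-1} (\vec{x} - \vec{y})}$$

wherein S is the covariance matrix of $\vec{x}$ and $\vec{y}$.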

Further, the detection method adopts a weighted Mahalanobis distance model that improves on the Mahalanobis distance model. First, the feature weight vector of the calibration behavior data set is calculated.

The weighted Mahalanobis distance is then calculated.
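The following sketch computes a Mahalanobis distance in plain Python, together with one possible weighted variant. The weighting form used here, scaling each coordinate difference by the square root of its feature weight before applying the inverse covariance, is an assumption made for illustration; the exact weighted formula of the application is not reproduced in this text. The helper `solve` and all parameter names are likewise hypothetical.

```python
import math


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def mahalanobis(x, y, cov, weights=None):
    """sqrt(d^T S^-1 d); with weights, d is first scaled by sqrt(w) per feature."""
    d = [a - b for a, b in zip(x, y)]
    if weights is not None:
        # Assumed weighting form; math.sqrt rejects negative weights.
        d = [math.sqrt(w) * di for w, di in zip(weights, d)]
    z = solve(cov, d)  # z = S^-1 d without forming the inverse explicitly
    return math.sqrt(sum(di * zi for di, zi in zip(d, z)))


identity = [[1.0, 0.0], [0.0, 1.0]]
plain = mahalanobis([3.0, 4.0], [0.0, 0.0], identity)                  # Euclidean case
weighted = mahalanobis([1.0, 0.0], [0.0, 0.0], identity, [4.0, 1.0])
```

With the identity covariance the unweighted distance reduces to the Euclidean distance, and a negative weight raises an error in `math.sqrt`, which matches the text's remark that negative weights cannot be used in this distance.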

As one or more embodiments, the S105: sorting the data according to the distance between the behavior data to be detected and the calibration behavior data from small to large, selecting the first K calibration behavior data to perform Gaussian weight weighted voting, and obtaining the category of the behavior data to be detected; the method comprises the following specific steps:

constructing a class-free nearest neighbor set rapid acquisition algorithm based on the K-D tree of S103, and rapidly acquiring the behavior data to be detected and the first K nearest neighbor data of the calibration behavior data set;

weighting the K calibration behavior data one by adopting a Gaussian function according to the distance between the behavior data to be detected and the K calibration behavior data;

voting is carried out according to the weight and the category mark of the K pieces of calibration behavior data.

Further, the K-D tree-based class-free nearest neighbor set rapid acquisition algorithm; further comprising:

first, establishing the input variables: K-DTree, a K-D tree; U, the data to be detected; K, the number of nearest neighbors; D, the backtracking threshold; and establishing the output variable Node[h], the neighbor node set;

second, finding the nearest neighbor point N of U on the K-D tree through binary tree search; if the left and right subtree spaces of N contain a sample node N* that is closer to U than N, stopping the search and adding N* to Node[h]; otherwise, adding N to Node[h];

third, backtracking upward by setting N to the parent sample node of N, and repeating the second step until the backtracking depth reaches the given threshold D;

fourth, outputting Node[h] and ending the algorithm.
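For orientation, a standard K-D tree with an exact K-nearest-neighbor search can be sketched as below. It uses ordinary full backtracking with pruning rather than the depth-limited backtracking with threshold D described above, so it is a reference point for the data structure, not the algorithm of this application. Plain Euclidean distance is used, and the class and function names are illustrative.

```python
import heapq
import math


class KDNode:
    def __init__(self, point, label, axis, left, right):
        self.point, self.label, self.axis = point, label, axis
        self.left, self.right = left, right


def build_kd_tree(points, labels, depth=0):
    """Split on the median along one axis per level, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    order = sorted(range(len(points)), key=lambda i: points[i][axis])
    mid = len(order) // 2
    lo, hi = order[:mid], order[mid + 1:]
    return KDNode(
        points[order[mid]], labels[order[mid]], axis,
        build_kd_tree([points[i] for i in lo], [labels[i] for i in lo], depth + 1),
        build_kd_tree([points[i] for i in hi], [labels[i] for i in hi], depth + 1),
    )


def k_nearest(root, query, k):
    """Return the k nearest (distance, point, label) triples, closest first."""
    heap = []    # max-heap on distance, stored as (-distance, tie_breaker, point, label)
    counter = 0  # tie breaker so that points are never compared directly

    def visit(node):
        nonlocal counter
        if node is None:
            return
        d = math.dist(query, node.point)
        if len(heap) < k:
            heapq.heappush(heap, (-d, counter, node.point, node.label))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, counter, node.point, node.label))
        counter += 1
        delta = query[node.axis] - node.point[node.axis]
        near, far = (node.left, node.right) if delta < 0 else (node.right, node.left)
        visit(near)
        # Visit the far side only if the splitting plane is closer than the worst hit.
        if len(heap) < k or abs(delta) < -heap[0][0]:
            visit(far)

    visit(root)
    return [(-nd, p, lab) for nd, _, p, lab in sorted(heap, reverse=True)]


points = [[2, 3], [5, 4], [9, 6], [4, 7], [8, 1], [7, 2]]
labels = ["normal", "normal", "lease", "normal", "lease", "lease"]
tree = build_kd_tree(points, labels)
nearest = k_nearest(tree, [9, 2], k=2)
```

The returned triples carry both the distance and the class label, which is the information a distance-weighted vote over the K neighbors needs.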

Weighting with a Gaussian function means that each calibration behavior data item is weighted by a Gaussian function of its distance. For the i-th calibration behavior data N_i, the weight is calculated as follows:

$$w_i = a \, e^{-\frac{(d_i - b)^2}{2c^2}}$$

wherein d_i is the distance between the neighbor sample N_i and the sample to be classified; for the voting weight calculation, a is set to 1, b is set to 0, and c is an adjustable parameter.

Voting means that the category of the behavior data to be detected is decided by weighted category voting, calculated as follows:

$$C(U) = \arg\max_{C_j,\ 1 \le j \le L} \sum_{i=1}^{K} w_i f_{ij}$$

wherein

$$f_{ij} = \begin{cases} 1, & N_i \in C_j \\ 0, & \text{otherwise} \end{cases}$$

wherein K is the number of neighbors, L is the number of categories, C_j is the j-th category, w_i is the Gaussian weight of the i-th neighbor N_i, and f_ij indicates whether N_i belongs to category C_j.
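The Gaussian-weighted vote can be sketched as follows. The sketch assumes the standard Gaussian form a·exp(−(d − b)²/(2c²)) with a = 1 and b = 0, and a vote that sums the weights of the neighbors in each category and picks the largest total; the function names and the labels `"lease"` and `"normal"` are illustrative.

```python
import math
from collections import defaultdict


def gaussian_weight(d, c=1.0, a=1.0, b=0.0):
    """Gaussian function of the distance d; c controls how fast the weight decays."""
    return a * math.exp(-((d - b) ** 2) / (2 * c ** 2))


def weighted_vote(neighbors, c=1.0):
    """neighbors: iterable of (distance, label). Returns (winning label, score per label)."""
    scores = defaultdict(float)
    for d, label in neighbors:
        scores[label] += gaussian_weight(d, c)
    return max(scores, key=scores.get), dict(scores)


# One very close "lease" neighbor outweighs two distant "normal" neighbors.
neighbors = [(0.1, "lease"), (2.0, "normal"), (2.5, "normal")]
winner, scores = weighted_vote(neighbors, c=1.0)
```

An unweighted majority vote over the same three neighbors would return "normal"; the Gaussian weights let a single very close neighbor dominate. A larger c flattens the weights and moves the result back toward a plain majority vote.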

Example two

The embodiment provides a campus card leasing behavior detection system based on Spark;

campus card lease behavior detection system based on Spark includes:

It should be noted here that the data acquisition module, the data preprocessing module, the weight calculation module, the distance calculation module, and the voting module correspond to steps S101 to S105 in the first embodiment, and the modules are the same as the corresponding steps in the implementation example and application scenario, but are not limited to the disclosure in the first embodiment. It should be noted that the modules described above as part of a system may be implemented in a computer system such as a set of computer executable instructions.

In the foregoing embodiments, the descriptions of the embodiments have different emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

The proposed system can be implemented in other ways. For example, the above-described system embodiments are merely illustrative, and for example, the division of the above-described modules is merely a logical division, and in actual implementation, there may be other divisions, for example, multiple modules may be combined or integrated into another system, or some features may be omitted, or not executed.

Example three

The present embodiment also provides an electronic device, including: one or more processors, one or more memories, and one or more computer programs; wherein, a processor is connected with the memory, the one or more computer programs are stored in the memory, and when the electronic device runs, the processor executes the one or more computer programs stored in the memory, so as to make the electronic device execute the method according to the first embodiment.

It should be understood that in this embodiment, the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. A general-purpose processor may be a microprocessor or any conventional processor.

The memory may include both read-only memory and random access memory, and may provide instructions and data to the processor, and a portion of the memory may also include non-volatile random access memory. For example, the memory may also store device type information.

In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or instructions in the form of software.

The method in the first embodiment may be directly implemented by a hardware processor, or may be implemented by a combination of hardware and software modules in the processor. The software modules may be located in storage media well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in the memory; the processor reads the information in the memory and completes the steps of the method in combination with its hardware. To avoid repetition, the details are not described here.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

Example four

The present embodiments also provide a computer-readable storage medium for storing computer instructions, which when executed by a processor, perform the method of the first embodiment.

The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

1. A campus card leasing behavior detection method based on Spark is characterized by comprising the following steps:

calculating the weight of each characteristic in the calibration behavior data set in parallel by adopting a Spark engine;

sorting the data according to the distance between the behavior data to be detected and the calibration behavior data from small to large, selecting the first K calibration behavior data to perform Gaussian weight weighted voting, and obtaining the category of the behavior data to be detected;

sorting the data according to the distance between the behavior data to be detected and the calibration behavior data from small to large, selecting the first K calibration behavior data to perform Gaussian weight weighted voting, and obtaining the category of the behavior data to be detected; the method comprises the following specific steps:

rapidly acquiring the behavior data to be detected and the first K nearest neighbor data of the calibration behavior data set based on a class-free nearest neighbor set rapid acquisition algorithm of a K-D tree;

voting according to the weight and the category mark of the K pieces of calibration behavior data;

the class-free nearest neighbor set rapid acquisition algorithm based on the K-D tree; further comprising:

fourth, outputting Node[h] and ending the algorithm;

the weighting with a Gaussian function means that the calibration behavior data are weighted by a Gaussian function, and for the i-th calibration behavior data N_i the weight is calculated as follows:

$$w_i = a \, e^{-\frac{(d_i - b)^2}{2c^2}}$$

wherein d_i is the distance between the neighbor sample N_i and the sample to be classified; for the voting weight calculation, a is set to 1, b is set to 0, and c is an adjustable parameter;

wherein,

2. The method for detecting campus card leasing behavior according to claim 1, wherein,

converting the data to be detected into a behavior data set to be detected; the method comprises the following specific steps:

3. The method for detecting campus card leasing behavior according to claim 1, wherein,

converting the calibration data into a calibration behavior data set; the method comprises the following specific steps:

4. The method for detecting campus card leasing behavior according to claim 1, wherein,

calculating the weight of each characteristic in the calibration behavior data set in parallel by adopting a Spark engine; the method comprises the following specific steps:

5. The method for detecting campus card leasing behavior according to claim 4, wherein,

the improved Relief method, comprising:

step one: randomly extracting a sample R from a segmentation node sample set of the K-D tree each time;

step two: based on a multi-class nearest neighbor set fast acquisition algorithm of the K-D tree, finding out a nearest neighbor sample H of R from a segmented node sample set of the K-D tree of the same class;

step three: based on a multi-class nearest neighbor set fast acquisition algorithm of the K-D tree, finding out nearest neighbor samples M from segmentation node sample sets of the K-D trees of different classes of the R;

step four: when the weight is updated, if the distance between R and H on a certain feature is smaller than the distance between R and M, the weight of the feature is increased, otherwise, the weight of the feature is reduced,

repeating the processes from the first step to the fourth step for m times, and finally obtaining the weight of each feature;

and optimizing the obtained weight of each feature.

6. The method for detecting campus card leasing behavior according to claim 5, wherein,

the multi-class nearest neighbor set fast acquisition algorithm based on the K-D tree comprises the following steps:

second, taking the set of K-DTree segmentation sample nodes as the set S_DN; then randomly extracting a sample node R* from S_DN and backtracking upward layer by layer along the parent path until a parent sample node R** is found such that the number of nodes in the subtree rooted at R** is greater than or equal to C.count × l; R is set to R**;

7. A campus card leasing behavior detection system based on Spark, characterized by comprising:

a voting module configured to: sorting the data according to the distance between the behavior data to be detected and the calibration behavior data from small to large, selecting the first K calibration behavior data to perform Gaussian weight weighted voting, and obtaining the category of the behavior data to be detected;

fourth, outputting Node[h] and ending the algorithm;

wherein,

8. An electronic device, comprising: one or more processors, one or more memories, and one or more computer programs; wherein a processor is connected to the memory, the one or more computer programs being stored in the memory, the processor executing the one or more computer programs stored in the memory when the electronic device is running, to cause the electronic device to perform the method of any of the preceding claims 1-6.

9. A computer-readable storage medium storing computer instructions which, when executed by a processor, perform the method of any one of claims 1 to 6.