CN112819491A

CN112819491A - Method and device for processing conversion data, electronic equipment and storage medium

Info

Publication number: CN112819491A
Application number: CN201911120892.8A
Authority: CN
Inventors: 张晓雨; 朱建新; 唐潜; 郭玲; 杨雷; 秦首科
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2019-11-15
Filing date: 2019-11-15
Publication date: 2021-05-18
Anticipated expiration: 2039-11-15
Also published as: CN112819491B

Abstract

The application discloses a method and a device for processing conversion data, electronic equipment and a storage medium, and relates to the field of data processing, in particular to a conversion data processing technology in a target conversion bid. The specific implementation scheme is as follows: normalizing various types of conversion data according to the index parameters to obtain normalized data, wherein the normalized data comprises a time-based index value sequence; judging whether the index value sequence has abnormal data or not according to the time weight; and if the abnormal data exist, deleting the abnormal data from the index value sequence. The normalization processing can convert various types of conversion data into normalization data with the same data structure, and abnormal data detection is performed on an index value sequence contained in the normalization data based on the time weight, so that the accuracy of abnormal data detection can be improved.

Description

Method and device for processing conversion data, electronic equipment and storage medium

Technical Field

The application relates to a data processing technology, in particular to an internet advertisement conversion data processing technology.

Background

Optimized cost per click (oCPC) is used in internet advertising to calculate the rate of return. To implement the target conversion bidding function, statistics must be gathered on the conversion data and abnormal data must be detected. When the data comes from heterogeneous data sources, however, the accuracy of abnormal data detection is poor.

Disclosure of Invention

The embodiment of the application provides a method and a device for processing conversion data, an electronic device and a storage medium, which can improve the accuracy of abnormal data detection when processing heterogeneous data sources.

The embodiment of the application provides a method for processing conversion data, which comprises the following steps:

normalizing various types of conversion data according to the index parameters to obtain normalized data, wherein the normalized data comprises a time-based index value sequence;

judging whether the index value sequence has abnormal data or not according to the time weight;

and if the abnormal data exist, deleting the abnormal data from the index value sequence.

According to the application embodiment, various types of conversion data can be normalized according to the index parameters to obtain normalized data, and the normalized data comprises a time-based index value sequence; judging whether the index value sequence has abnormal data or not according to the time weight; and if the abnormal data exist, deleting the abnormal data from the index value sequence. The normalization processing can convert various types of conversion data into normalization data with the same data structure, and abnormal data detection is performed on an index value sequence contained in the normalization data based on the time weight, so that the accuracy of abnormal data detection can be improved.

In an embodiment of the above application, normalizing the multiple types of conversion data according to the index parameters to obtain normalized data includes:

acquiring a user identifier, a conversion type and an acquisition type of conversion data;

and normalizing the various types of conversion data according to the user identification, the conversion type and the acquisition type to obtain normalized data.

In the above embodiment, the multiple types of conversion data can be classified according to the user identifier, the conversion type and the acquisition type. Each piece of conversion data is normalized according to the user identifier, conversion type and acquisition type to which it belongs, which normalizes the heterogeneous data sources. Recording the user identifier, the conversion type and the acquisition type also allows the conversion data to be labeled more accurately.

In an embodiment of the above application, the normalizing the conversion data of the plurality of types according to the user identifier, the conversion type, and the acquisition type to obtain normalized data includes:

configuring a data unit for the user identification;

the data unit comprises an association relationship between one or more acquisition types and conversion types, and each association relationship is associated with one or more pieces of delivery package information; each piece of delivery package information comprises an index identifier and a time-based index value sequence associated with the index identifier.

In the above embodiment, the data unit is a data structure that stores the multiple types of conversion data of one user, and the pieces of delivery package information can be determined from the association relationships between the acquisition types and the conversion types of the conversion data. Each piece of delivery package information contains the index value sequence for one index; the sequence is sorted by time and stores the index values at different time points. The result is a data structure in which an index value sequence is located by user identifier, acquisition type, conversion type and index identifier. Different types of conversion data can be normalized through this data structure, which improves normalization efficiency.

In an embodiment of the foregoing application, determining whether there is abnormal data in the index value sequence according to the time weight includes:

calculating a hash value of the normalized data according to the target characteristics contained in the normalized data;

determining a hash bucket where the normalized data is located according to the hash value;

and judging whether the index value sequence contained in the normalized data in each hash bucket has abnormal data or not according to the time weight.

In this embodiment, before abnormal data detection is performed, a hash value is computed from the target feature of the normalized data, the normalized data is divided into hash buckets according to the hash value, and abnormal data detection is then performed on the normalized data in each hash bucket. Partitioning the normalized data by hash value improves the efficiency of abnormal data detection.

In an embodiment of the foregoing application, determining whether there is abnormal data in an index value sequence included in the normalized data in each hash bucket according to the time weight includes:

processing abnormal data detection in a plurality of hash buckets in parallel;

in each hash bucket, sequentially reading a sequence of index values in the hash bucket according to a serial sequence;

and judging whether abnormal data exist in the index sequence.

In the embodiment of the application, parallel computation can be performed among the hash buckets, and high-concurrency anomaly detection is realized. Meanwhile, inside each hash bucket, each index value sequence is processed in sequence in a serial mode, and the data processing efficiency is further improved.

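In an embodiment of the foregoing application, determining whether there is abnormal data in the index value sequence according to the time weight includes:

weighting each index value in the index value sequence according to the time weight to obtain a mean value of the index value sequence, wherein the mean value is the arithmetic mean or the median of the index value sequence;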

calculating the standard deviation or the absolute median difference of the index value sequence;

traversing the index value sequence according to the mean value and the standard deviation, and determining a residual error with the maximum numerical difference from the mean value;

calculating a critical value of the t distribution for the index value sequence;

and determining whether the abnormal point exists according to the critical value and the standard deviation.

In the embodiment of the application, for an index value sequence, weighting each index value in the index value sequence according to time weight to obtain a mean value of the index value sequence; traversing each index value in the index value sequence according to the mean value and the standard deviation, determining a residual error with the maximum difference with the mean value, and then determining an abnormal point according to a threshold value of t distribution of the index sequence and the residual error. And then the mean value of the index value sequence is determined based on the time information of the index values, and the abnormal point in the index value sequence is determined based on the mean value, so that the abnormal point can be more accurately determined based on time validity, and the abnormal point detection efficiency is improved.

In an embodiment of the above application, weighting each index value in the index value sequence according to a time weight includes:

determining a weighting parameter of the index value according to the time interval between the acquisition time of the index value and the current time, wherein the weighting parameter is less than 1, and the numerical value of the weighting parameter and the length of the time interval are in an inverse proportion trend;

and weighting the index value according to the weighting parameter.

In the embodiment of the application, the weighting parameter of the index value is determined according to the time interval between the acquisition time of the index value and the current time, so that a weighting mode that the numerical value of the weighting parameter and the length of the time interval are in an inverse proportion trend can be realized, and the accuracy of weighting calculation is further improved.

An embodiment of the present application further provides a device for processing conversion data, including:

the normalization module is used for carrying out normalization processing on various types of conversion data according to the index parameters to obtain normalized data, and the normalized data comprises an index value sequence based on time;

the anomaly detection module is used for judging whether the index value sequence has abnormal data or not according to the time weight;

and the exception handling module is used for deleting the exception data from the index value sequence if the exception data exists.

An embodiment of the present application further provides an electronic device, including:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the above-described embodiments.

Embodiments of the present application also provide a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the method of the above embodiments.

Other effects of the above-described alternative will be described below with reference to specific embodiments.

Drawings

The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:

FIG. 1 is a schematic flow chart diagram according to a first embodiment of the present application;

FIG. 2 is a schematic flow chart diagram according to a second embodiment of the present application;

FIG. 3 is a schematic flow chart diagram according to a third embodiment of the present application;

FIG. 4 is a schematic structural diagram according to a fourth embodiment of the present application;

FIG. 5 is a block diagram of an electronic device for implementing the method for processing conversion data according to the embodiments of the present application.

Detailed Description

The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

Example one

Fig. 1 is a schematic flowchart of a method for processing conversion data provided in the first embodiment of the present application. The method is suitable for detecting fluctuations in multi-source heterogeneous data under target conversion bidding (oCPC) and may be executed by an electronic device such as a server or a terminal. The method may be implemented as follows:

step 101, normalizing various types of conversion data according to the index parameters to obtain normalized data, wherein the normalized data comprises an index value sequence based on time.

The conversion data is divided into categories according to its acquisition type and conversion type. An acquisition type and a conversion type always appear together as a group. When an index value of interest is generated, the conversion data is produced by a conversion data acquisition device such as a mobile terminal.

The acquisition type of conversion data represents the way in which the conversion data is collected. Acquisition types may include application programming interface (API) collection, collection by JavaScript (JS) code loaded in a landing page, Business Conversation Platform collection, application (APP) activation collection, SDK code collection in an applet, and the like.

The conversion type of the conversion data may be the type of operation that is identified as producing the conversion, such as a user triggering a target action in a search page or landing page. The target behavior may be consulting or submitting a form, etc.

In the embodiments of the present application, the multiple types of conversion data are also called heterogeneous data sources. The conversion data carried by the heterogeneous data sources is normalized, and the resulting normalized data includes index value sequences. An index value sequence is also called a time series, and each element in it contains two pieces of information: time information and an index value. The time information indicates the time at which the index value was generated, and the index value is a user behavior parameter of interest.

Optionally, obtaining a user identifier, a conversion type and an acquisition type of the conversion data; and normalizing the various types of conversion data according to the user identification, the conversion type and the acquisition type to obtain normalized data.

The acquisition type can be determined from the way the conversion data was collected, for example from the interface that received the conversion data, and the conversion type can be determined from the type of index value carried by the conversion data. The user identifier, the conversion type and the acquisition type are abstracted from the conversion data, and the conversion data is traversed to obtain the index value sequences.
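The normalization step can be illustrated with a minimal Python sketch. It is not the implementation of the application: the record layout `RawConversion` and all of its field names are hypothetical, and the sketch only shows how records from different sources could be grouped into time-ordered index value sequences keyed by user identifier, acquisition type, conversion type and index identifier.

```python
from collections import defaultdict
from typing import NamedTuple


class RawConversion(NamedTuple):
    """One raw conversion record; every field name here is hypothetical."""
    user_id: str
    acquisition_type: str  # e.g. "api", "js", "sdk"
    conversion_type: str   # e.g. "form_submit", "consult"
    index_id: str          # which index the value belongs to
    timestamp: int         # time at which the index value was generated
    value: float


def normalize(records):
    """Group heterogeneous records into time-ordered index value sequences.

    Returns a dict keyed by (user_id, acquisition_type, conversion_type,
    index_id); each value is a list of (timestamp, value) pairs sorted by time.
    """
    sequences = defaultdict(list)
    for r in records:
        key = (r.user_id, r.acquisition_type, r.conversion_type, r.index_id)
        sequences[key].append((r.timestamp, r.value))
    for sequence in sequences.values():
        sequence.sort(key=lambda point: point[0])
    return dict(sequences)


records = [
    RawConversion("u1", "api", "form_submit", "conversions", 20, 3.0),
    RawConversion("u1", "api", "form_submit", "conversions", 10, 5.0),
    RawConversion("u1", "js", "consult", "conversions", 15, 1.0),
]
normalized = normalize(records)
```

Whatever the source of a record, the output has one uniform shape, which is the property the later detection steps rely on.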

Specifically, as shown in fig. 2, a data unit is configured for the user identifier;

In an internet promotion platform, each account may use one or more acquisition types to access data for one or more conversion types (a conversion type marks a target behavior that the advertiser wants to optimize). One user identifier identifies one account. The embodiment of the present application provides a new data structure in which a data unit (also called a unit) is configured for an account. Each data unit holds one or more pieces of delivery package information (also called cells). A data unit may be configured for every account: for example, if there are N accounts represented by N user identifiers, N data units may be configured, where N is a positive integer. Alternatively, data units may be configured for only some of the accounts: for example, if there are N accounts represented by N user identifiers, M data units are configured, one for each of M accounts, where M is a non-negative integer smaller than N.

Each piece of delivery package information is configured with an association relationship between an acquisition type and a conversion type; the association relationship indicates the acquisition type through which the delivery package information is collected and the conversion type whose index values it records. The same user identifier (that is, account) can be configured with multiple pieces of delivery package information, so that the conversion data can be aggregated over a fixed time period. A piece of delivery package information holds an index value sequence (also called a time series) collected for the corresponding conversion type and acquisition type.

As shown in Table 1, one user identifier corresponds to one data unit, and the data unit includes two association relationships: acquisition type 1 is used to access conversion type 1, and acquisition type 2 is used to access conversion type 2. The association acquisition type 1-conversion type 1 includes two pieces of delivery package information, which respectively comprise index 1-1 with its associated index value sequence and index 1-2 with its associated index value sequence. The association acquisition type 2-conversion type 2 likewise includes two pieces of delivery package information, which respectively comprise index 2-1 with its associated index value sequence and index 2-2 with its associated index value sequence.

TABLE 1
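The nesting described for Table 1 (data unit, association relationship, delivery package information, index value sequence) can be sketched in Python as follows. The class names `DataUnit` and `DeliveryPackage`, the method `add_value` and the sample values are illustrative assumptions, not names or data taken from the application.

```python
from dataclasses import dataclass, field


@dataclass
class DeliveryPackage:
    """One piece of delivery package information (cell): an index
    identifier plus its time-based index value sequence."""
    index_id: str
    sequence: list = field(default_factory=list)  # (timestamp, value) pairs


@dataclass
class DataUnit:
    """One data unit, configured for one user identifier (account)."""
    user_id: str
    # (acquisition_type, conversion_type) -> list of DeliveryPackage
    associations: dict = field(default_factory=dict)

    def add_value(self, acquisition_type, conversion_type, index_id,
                  timestamp, value):
        key = (acquisition_type, conversion_type)
        packages = self.associations.setdefault(key, [])
        for package in packages:
            if package.index_id == index_id:
                break
        else:
            # No package for this index yet: create one.
            package = DeliveryPackage(index_id)
            packages.append(package)
        package.sequence.append((timestamp, value))
        package.sequence.sort(key=lambda point: point[0])  # keep time order


# Mirror the layout described for Table 1 with made-up values.
unit = DataUnit("user-1")
unit.add_value("acquisition type 1", "conversion type 1", "index 1-1", 2, 8.0)
unit.add_value("acquisition type 1", "conversion type 1", "index 1-1", 1, 5.0)
unit.add_value("acquisition type 1", "conversion type 1", "index 1-2", 1, 3.0)
unit.add_value("acquisition type 2", "conversion type 2", "index 2-1", 1, 7.0)
unit.add_value("acquisition type 2", "conversion type 2", "index 2-2", 1, 9.0)
```

An index value sequence is then reached by walking user identifier, association relationship and index identifier in turn.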

Step 102, judging whether the index value sequence has abnormal data according to the time weight.

A time weight is configured for each index value in the index value sequence, and the mean value of the index value sequence is calculated. Abnormal data in the index value sequence is then identified from this mean value and the index values in the sequence, which completes the fluctuation detection.

Step 103, if abnormal data exists, deleting the abnormal data from the index value sequence.

If abnormal data exists, it is deleted from the index value sequence, and it is then decided whether to return to step 102. If no abnormal data exists, the index value sequence contains no abnormal data.

For example, it is judged whether the number of abnormal data items already identified has reached a preset number of abnormal data items, or whether the number of times the judgment has been performed exceeds a preset number of cycles. If the number of abnormal data items is less than the preset number, or the number of judgments does not exceed the preset number of cycles, step 102 is executed again. If the number of abnormal data items is greater than or equal to the preset number, or the number of judgments exceeds the preset number of cycles, the method does not return to step 102.
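The loop between steps 102 and 103 can be sketched as below. This is an illustration under stated assumptions: the detector is passed in as a hypothetical callable `find_anomaly`, the sketch stops as soon as either preset limit is reached (one reading of the conditions above), and `farthest_from_median` is only a toy detector standing in for the time-weighted test.

```python
def remove_anomalies(values, find_anomaly, max_anomalies, max_cycles):
    """Repeat steps 102 and 103 until no anomaly is found or a limit is hit.

    find_anomaly(values) returns the position of one abnormal value,
    or None when the sequence contains no abnormal data.
    """
    remaining = list(values)
    removed = []
    cycles = 0
    while len(removed) < max_anomalies and cycles < max_cycles:
        cycles += 1
        position = find_anomaly(remaining)        # step 102
        if position is None:
            break
        removed.append(remaining.pop(position))   # step 103
    return remaining, removed


def farthest_from_median(values, tolerance=10.0):
    """Toy detector: flag the value farthest from the (upper) median
    when that distance exceeds a fixed tolerance."""
    if len(values) < 3:
        return None
    median = sorted(values)[len(values) // 2]
    position = max(range(len(values)), key=lambda i: abs(values[i] - median))
    return position if abs(values[position] - median) > tolerance else None


cleaned, dropped = remove_anomalies(
    [10, 11, 9, 100, 10, -50], farthest_from_median,
    max_anomalies=5, max_cycles=10)
```

Passing the detector as a parameter keeps the stopping logic separate from the statistical test, so the same loop can wrap any detector.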

Example two

Fig. 2 is a schematic flowchart of a method for processing conversion data provided in the second embodiment of the present application, which further describes the above embodiment and includes:

step 201, normalization processing is performed on multiple types of conversion data according to the index parameters to obtain normalized data, wherein the normalized data comprises an index value sequence based on time.

Step 202, calculating a hash value of the normalized data according to the target characteristics contained in the normalized data.

The target feature may be the user identifier, an index value in the normalized data, or the like. When the hash value is calculated from the user identifier, the normalized data of user identifiers with the same hash value is placed in the same hash bucket. When the hash value is calculated from the index value, the normalized data of index values with the same hash value is placed in the same hash bucket.

Step 203, determining the hash bucket in which the normalized data is located according to the hash value.

Step 204, judging whether the index value sequence contained in the normalized data in each hash bucket has abnormal data according to the time weight.

Optionally, detecting abnormal data in a plurality of hash buckets in parallel; in each hash bucket, sequentially reading a sequence of index values in the hash bucket according to a serial sequence; and judging whether abnormal data exist in the index sequence.

After hash bucketing according to the target feature, each hash bucket stores multiple index value sequences. The object of fluctuation detection is one or more index value sequences. Optionally, within one hash bucket the index value sequences are read one after another in a fixed order, and each sequence that is read is checked for abnormal data. The order may be the order of the first addresses of the index value sequences in the storage space, or the storage order of the index value sequences in the data structure described above.

In the embodiment of the application, when abnormal data detection is performed on the data in a hash bucket, the data unit is the minimum granularity of each processing pass, and the data in the hash bucket is split into pieces small enough to be loaded into memory. The data in the hash bucket is stored on disk. Fluctuation detection runs in parallel across hash buckets, which achieves highly concurrent anomaly detection. Within each hash bucket the index value sequences are processed serially, which reduces the number of random input/output operations, increases program concurrency, and improves the efficiency of fluctuation detection.
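A minimal sketch of the bucketing scheme, assuming the user identifier is the target feature: data is assigned to a bucket by hashing, buckets are processed in parallel, and sequences inside a bucket are processed one after another. The function names, the choice of SHA-256 and the thread pool are illustrative assumptions; the application does not name a hash function or an execution framework, and the sketch keeps everything in memory rather than on disk.

```python
import hashlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def bucket_of(target_feature, bucket_count):
    """Map a target feature (here a user identifier) to a bucket number."""
    digest = hashlib.sha256(target_feature.encode("utf-8")).hexdigest()
    return int(digest, 16) % bucket_count


def split_into_buckets(sequences_by_user, bucket_count):
    """sequences_by_user: iterable of (user_id, index_value_sequence)."""
    buckets = defaultdict(list)
    for user_id, sequence in sequences_by_user:
        buckets[bucket_of(user_id, bucket_count)].append((user_id, sequence))
    return buckets


def detect_in_bucket(bucket, detect):
    """Serial processing inside one bucket, in storage order."""
    return [(user_id, detect(sequence)) for user_id, sequence in bucket]


def detect_all(buckets, detect, workers=4):
    """Parallel processing across buckets."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_bucket = pool.map(
            lambda bucket: detect_in_bucket(bucket, detect), buckets.values())
        return [item for result in per_bucket for item in result]


data = [("u1", [1, 2, 3]), ("u2", [1, 500, 2]), ("u1", [4, 5, 6])]
buckets = split_into_buckets(data, bucket_count=4)
results = detect_all(buckets, detect=lambda seq: max(seq) > 100)
```

For CPU-bound detection in Python, `ProcessPoolExecutor` with a picklable worker function would be the closer analogue of parallel computation; threads are used here only to keep the sketch short.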

Step 205, if abnormal data exists, deleting the abnormal data from the index value sequence.

Example three

Fig. 3 is a schematic flowchart of a method for processing conversion data provided in the third embodiment of the present application, which further describes the foregoing embodiment and includes:

step 301, normalizing the multiple types of conversion data according to the index parameters to obtain normalized data, wherein the normalized data comprises a time-based index value sequence.

Step 302, weighting each index value in the index value sequence according to the time weight to obtain a mean value of the index value sequence, wherein the mean value is the arithmetic mean or the median of the index value sequence.

In a real promotion platform, conversion data can fluctuate in expected ways when an advertiser changes a landing page or a delivery plan. Long-term historical data then has little reference value and easily leads to misjudgment, so the anomaly detection algorithm must adapt promptly to the latest data distribution. Because the fluctuation trend of each client's conversion data differs, a uniform fluctuation threshold cannot suit all clients and would cause false or missed detections. The embodiment of the application therefore performs fluctuation detection with a weighted extreme standard deviation algorithm based on the statistics and distribution characteristics of the client data. The basic principle of the extreme standard deviation algorithm (ESD algorithm) is to assume that the data set contains no abnormal value, iteratively delete the value (maximum or minimum) that deviates most from the mean value, and update the corresponding t-distribution critical value (used to test whether the hypothesis holds) at each step, until the hypothesis holds or the number of abnormal values exceeds the preset value k. On the basis of the ESD algorithm, the embodiment of the application introduces a time weight parameter β to control the time weight; the resulting algorithm can be called a weighted extreme standard deviation algorithm (W-ESD algorithm).

Optionally, determining a weighting parameter of the index value according to the time interval between the acquisition time of the index value and the current time, wherein the weighting parameter is smaller than 1, and the value of the weighting parameter and the length of the time interval are in an inverse proportion trend; and weighting the index value according to the weighting parameter.

The mean value can be calculated by the following formula:

[formula not reproduced in the source text]

Here the β parameter, that is, the weighting parameter, controls the importance of the temporal features in calculating the mean value. The closer β is to 0, the more the mean value is biased toward the index values at the latest time. $T_1, T_2, \ldots, T_i, \ldots, T_n$ are the index values corresponding to each piece of time information in the index value sequence, and n is the number of index values in the index value sequence.

The mean value may be the arithmetic mean of the index value sequence or the median of the index value sequence.
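The effect of the β parameter can be illustrated with a short Python sketch. The weighting form used here, weight = β raised to the age of the value, is an assumption chosen only because it matches the stated properties (a weight that shrinks as the time interval grows, and a mean that leans toward the latest values as β approaches 0); it is not the formula of the application. The names `time_weights`, `weighted_mean` and `unit` are likewise hypothetical.

```python
def time_weights(timestamps, now, beta, unit=1.0):
    """Assumed form: weight = beta ** (age / unit), with 0 < beta < 1.

    Older index values (larger age) receive smaller weights.
    """
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    return [beta ** ((now - t) / unit) for t in timestamps]


def weighted_mean(values, weights):
    """Time-weighted arithmetic mean of an index value sequence."""
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


timestamps = [0, 1, 2]
values = [10.0, 10.0, 20.0]
moderate = weighted_mean(values, time_weights(timestamps, now=3, beta=0.5))
recent_heavy = weighted_mean(values, time_weights(timestamps, now=3, beta=0.01))
```

With β = 0.5 the older values still pull the mean down noticeably, while with β = 0.01 the mean sits almost on the latest value.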

Step 303, calculating the standard deviation or the absolute median difference s of the index value sequence.

When the mean value is the arithmetic mean, s is the standard deviation; when the mean value is the median, s is the absolute median difference, abbreviated as MAD (for the sequence T, the median of $|T_i - \mathrm{median}(T)|$).
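The two choices of s can be sketched with the Python standard library. The function name `spread` and its parameters are hypothetical; the standard deviation uses the sample form (division by n - 1), which is an assumption, since the text does not say which form is intended.

```python
import math
import statistics


def spread(values, centre, centre_is_median):
    """Return s for a sequence, given the centre already computed for it.

    If the centre is the median, s is the absolute median difference
    (MAD); otherwise s is the sample standard deviation around the
    supplied mean.
    """
    if centre_is_median:
        return statistics.median(abs(v - centre) for v in values)
    squared = sum((v - centre) ** 2 for v in values)
    return math.sqrt(squared / (len(values) - 1))


values = [1, 2, 3, 4, 100]
mad = spread(values, statistics.median(values), centre_is_median=True)
std = spread(values, statistics.fmean(values), centre_is_median=False)
```

On this sample the single large value inflates the standard deviation far more than the MAD, which is the usual reason for pairing the median with the MAD.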

Step 304, traversing the index value sequence according to the mean value and the standard deviation, and determining the residual with the maximum numerical difference from the mean value.

The residual $R_j$ with the maximum numerical difference from the mean value is calculated as follows:

[formula not reproduced in the source text]

where the formula uses the mean value of the index value sequence; $T_i$ is the index value currently being traversed in the index value sequence; and k is the preset number of abnormal data items.

Step 305, calculating the critical value of the t distribution for the index value sequence.

The critical value $\lambda_j$ is calculated as follows:

[formula not reproduced in the source text]

where $t_{p,\,n-j-1}$ denotes the right critical value of the t distribution with $n-j-1$ degrees of freedom and significance p.
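For orientation, the textbook (unweighted) generalized ESD test defines the test statistic and the critical value as shown below. These standard forms are offered as an assumption about what the two formulas resemble; the weighted formulas of the application may differ. Here $\bar{T}$ and $s$ are the mean and standard deviation of the values remaining at iteration $j$, and $\alpha$ is a significance level, a symbol that does not appear in the source text.

```latex
R_j = \frac{\max_i \left| T_i - \bar{T} \right|}{s},
\qquad
\lambda_j = \frac{(n-j)\, t_{p,\,n-j-1}}
                 {\sqrt{\left(n-j-1+t_{p,\,n-j-1}^{2}\right)\left(n-j+1\right)}},
\qquad
p = 1 - \frac{\alpha}{2\,(n-j+1)},
\qquad j = 1, \ldots, k
```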

Step 306, determining whether an abnormal point exists according to the critical value and the standard deviation.

If the residual $R_j$ with the maximum difference from the mean value is greater than the critical value $\lambda_j$, the corresponding index value is determined to be abnormal data. Otherwise, if $R_j$ is less than $\lambda_j$, there is no abnormal data in the index value sequence.
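Steps 302 to 306 can be combined into a short Python sketch. Several points are assumptions rather than details from the application: the weights are supplied by the caller, the residual is the largest absolute deviation from the weighted mean divided by s, and the critical value comes from a caller-supplied function `critical_value(n, j)` (for example backed by a t-distribution table), because the standard library has no t-distribution quantile. The loop stops at the first residual that does not exceed its critical value, following the description above.

```python
import math


def w_esd(values, weights, critical_value, max_anomalies):
    """Return the positions (in the original list) flagged as abnormal.

    critical_value(n, j) must return lambda_j for a sequence of n values
    at iteration j; it is a hypothetical hook supplied by the caller.
    """
    n = len(values)
    remaining = list(range(n))
    flagged = []
    for j in range(1, max_anomalies + 1):
        if len(remaining) < 3:
            break
        # Step 302: time-weighted mean of the remaining index values.
        total = sum(weights[i] for i in remaining)
        mean = sum(weights[i] * values[i] for i in remaining) / total
        # Step 303: standard deviation around that mean.
        s = math.sqrt(sum((values[i] - mean) ** 2 for i in remaining)
                      / (len(remaining) - 1))
        if s == 0:
            break
        # Step 304: residual with the maximum difference from the mean.
        worst = max(remaining, key=lambda i: abs(values[i] - mean))
        residual = abs(values[worst] - mean) / s
        # Steps 305 and 306: compare with the critical value.
        if residual <= critical_value(n, j):
            break
        flagged.append(worst)
        remaining.remove(worst)
    return flagged


values = [10, 11, 9, 10, 12, 10, 60]
flagged = w_esd(values, weights=[1.0] * len(values),
                critical_value=lambda n, j: 2.0, max_anomalies=3)
```

The constant critical value 2.0 is a placeholder for demonstration only; a real run would compute $\lambda_j$ from the t distribution, and the equal weights would be replaced by time weights.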

In the embodiment of the application, each index value in an index value sequence is weighted according to the time weight to obtain the mean value of the sequence; each index value is traversed according to the mean value and the standard deviation to determine the residual with the maximum difference from the mean value; and an abnormal point is then determined from the t-distribution critical value of the sequence and that residual. Because the mean value is determined from the time information of the index values and the abnormal point is determined from that mean value, abnormal points can be identified more accurately on the basis of time validity, and detection efficiency improves. The fluctuation detection algorithm provided by the embodiment of the application supports fluctuation detection across multiple index dimensions and can serve as a reference for attributing fluctuations, so that problems can be located more effectively. Using time weights lets the fluctuation detection system adaptively relax or tighten the abnormality criterion according to a client's historical fluctuation; introducing the β parameter lets the application party adjust the importance of the most recent data distribution in the anomaly detection algorithm for a specific application scenario, which improves the accuracy and flexibility of fluctuation detection.

Example four

Fig. 4 is a schematic structural diagram of an apparatus 400 for processing conversion data according to the fourth embodiment of the present application. The apparatus is configured to carry out the methods shown in the foregoing embodiments and implement the corresponding functions, and may be applied to an electronic device such as a server or a terminal. It includes a normalization module 401, an anomaly detection module 402 and an exception handling module 403, wherein:

the normalization module 401 is configured to perform normalization processing on multiple types of conversion data according to the index parameter to obtain normalized data, where the normalized data includes a time-based index value sequence;

an anomaly detection module 402, configured to determine whether the index value sequence has abnormal data according to the time weight;

and an exception handling module 403, configured to delete the exception data from the index value sequence if the exception data exists.

In the above application embodiment, the normalization module 401 performs normalization processing on multiple types of conversion data according to the index parameter to obtain normalized data, where the normalized data includes a time-based index value sequence; the anomaly detection module 402 determines whether the index value sequence has abnormal data according to the time weight; if there is abnormal data, the abnormal processing module 403 deletes the abnormal data from the index value sequence. The normalization processing can convert various types of conversion data into normalization data with the same data structure, and abnormal data detection is performed on an index value sequence contained in the normalization data based on the time weight, so that the accuracy of abnormal data detection can be improved.

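On the basis of the above embodiment, the normalization module 401 is configured to:

acquire a user identifier, a conversion type and an acquisition type of the conversion data;

and normalize the various types of conversion data according to the user identifier, the conversion type and the acquisition type to obtain normalized data.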

In the above embodiment, the normalization module 401 can classify the multiple types of conversion data according to the user identifier, the conversion type and the acquisition type. Each piece of conversion data is normalized according to the user identifier, conversion type and acquisition type to which it belongs, which normalizes the heterogeneous data sources. Recording the user identifier, the conversion type and the acquisition type also allows the conversion data to be labeled more accurately.

configuring a data unit for each user identifier;

In the above embodiment of the application, the normalization module 401 uses the data unit as the data structure for storing the multiple types of conversion data of one user, and can determine multiple pieces of delivery package information according to the association between the acquisition type and the conversion type of the conversion data. Each piece of delivery package information includes an index value sequence under one index; the sequence is sorted by time and stores the index values at different time points. The result is a data structure in which an index value sequence is located by user identifier, acquisition type, conversion type, and index identifier. Different types of conversion data can be normalized through this data structure, which improves normalization efficiency.
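One possible shape for such a data unit is sketched below. The class name `DataUnit`, the record field names (`user_id`, `acquisition_type`, `conversion_type`, `index`, `ts`, `value`), and the nesting of dictionaries are assumptions made for illustration; the application only states that a sequence is located by these four keys and is sorted by time.

```python
import bisect
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple

Point = Tuple[int, float]  # (timestamp, index value); layout is an assumption


@dataclass
class DataUnit:
    """Holds all conversion data of one user (hypothetical layout)."""
    user_id: str
    # (acquisition_type, conversion_type) -> index name -> time-sorted points
    packages: Dict[Tuple[str, str], Dict[str, List[Point]]] = field(default_factory=dict)

    def add(self, acquisition_type: str, conversion_type: str,
            index_name: str, ts: int, value: float) -> None:
        package = self.packages.setdefault((acquisition_type, conversion_type), {})
        series = package.setdefault(index_name, [])
        bisect.insort(series, (ts, value))  # keeps the sequence ordered by time

    def sequence(self, acquisition_type: str, conversion_type: str,
                 index_name: str) -> List[Point]:
        return self.packages[(acquisition_type, conversion_type)][index_name]


def normalize(records: Iterable[dict]) -> Dict[str, DataUnit]:
    """Fold heterogeneous records into one DataUnit per user identifier."""
    units: Dict[str, DataUnit] = {}
    for r in records:
        unit = units.setdefault(r["user_id"], DataUnit(r["user_id"]))
        unit.add(r["acquisition_type"], r["conversion_type"],
                 r["index"], r["ts"], r["value"])
    return units
```

Because every record ends up in the same nested layout, later stages can treat all sources uniformly regardless of how the data was collected.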

On the basis of the foregoing embodiment, the anomaly detection module 402 is configured to:

In the embodiment of the application, before anomaly detection is performed on the normalized data, the anomaly detection module 402 obtains a hash value from a target feature of the normalized data, divides the normalized data into hash buckets according to the hash value, and then performs abnormal data detection on the normalized data in each hash bucket. Dividing the normalized data by hash value in this way improves anomaly detection efficiency.
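The bucketing step might look like the following sketch. It assumes that the target feature is the user identifier stored as the first element of each sequence key, and that SHA-256 is an acceptable hash; neither choice is specified by the application, and the names `bucket_index` and `split_into_buckets` are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple

SequenceKey = Tuple[str, str]  # (user identifier, index name); assumed layout


def bucket_index(feature: str, num_buckets: int) -> int:
    """Map a target feature to a bucket number using a stable hash."""
    digest = hashlib.sha256(feature.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def split_into_buckets(
    sequences: Dict[SequenceKey, List[float]], num_buckets: int
) -> Dict[int, Dict[SequenceKey, List[float]]]:
    """Group index value sequences into hash buckets by user identifier."""
    buckets: Dict[int, Dict[SequenceKey, List[float]]] = defaultdict(dict)
    for key, values in sequences.items():
        buckets[bucket_index(key[0], num_buckets)][key] = values
    return dict(buckets)
```

A digest from `hashlib` is used instead of Python's built-in `hash()` because string hashes from the built-in are randomized per process by default, so bucket assignments would not be stable across runs.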

processing abnormal data detection in a plurality of hash buckets in parallel;

in each hash bucket, sequentially reading an index value sequence in the hash bucket according to a serial sequence;

and judging whether abnormal data exist in the index value sequence.

In the above embodiment of the application, the anomaly detection module 402 may run the hash buckets in parallel, which enables highly concurrent anomaly detection. Meanwhile, inside each hash bucket, the index value sequences are processed one after another in serial mode, which further improves data processing efficiency.
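The parallel-across-buckets, serial-within-bucket arrangement can be sketched with the standard library as follows. The function names, the thread pool, and the `has_anomaly` callable are assumptions for illustration; a process pool or a distributed job could fill the same role.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence


def scan_bucket(
    bucket: Dict[str, Sequence[float]],
    has_anomaly: Callable[[Sequence[float]], bool],
) -> List[str]:
    """Read the sequences of one bucket serially and return the flagged names."""
    flagged = []
    for name, values in bucket.items():  # serial order inside the bucket
        if has_anomaly(values):
            flagged.append(name)
    return flagged


def scan_all(
    buckets: List[Dict[str, Sequence[float]]],
    has_anomaly: Callable[[Sequence[float]], bool],
    max_workers: int = 4,
) -> List[List[str]]:
    """Process the buckets in parallel; results keep the order of `buckets`."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda b: scan_bucket(b, has_anomaly), buckets))
```

Since each bucket is handled by a single worker, the sequences inside a bucket need no locking.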

calculating a critical value of index value sequence t distribution;

In the above embodiment of the application, for an index value sequence, the anomaly detection module 402 weights each index value in the sequence according to the time weight to obtain the mean value of the sequence; it then traverses the index values using the mean value and the standard deviation to determine the residual with the largest deviation from the mean value, and determines whether the corresponding point is an abnormal point according to the critical value of the t distribution of the index value sequence and the residual. Because the mean value is determined from the time information of the index values and the abnormal point is determined from that mean value, abnormal points can be determined more accurately with respect to time validity, and abnormal point detection efficiency is improved.
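The detection step can be illustrated with the sketch below. The application does not give the exact test statistic, so the example assumes a Grubbs-style comparison: the largest residual divided by the standard deviation is compared with a threshold derived from the t critical value. The critical value `t_crit` is taken as a caller-supplied argument (for example from a statistical table) rather than computed, and the standard deviation is taken around the weighted mean without weights; both are assumptions, as are the function names.

```python
import math
from typing import Optional, Sequence


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Mean of the sequence with one time weight per index value."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def find_outlier(
    values: Sequence[float], weights: Sequence[float], t_crit: float
) -> Optional[int]:
    """Return the position of the abnormal point, or None if there is none."""
    n = len(values)
    if n < 3:
        return None  # too few points for the assumed test
    mean = weighted_mean(values, weights)
    # Spread around the weighted mean (unweighted, n - 1 denominator).
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    if std == 0:
        return None
    # Traverse the sequence to find the residual farthest from the mean.
    idx = max(range(n), key=lambda i: abs(values[i] - mean))
    statistic = abs(values[idx] - mean) / std
    # Grubbs-style threshold built from the supplied t critical value.
    threshold = (n - 1) / math.sqrt(n) * math.sqrt(t_crit ** 2 / (n - 2 + t_crit ** 2))
    return idx if statistic > threshold else None
```

With seven points and `t_crit = 4.0`, a single spike such as 50 among values near 10 exceeds the threshold, while ordinary fluctuation does not.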

and weighting the index value according to the weighting parameter.

In the above embodiment of the application, the anomaly detection module 402 determines the weighting parameter of the index value according to the time interval between the acquisition time of the index value and the current time, so that the value of the weighting parameter decreases as the time interval grows, which further improves the accuracy of the weighted calculation.
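A weighting parameter with this behavior could be computed as in the sketch below. The exponential form, the `decay` constant, and the use of days as the time unit are assumptions; the application only requires that the parameter be less than 1 and shrink as the interval lengthens.

```python
from datetime import datetime, timedelta


def time_weight(collected_at: datetime, now: datetime, decay: float = 0.9) -> float:
    """Weight in (0, 1) that decreases as the index value gets older."""
    if not 0.0 < decay < 1.0:
        raise ValueError("decay must lie strictly between 0 and 1")
    age_days = max((now - collected_at).total_seconds(), 0.0) / 86400.0
    # The +1 keeps the weight strictly below 1 even for a value collected just now.
    return decay ** (age_days + 1.0)


def weight_value(value: float, collected_at: datetime, now: datetime) -> float:
    """Apply the weighting parameter to one index value."""
    return value * time_weight(collected_at, now)
```

Any other monotonically decreasing function of the interval with values below 1 would satisfy the same description.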

EXAMPLE five

According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.

Fig. 5 is a block diagram of an electronic device according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.

As shown in fig. 5, the electronic device includes: one or more processors 501, memory 502, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories and multiple types of memory, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 5, one processor 501 is taken as an example.

Memory 502 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method for processing conversion data provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the method for processing conversion data provided herein.

Memory 502, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the method for processing conversion data in the embodiments of the present application (e.g., normalization module 401, anomaly detection module 402, and anomaly processing module 403 shown in fig. 4). The processor 501 runs the non-transitory software programs, instructions, and modules stored in the memory 502 to execute the various functional applications and data processing of the server, that is, to implement the method for processing conversion data in the above method embodiments.

The memory 502 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function; the data storage area may store data created through use of the electronic device for processing conversion data, and the like. Further, the memory 502 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 502 optionally includes memory located remotely from the processor 501, and such remote memory may be connected over a network to the electronic device for processing conversion data. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The electronic device for the method for processing conversion data may further include: an input device 503 and an output device 504. The processor 501, the memory 502, the input device 503 and the output device 504 may be connected by a bus or other means, and fig. 5 illustrates the connection by a bus as an example.

The input device 503, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, or a joystick, may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for processing conversion data. The output device 504 may include a display device, auxiliary lighting devices (e.g., LEDs), haptic feedback devices (e.g., vibrating motors), and the like. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.

Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and the present invention is not limited thereto as long as the desired results of the technical solutions disclosed in the present application can be achieved.

The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

1. A method for processing conversion data, comprising:

normalizing various types of conversion data according to the index parameters to obtain normalized data, wherein the normalized data comprises an index value sequence based on time;

2. The method for processing conversion data according to claim 1, wherein the normalizing the plurality of types of conversion data according to the index parameter to obtain normalized data includes:

3. The method for processing conversion data according to claim 2, wherein said normalizing the conversion data of the plurality of types according to the user identifier, the conversion type, and the acquisition type to obtain normalized data comprises:

configuring a data unit for the user identification;

4. The method for processing conversion data according to claim 1, wherein said determining whether there is abnormal data in the index value sequence according to the time weight includes:

5. The method for processing conversion data according to claim 4, wherein said determining whether there is abnormal data in the index value sequence included in the normalized data in each hash bucket according to the time weight includes:

processing abnormal data detection in a plurality of hash buckets in parallel;

and judging whether abnormal data exist in the index value sequence.

6. The method for processing conversion data according to claim 1, wherein said determining whether there is abnormal data in the index value sequence according to the time weight includes:

weighting each index value in the index value sequence according to the time weight to obtain a mean value of the index value sequence, wherein the mean value is an average or a median of the index value sequence;

calculating a critical value of the t distribution of the index value sequence;

and determining whether an abnormal point exists according to the critical value and the standard deviation.

7. The method for processing conversion data according to claim 6, wherein said weighting each index value in the index value sequence according to the time weight comprises:

determining a weighting parameter of the index value according to the time interval between the acquisition time of the index value and the current time, wherein the weighting parameter is less than 1, and the value of the weighting parameter decreases as the length of the time interval increases;

and weighting the index value according to the weighting parameter.

8. An apparatus for processing conversion data, comprising:

9. An electronic device, comprising:

at least one processor; and

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.

10. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-7.