CN116703533A

CN116703533A - Business management data optimized storage analysis method

Info

Publication number: CN116703533A
Application number: CN202310986528.XA
Authority: CN
Inventors: 向国祥
Original assignee: Shenzhen Zhongtian Yunlian Technology Development Co ltd
Current assignee: Shenzhen Zhongtian Yunlian Technology Development Co ltd
Priority date: 2023-08-08
Filing date: 2023-08-08
Publication date: 2023-09-05

Abstract

The invention relates to the technical field of business data processing, and in particular to a business management data optimized storage analysis method. The method comprises: adding the order information of a current order to the cluster to which the current order belongs and storing it, and obtaining the offset vector of the cluster center of that cluster after the order information has been added; and correcting the offset vector, namely: acquiring the evaluation information and order time information of the current order, the evaluation information and order time information of the historical orders of the commodity corresponding to the current order, and the shopping operation information of each historical shopping order of the purchasing user corresponding to the current order; determining a first malicious index and a second malicious index of the current order based on this information; determining a malicious index value of the current order from the two indices; and correcting the offset vector according to the malicious index value. By correcting the offset vector, the invention effectively improves the accuracy of the classified storage of order information and facilitates quick querying of order information.

Description

Business management data optimized storage analysis method

Technical Field

The invention relates to the technical field of business data processing, in particular to a business management data optimized storage analysis method.

Background

With the growing popularity of online shopping, more and more users shop online. Each online shopping order generates various order information, such as the purchasing user, evaluation data, order number and order generation time, so the volume of order information for a commodity is huge. The order information of a single commodity therefore usually needs to be stored by category to improve the efficiency of subsequent queries.

Because the evaluation data of online shopping orders not only provides a reference for new users purchasing goods but is also an extremely important index in competition between merchants, existing approaches classify and store the order information of a single commodity based on the evaluation data of its online shopping orders. A common way to do this is to cluster the evaluation data, classify the corresponding order information according to the clustering result to obtain a plurality of clusters, and store the clusters in a suitable database to improve the efficiency of subsequent queries.

Currently, when clustering is used to classify and store the order information of online shopping orders, the order information is continually updated. After new order information is incorporated into an existing cluster, the cluster center is recomputed as a mean based on the evaluation data in the newly incorporated order information. That is, the position mean of all sample data (order information) in the cluster is recalculated from the evaluation data of all order information in the cluster, an offset vector is determined from the calculated position mean, and the existing cluster center is updated using the offset vector, which changes the cluster's subsequent clustering standard. In practice, however, purchasing users may post malicious evaluations and merchants may maliciously brush evaluations (post fake reviews). The order evaluation data produced by such operations generally has no reference value, and its influence on the cluster center should correspondingly be small, so updating the cluster center directly according to the offset vector leads to poor accuracy in the classified storage of order information. Such order evaluation data therefore needs to be identified and analyzed so that its influence on cluster updating can be adjusted, ultimately improving the accuracy of the classified storage of order information and facilitating efficient querying.

Disclosure of Invention

The invention aims to provide a business management data optimized storage analysis method which is used for solving the problem of poor classification storage accuracy of the existing order information.

In order to solve the technical problems, the invention provides a business management data optimized storage analysis method, which comprises the following steps:

adding order information of a current order to a cluster to which the current order belongs, storing the order information, and acquiring an offset vector of a cluster center of the cluster to which the current order belongs after adding the order information of the current order to the cluster to which the current order belongs;

correcting the offset vector, wherein the correcting process comprises the following steps:

acquiring evaluation information and order time information of a current order, evaluation information and order time information of a historical order of a commodity corresponding to the current order, and shopping operation information of each historical shopping order of a purchasing user corresponding to the current order;

determining a first malicious index of the current order according to the evaluation information and the order time information of the current order and the evaluation information and the order time information of the historical order of the commodity corresponding to the current order;

determining a browsing behavior characteristic value sequence for each historical shopping order of the purchasing user corresponding to the current order, according to the shopping operation information of each of those historical shopping orders;

determining a second malicious index of the current order according to the differences, in browsing behavior characteristic value sequence, between the current order and the other historical shopping orders among the historical shopping orders of the purchasing user corresponding to the current order;

according to the first malicious index and the second malicious index of the current order, determining the malicious index value of the current order, and correcting the offset vector according to the malicious index value.

Further, determining a first malicious indicator of the current order includes:

the evaluation information comprises an evaluation index and evaluation content;

screening a target historical order from the historical orders of the commodities corresponding to the current order according to the evaluation index of the current order and the evaluation index of the historical orders of the commodities corresponding to the current order, wherein the evaluation index of the target historical order is the same as the evaluation index of the current order;

determining a text similarity index value of the evaluation content between the current order and each target historical order according to the text similarity degree between the evaluation content of the current order and the evaluation content of the target historical order;

determining a time span between the current order and each target historical order according to the difference between the order time information of the current order and the order time information of the target historical order;

and determining a first malicious index of the current order according to the evaluation content text similarity index value and the time span between the current order and each target historical order.

Further, a first malicious index of the current order is determined, and a corresponding calculation formula is as follows:

wherein the quantities in the formula are, respectively: the first malicious index of the current order; the evaluation content text similarity index value between the current order and the n-th target historical order; the time span between the current order and the n-th target historical order; a normalization function; and the total number of target historical orders.
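The formula itself is not legible in this text. One form consistent with the quantities listed above and with the detailed description (each similarity value weighted by an inverted, normalized time span, then averaged over the target historical orders) is sketched below. The symbols $E_1$, $s_n$, $t_n$, $\mathrm{Norm}$ and $N$ are labels introduced here for illustration only, and the averaging by $1/N$ is an assumption.

```latex
% Assumed form, not the original formula:
% E_1  : first malicious index of the current order
% s_n  : text similarity index value for the n-th target historical order
% t_n  : time span between the current order and the n-th target historical order
% Norm : normalization function mapping time spans into [0, 1]
% N    : total number of target historical orders
E_1 = \frac{1}{N} \sum_{n=1}^{N} \bigl(1 - \mathrm{Norm}(t_n)\bigr)\, s_n
```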

Further, determining a browsing behavior characteristic value sequence corresponding to each historical shopping order of the purchasing user corresponding to the current order includes:

according to the shopping operation information of each historical shopping order of the current order corresponding to the purchasing user, the browsing duration of the purchasing user corresponding to each historical shopping order for browsing each similar commodity, each browsing operation and the quantity of each browsing operation are determined;

determining, according to the browsing duration, each browsing operation and the number of each browsing operation with which the purchasing user browsed each similar commodity for each historical shopping order, the browsing behavior characteristic value corresponding to each similar commodity browsed by the purchasing user for each historical shopping order;

and arranging the browsing behavior characteristic values corresponding to the similar commodities browsed by the purchasing users corresponding to each historical shopping order according to the browsing time sequence, so as to obtain a browsing behavior characteristic value sequence corresponding to each historical shopping order of the purchasing users corresponding to the current order.

Further, determining the browsing behavior feature value corresponding to each similar commodity browsed by the purchasing user corresponding to each historical shopping order includes:

determining the accumulated sum of the number of various browsing operations of the purchasing users corresponding to each historical shopping order for browsing each similar commodity, the normalized value of the browsing time of the purchasing users corresponding to each historical shopping order for browsing each similar commodity, and the positive correlation mapping value of the number of the types of the browsing operations of the purchasing users corresponding to each historical shopping order for browsing each similar commodity;

and calculating the browsing behavior characteristic value corresponding to each similar commodity browsed by the purchasing user corresponding to each historical shopping order according to the accumulated sum, the normalized value and the positive correlation mapping value corresponding to each similar commodity browsed by the purchasing user corresponding to each historical shopping order, wherein the accumulated sum, the normalized value and the positive correlation mapping value are in positive correlation with the browsing behavior characteristic value.
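As a minimal sketch of the step above, the three quantities can be combined by multiplication, which keeps the result positively correlated with each of them. The product form, the `log1p` mapping and the `max_duration_s` normalization constant are assumptions chosen for illustration; the text only requires positive correlation.

```python
import math


def browsing_feature(op_counts, duration_s, max_duration_s):
    """Browsing behavior characteristic value for one similar commodity.

    op_counts: dict mapping a browsing operation name to how often it occurred.
    duration_s: browsing duration for this commodity, in seconds.
    max_duration_s: hypothetical normalization constant (longest duration seen).
    """
    # Accumulated sum of the numbers of all browsing operations.
    total_ops = sum(op_counts.values())
    # Normalized browsing duration, clipped to [0, 1].
    norm_duration = min(duration_s / max_duration_s, 1.0) if max_duration_s > 0 else 0.0
    # Positive-correlation mapping of the number of operation types (assumed: log1p).
    kinds = sum(1 for count in op_counts.values() if count > 0)
    kind_map = math.log1p(kinds)
    # Assumed combination: a product, increasing in each factor.
    return total_ops * norm_duration * kind_map
```

For instance, `browsing_feature({"intro": 2, "reviews": 3}, 60, 120)` combines 5 operations, a normalized duration of 0.5 and 2 operation types.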

Further, the browsing operations include at least: paging through the commodity introduction, paging through the commodity evaluations, and consulting customer service.

Further, determining a second malicious indicator of the current order includes:

determining a dynamic time warping distance between a browsing behavior characteristic value sequence of a current order in each historical shopping order of the current order corresponding to the purchasing user and a browsing behavior characteristic value sequence of each historical shopping order except the current order, so as to obtain a browsing difference value between the current order in each historical shopping order of the current order corresponding to the purchasing user and each historical shopping order except the current order;

determining a negative correlation normalization result of the absolute value of the difference between the number of elements in the browsing behavior characteristic value sequence of the current order and the number of elements in the browsing behavior characteristic value sequence of each historical shopping order except the current order, so as to obtain a browsing difference weight between the current order in each historical shopping order of the current order corresponding to the purchasing user and each historical shopping order except the current order;

and calculating a weighted standard deviation from the browsing difference values and the browsing difference weights between the current order and each of the other historical shopping orders of the purchasing user corresponding to the current order, so as to obtain the second malicious index of the current order.
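The three steps above can be sketched as follows, taking the dynamic time warping distances as already computed. Two choices here are assumptions rather than anything stated in the text: `exp(-x)` is used as the negative correlation normalization of the length difference, and the standard deviation is taken about the weighted mean.

```python
import math


def second_malicious_index(dtw_distances, length_diffs):
    """Weighted standard deviation of browsing difference values.

    dtw_distances: browsing difference value (DTW distance) between the current
        order and each other historical shopping order.
    length_diffs: difference in sequence length between the current order and
        each other historical shopping order (sign is ignored).
    """
    if not dtw_distances or len(dtw_distances) != len(length_diffs):
        raise ValueError("inputs must be non-empty and of equal length")
    # Browsing difference weight: larger length difference gives a smaller weight.
    weights = [math.exp(-abs(d)) for d in length_diffs]
    w_sum = sum(weights)
    mean = sum(w * x for w, x in zip(weights, dtw_distances)) / w_sum
    variance = sum(w * (x - mean) ** 2 for w, x in zip(weights, dtw_distances)) / w_sum
    return math.sqrt(variance)
```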

Further, determining a dynamic time warping distance between a browsing behavior feature value sequence of the current order and a browsing behavior feature value sequence of each of the other historical shopping orders except the current order in each of the historical shopping orders of the corresponding purchasing user comprises:

taking each browsing behavior characteristic value in the browsing behavior characteristic value sequence corresponding to each historical shopping order of the current order corresponding to the purchasing user as an ordinate, and taking the arrangement sequence number of each browsing behavior characteristic value in the browsing behavior characteristic value sequence as an abscissa, wherein the ordinate value and the abscissa corresponding to each browsing behavior characteristic value form a data scatter point, so that each data scatter point corresponding to each historical shopping order of the current order corresponding to the purchasing user is obtained;

performing least square fitting on each data scatter point corresponding to each historical shopping order of the current order corresponding to the purchasing user, so as to obtain a browsing behavior characteristic value curve corresponding to each historical shopping order of the current order corresponding to the purchasing user;

and calculating, using a DTW algorithm, the dynamic time warping distance between the browsing behavior characteristic value curve of the current order and the browsing behavior characteristic value curve of each of the other historical shopping orders of the purchasing user.
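For reference, a textbook dynamic time warping distance between two numeric sequences can be sketched as below. This is a simplification: the least squares curve fitting step is omitted and the function is applied to sampled values directly, and the absolute difference is assumed as the pointwise cost.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])  # pointwise cost (assumed: absolute difference)
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

Because DTW allows elements to repeat during alignment, sequences of different lengths, such as orders with different numbers of browsed commodities, can still be compared.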

Further, determining a malicious index value of the current order includes:

and carrying out weighted summation on the first malicious index and the second malicious index of the current order according to preset weights of the first malicious index and the second malicious index, and determining a weighted summation result as a malicious index value of the current order.

Further, the offset vector is corrected according to the malicious index value, and the corresponding calculation formula is:

wherein the quantities in the formula are, respectively: the corrected offset vector; the offset vector before correction; and the malicious index value.
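The weighted summation and the correction can be sketched together as follows. The correction formula is not legible in this text, so the scaling by one minus the malicious index value is an assumption, chosen because it shrinks the offset as the index grows. The equal default weights are likewise hypothetical.

```python
def malicious_index_value(first_index, second_index, w1=0.5, w2=0.5):
    """Weighted sum of the two malicious indices (weights are preset; 0.5 is assumed)."""
    return w1 * first_index + w2 * second_index


def correct_offset(offset, index_value):
    """Scale the offset vector down as the malicious index value grows.

    Assumed form: corrected = (1 - index_value) * offset, with the index
    clipped to [0, 1] so the offset never changes direction.
    """
    index_value = min(max(index_value, 0.0), 1.0)
    return [(1.0 - index_value) * component for component in offset]
```

Under this assumption, an order with an index value near 1 leaves the cluster center almost unchanged, while an order with an index value of 0 moves it by the full offset.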

The invention has the following beneficial effects. When order information of online shopping orders is classified and stored according to their evaluation data, the invention takes into account that order evaluation data produced by operations such as malicious brushing of evaluations by merchants generally has no reference value, so the influence of such data on the cluster to which the order information belongs needs to be reduced, thereby improving the accuracy of the classified storage of order information and facilitating quick querying. To this end, after obtaining the offset vector of the cluster center of the cluster to which the order information of the current order has been added, the invention proceeds in two ways. On the one hand, it analyzes the evaluation information and order time information of the current order together with those of the historical orders of the corresponding commodity, and identifies how closely the evaluation information of the current order matches that of the historical orders, thereby determining the first malicious index of the current order. On the other hand, it analyzes the shopping operation information of each historical shopping order of the purchasing user corresponding to the current order, and identifies whether the shopping process of the current order conforms to the user's historical shopping habits, thereby determining the second malicious index of the current order.
The first malicious index and the second malicious index are comprehensively considered, the malicious index value of the current order is determined, the possibility that the evaluation data of the current order is malicious evaluation is characterized, and therefore the offset vector is reasonably corrected, the influence of malicious evaluation on the corresponding cluster of the order information is reduced, and the classification storage accuracy of the order information is improved.

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions and advantages of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of a business management data optimized storage analysis method according to an embodiment of the present invention;

FIG. 2 is a flowchart of the correction process of the offset vector according to an embodiment of the present invention.

Detailed Description

In order to further describe the technical means and effects adopted by the present invention to achieve the preset purpose, the following detailed description is given below of the specific implementation, structure, features and effects of the technical solution according to the present invention with reference to the accompanying drawings and preferred embodiments. In the following description, different "one embodiment" or "another embodiment" means that the embodiments are not necessarily the same. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. In addition, all parameters or indices in the formulas referred to herein are values after normalization that eliminate the dimensional effects.

The embodiment provides a business management data optimized storage analysis method, and a flow chart corresponding to the method is shown in fig. 1, and the method comprises the following steps:

Step S1: adding the order information of the current order to the cluster to which the current order belongs and storing it, and acquiring the offset vector of the cluster center of that cluster after the order information of the current order has been added.

In the business management data storage process of a shopping platform, for any commodity, the evaluation information of a plurality of historical orders of the commodity is obtained in advance; the evaluation information comprises an evaluation index and evaluation content. The evaluation index is the overall evaluation of the commodity and distinguishes good evaluations from poor ones. Following the evaluation scheme of current online shopping platforms, the evaluation index can be the number of stars: the more stars, the higher the overall evaluation of the commodity and the larger the evaluation index. The evaluation index ranges from 1 (lowest) to 5 (highest). The evaluation content is the purchasing user's specific text description of their opinion of the commodity. Then, based on the pre-acquired evaluation indexes of the plurality of historical orders of the commodity, clustering is performed on the order information of those historical orders to obtain the clusters. That is, the evaluation indexes are clustered to obtain a plurality of evaluation index classifications, and the order information of the historical orders corresponding to the evaluation indexes in each classification is taken as one cluster; the historical order corresponding to the cluster center of each evaluation index classification is the cluster center of the corresponding cluster. The order information of a historical order refers to all information related to that order, including the purchasing user, evaluation index, evaluation content, order number, order generation time and other data.

After each cluster corresponding to the commodity is obtained, any newly added order of the commodity can be called a current order, the Euclidean distance from the evaluation index of the order to the evaluation index of the cluster center of each existing cluster is calculated, the cluster corresponding to the smallest Euclidean distance is used as the cluster to which the order information of the order belongs, and the order information of the order is added to the cluster to which the order belongs and stored.
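The assignment step described above can be sketched as follows. Since the evaluation index is a single number, the Euclidean distance reduces to an absolute difference. The function name and the list-of-centers representation are illustrative assumptions.

```python
def nearest_cluster(evaluation_index, centers):
    """Return the position of the cluster whose center is closest to the new order.

    evaluation_index: star rating of the current order (1 to 5).
    centers: evaluation index of each existing cluster center.
    """
    if not centers:
        raise ValueError("at least one cluster center is required")
    # One-dimensional Euclidean distance is the absolute difference.
    return min(range(len(centers)), key=lambda k: abs(evaluation_index - centers[k]))
```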

Meanwhile, in order to facilitate improvement of the classification accuracy of order information of a subsequent newly added order, after the order information of the order is added to the cluster to which the order belongs and stored, an offset vector of a cluster center of the cluster to which the order information of the order is added to the cluster to which the order belongs needs to be determined. In this embodiment, when determining the offset vector of the cluster center of the belonging cluster, after adding the order information of the order to the belonging cluster, the position average value corresponding to all sample data in the belonging cluster is calculated, where the position average value is determined by averaging all sample data, that is, the evaluation index in the order information, and the position average value is used as the position of the new cluster center. And then determining an offset vector of the cluster center according to the current cluster center position of the affiliated cluster and the new cluster center position, wherein the offset vector indicates the moving direction and the moving distance when moving from the current cluster center position to the new cluster center position. After determining the offset vector of the cluster center of the belonging cluster, the cluster center of the belonging cluster is moved according to the offset vector, so that the update of the cluster center of the belonging cluster is completed.
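A minimal sketch of the offset vector computation described above, for the one-dimensional evaluation index. The function and parameter names are illustrative assumptions.

```python
def center_offset(member_indexes, new_index, current_center):
    """Offset of the cluster center after one order is added.

    member_indexes: evaluation indexes of the orders already in the cluster.
    new_index: evaluation index of the newly added order.
    current_center: position of the cluster center before the update.
    """
    members = list(member_indexes) + [new_index]
    # The new center position is the mean of all sample data in the cluster.
    new_center = sum(members) / len(members)
    # The sign gives the direction of movement, the magnitude the distance.
    return new_center - current_center
```

For example, adding a 1-star order to a cluster holding two 4-star orders with its center at 4.0 yields a new mean of 3.0 and an offset of -1.0.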

Because existing order data may include abnormal operations such as malicious praise brushed by merchants, and the order information produced by such operations has no practical reference value, adding it to the corresponding cluster as sample data can make the offset of the cluster center abnormal, so that the stored clusters become less and less valuable as a reference for subsequent purchasing users. The order information therefore needs to be identified and analyzed so that the calculated offset vector can be corrected, thereby adjusting the influence of the order information on cluster updating and ultimately improving the accuracy of classified storage.

Step S2: correcting the offset vector. The flow chart of the correction process is shown in fig. 2, and the process comprises the following steps:

Step S21: acquiring the evaluation information and order time information of the current order, the evaluation information and order time information of the historical orders of the commodity corresponding to the current order, and the shopping operation information of each historical shopping order of the purchasing user corresponding to the current order.

Based on the above analysis, in order to identify order information produced by abnormal operations, the evaluation information and order time information of any newly added order of a commodity are acquired. The evaluation information includes an evaluation index and evaluation content, which are described above and not repeated here. The order time information refers to the generation time of the order; in another embodiment, it may instead be the generation time of the order's evaluation. At the same time, each historical order of the commodity, that is, every order preceding the newly added order, is acquired, and the evaluation information and order time information of each historical order are acquired in the same way.

For any newly added order of the commodity, the corresponding purchasing user ID is obtained, and the log records of that user on the current platform are retrieved by the ID. Using the timestamps in the log records, data such as browsing records, search records and purchase records, which are used to analyze the user's preferences and personal information, are obtained for each historical shopping order of the purchasing user; these data are collectively referred to as shopping operation information. The time range covered by the historical shopping orders can be adjusted as required; in this embodiment, the historical shopping orders of the purchasing user are the orders generated in the past 30 days. The historical shopping orders of the purchasing user include the current order and a plurality of shopping orders preceding it.

Step S22: determining a first malicious index of the current order according to the evaluation information and the order time information of the current order and the evaluation information and the order time information of the historical orders of the commodity corresponding to the current order.

Optionally, determining the first malicious indicator of the current order includes:

Specifically, merchants on online shopping platforms may maliciously brush evaluations, including both maliciously brushed good reviews and maliciously brushed bad reviews, and such evaluations typically have similar content and short time intervals between them. Based on these characteristics, the current order's own features can be analyzed against the historical sales records of the corresponding commodity to determine how likely the current order is to be a maliciously brushed evaluation order.

Since the malicious brush evaluation mentioned above is deliberate, it usually corresponds to a good or bad evaluation, that is, to 1 star or 5 stars rather than an intermediate value, and its evaluation content is usually highly similar. Therefore, the historical orders of the commodity corresponding to the current order whose evaluation index is the same as that of the current order are first screened out and taken as target historical orders. The text similarity between the evaluation content of the current order and that of each target historical order is then evaluated using an existing semantic model such as word2vec, giving the evaluation content text similarity index value between the current order and each target historical order. Meanwhile, based on the order time information of the current order and of the target historical orders, the time span between the current order and each target historical order is determined, where the time span is the length of time between the generation times of the two orders. The first malicious index of the current order is obtained from the evaluation content text similarity index values and the time spans between the current order and each target historical order, and the corresponding calculation formula is as follows:

For the first malicious index of the current order: the higher the evaluation content text similarity index value between the current order and the n-th target historical order, the more the evaluation content of the two orders is repeated, and the more likely it is that both were written from the same template. The time span between the current order and the n-th target historical order acts as a weight coefficient: the smaller the time span, that is, the closer in time the two orders are, the more weight the text similarity index value of the n-th target historical order carries among the target historical orders. Because this relationship runs opposite to that of the text similarity index value, the time span is normalized and the normalized value is subtracted from 1 to reverse the relationship before the first malicious index is obtained. The closer the first malicious index is to 1, the more likely it is that the evaluation data of the current order is a malicious evaluation.
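The reasoning above can be illustrated numerically with a small sketch that takes precomputed text similarity values as input. Three things are assumptions here: dividing by the largest time span as the normalization, averaging over the target historical orders, and the function name.

```python
def first_malicious_index(similarities, time_spans):
    """Similarity values weighted by inverted, normalized time spans, then averaged.

    similarities: text similarity index value (0 to 1) per target historical order.
    time_spans: time span to each target historical order, in any consistent unit.
    """
    if not similarities:
        return 0.0  # no target historical orders, so no evidence of repetition
    if len(similarities) != len(time_spans):
        raise ValueError("inputs must have equal length")
    longest = max(time_spans)
    total = 0.0
    for similarity, span in zip(similarities, time_spans):
        # Assumed normalization: divide by the largest span, giving a value in [0, 1].
        normalized = span / longest if longest > 0 else 0.0
        # Orders close in time (small span) get a weight near 1.
        total += (1.0 - normalized) * similarity
    return total / len(similarities)
```

With two identical reviews, one posted at the same time and one posted at the largest time span, the nearby review contributes fully and the distant one contributes nothing, giving an index of 0.5.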

Step S23: determining a browsing behavior characteristic value sequence corresponding to each historical shopping order of the purchasing user corresponding to the current order, according to the shopping operation information of each of those historical shopping orders.

Optionally, determining the browsing behavior characteristic value sequence corresponding to each historical shopping order of the purchasing user corresponding to the current order includes:

Optionally, determining the browsing behavior feature value corresponding to each similar commodity browsed by the purchasing user corresponding to each historical shopping order includes:

Specifically, determining the first malicious index of the current order only from the evaluation content text similarity index values and the time spans between the current order and several target historical orders that are close to it in time may introduce errors. For the same commodity, the advantages and disadvantages perceived by different purchasing users are substantially the same, so the evaluation contents describing their usage experience inevitably contain similar text. Therefore, the shopping habits reflected in the log records of the purchasing user corresponding to each order need to be analyzed to obtain a second malicious index, which is combined with the first malicious index to achieve a more accurate assessment.

Based on the above analysis, to determine whether the current order is a maliciously brushed evaluation, it can be examined whether the browsing process when purchasing the current order matches the historical shopping habits of the purchasing user of that order, for example the decision time and whether a decision is made only after browsing several similar commodities. For an order corresponding to a maliciously brushed evaluation, the commodity and the merchant have already been determined before the purchase, so the corresponding decision time is usually very short. Therefore, the shopping operation information of each historical shopping order, that is, the shopping operation information in the shopping process of each historical shopping order, is obtained from the log records of the purchasing user corresponding to the current order. The operations belonging to one shopping process are identified from the timestamps in the log records, and the browsing duration, the browsing operations, and the number of each browsing operation are determined for each similar commodity browsed by the purchasing user in each historical shopping order. The browsing operations here refer to operations such as paging through the commodity introduction, paging through the commodity evaluations, and consulting customer service. The number of each browsing operation refers to the number of times that operation occurs. Based on the browsing duration, the browsing operations, and the number of each browsing operation for each similar commodity browsed by the purchasing user in each historical shopping order, a decision model for a single similar commodity is constructed, and the corresponding calculation formula is as follows:

where the quantity being calculated is the browsing behavior characteristic value of a similar commodity browsed by the purchasing user corresponding to a historical shopping order; T is the browsing duration of that similar commodity; a normalization function is applied to T; the formula also uses the number of the i-th type of browsing operation performed on that similar commodity and the number of types of browsing operations performed on it; and e is the natural constant.

In the calculation formula for the browsing behavior characteristic value of each similar commodity browsed by the purchasing user corresponding to each historical shopping order, the normalized value of the dwell time T on a single similar-commodity page reflects the length of the decision process: the longer the dwell time, the longer the purchasing user's decision process for that commodity. The number of each type of browsing operation counts operations such as paging through the commodity evaluations and consulting customer service, and the decision process for the single similar commodity is characterized by accumulating the operation counts over all types of browsing operations. The number of types of operations on a similar-commodity page reflects how thoroughly the commodity was examined. For example, one user may page through the photos five times, while another pages through the photos twice and the comments three times; both accumulate five operations, but the latter's decision process is more complex and covers more content. The number of types of browsing operations is therefore mapped through a positively correlated function, and the mapped value is used as a degree value in the calculation formula of the browsing behavior characteristic value, so that the behavior characteristics of the purchasing user corresponding to each historical shopping order when browsing each similar commodity are identified accurately.

Through the above steps, the browsing behavior characteristic value of each similar commodity browsed by the purchasing user corresponding to each historical shopping order is obtained. This value quantifies the decision process of the purchasing user when browsing that similar commodity: the larger the value, the longer and more complex the decision process on the corresponding commodity page.
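As a rough illustration of how such a characteristic value might behave, the following Python sketch combines a normalized dwell time, the accumulated operation count, and a degree value that grows with the number of operation types. The specific functional form is an assumption: the normalization `1 - exp(-T / scale)`, the degree `2 - exp(-types)`, the parameter `duration_scale_s`, and the function name `browsing_feature` are hypothetical and are not the calculation formula of this embodiment.

```python
import math


def browsing_feature(duration_s, op_counts, duration_scale_s=60.0):
    """Illustrative browsing behavior characteristic value for one commodity.

    ASSUMED form (not from the source):
        Norm(T) * (total operations) ** (2 - e ** -types)
    """
    if duration_s < 0 or any(c < 0 for c in op_counts.values()):
        raise ValueError("duration and operation counts must be non-negative")
    # Assumed normalization: maps [0, inf) onto [0, 1), increasing in T.
    norm_t = 1.0 - math.exp(-duration_s / duration_scale_s)
    total_ops = sum(op_counts.values())
    # Number of distinct operation types that actually occurred.
    types = sum(1 for c in op_counts.values() if c > 0)
    # Positively correlated mapping of the type count, used as the degree.
    degree = 2.0 - math.exp(-types)
    return norm_t * total_ops ** degree


# Same dwell time and same total of five operations, different variety.
single_type = browsing_feature(120.0, {"photos": 5})
two_types = browsing_feature(120.0, {"photos": 2, "reviews": 3})
```

Under these assumptions, `two_types` exceeds `single_type`, which mirrors the example of five photo views versus two photo views plus three comment views.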

Under normal circumstances, a purchasing user browses multiple similar commodities in a single shopping process and decides which commodity to purchase only after browsing and understanding each similar commodity page, whereas the above decision model represents the decision process for a single similar commodity only. Over the whole shopping process of a shopping order, some purchasing users browse several different commodities before placing the order, and some hesitate and browse the same commodity several times; in either case, the characteristic values of the decision model for successive single commodity browsing behaviors exhibit a similar pattern in time sequence.

Based on this analysis, for the shopping process corresponding to each historical shopping order of the purchasing user corresponding to the current order, the characteristic values of all decision models for browsing single similar commodities, namely the browsing behavior characteristic values, are arranged in ascending order of the browsing start time of each single similar commodity, which gives a browsing behavior characteristic value sequence. Each browsing behavior characteristic value in the sequence is then taken as an ordinate and its position number in the sequence as the abscissa. Each such pair forms a data scatter point, which represents one single-commodity browsing behavior in the shopping process corresponding to a historical shopping order. In this way, the distribution characteristics of the multiple browsing behaviors in the shopping process corresponding to each historical shopping order of the purchasing user are obtained. These distribution characteristics mainly describe how the browsing behavior characteristic value changes as the number of browsed commodities increases, from the start of shopping to the placing of the order, and thus reflect how the degree of decision effort changes over the whole shopping process.
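The ordering and the construction of scatter points described above can be sketched in Python as follows. The `BrowseEvent` record and the function names are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class BrowseEvent:
    """One single-commodity browsing behavior (hypothetical record type)."""
    start_time: float      # browsing start timestamp, e.g. seconds
    feature_value: float   # browsing behavior characteristic value


def feature_sequence(events):
    """Arrange characteristic values by ascending browsing start time."""
    ordered = sorted(events, key=lambda ev: ev.start_time)
    return [ev.feature_value for ev in ordered]


def scatter_points(sequence):
    """Pair each value (ordinate) with its 1-based position (abscissa)."""
    return [(position, value) for position, value in enumerate(sequence, start=1)]


# Hypothetical log entries for one shopping process, given out of order.
events = [BrowseEvent(30.0, 0.3), BrowseEvent(10.0, 0.1), BrowseEvent(20.0, 0.2)]
points = scatter_points(feature_sequence(events))
```

Here `points` contains one `(position, value)` pair per browsing behavior, in the order in which the commodities were browsed.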

After the data scatter points corresponding to each historical shopping order of the purchasing user corresponding to the current order are obtained in this way, least squares curve fitting is performed on them to obtain the browsing behavior characteristic value curve of each historical shopping order. The ordinates of the starting point and the ending point of the curve correspond to the browsing behavior characteristic values at the two ends of the browsing behavior characteristic value sequence.
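A least squares fit of this kind can be sketched in plain Python by solving the normal equations. The family of curves is not specified here, so fitting a polynomial of degree 2 is an assumption, and the function names `polyfit_least_squares` and `evaluate` are hypothetical.

```python
def polyfit_least_squares(xs, ys, degree=2):
    """Return c[0..degree] minimizing sum((c0 + c1*x + ... - y) ** 2).

    ASSUMPTION: a polynomial model; the curve family is illustrative only.
    """
    n = degree + 1
    if len(xs) != len(ys) or len(xs) < n:
        raise ValueError("need at least degree + 1 points")
    # Normal equations A c = b with A[r][k] = sum x^(r+k), b[r] = sum y*x^r.
    a = [[sum(x ** (r + k) for x in xs) for k in range(n)] for r in range(n)]
    b = [sum(y * x ** r for x, y in zip(xs, ys)) for r in range(n)]
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for k in range(col, n):
                a[r][k] -= factor * a[col][k]
            b[r] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(a[r][k] * coeffs[k] for k in range(r + 1, n))
        coeffs[r] = (b[r] - known) / a[r][r]
    return coeffs


def evaluate(coeffs, x):
    """Evaluate the fitted polynomial at abscissa x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


# Scatter points: abscissa is the position number in the sequence.
xs = [1, 2, 3, 4]
ys = [3.5, 7.0, 11.5, 17.0]
curve = polyfit_least_squares(xs, ys, degree=2)
```

Because the abscissas are distinct position numbers, the normal equations are solvable whenever there are at least `degree + 1` points; sequences with fewer points cannot be fitted this way.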

Step S24: determine the second malicious index of the current order according to the difference in the browsing behavior characteristic value sequence between the current order and the other historical shopping orders among the historical shopping orders of the purchasing user corresponding to the current order.

Optionally, determining the second malicious indicator of the current order includes:

Specifically, from the browsing behavior characteristic value curve of the current order and the browsing behavior characteristic value curves of the other historical shopping orders of the corresponding purchasing user, the dynamic time warping (DTW) algorithm is used to calculate the dynamic time warping distance between the curve of the current order and the curve of each other historical shopping order, and this distance is taken as the browsing difference value. It should be understood that the browsing behavior characteristic value curve of a historical shopping order is built from multiple browsing behaviors in its shopping process. The shopping process of some historical shopping orders may contain only one or two browsing behaviors, in which case curve fitting cannot be performed. Because every historical shopping order necessarily includes at least one browsing behavior, namely the one corresponding to that order itself, the DTW algorithm is in that case applied directly to the browsing behavior characteristic value sequence of the current order and that of each other historical shopping order, and the resulting dynamic time warping distance is taken as the browsing difference value.
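For reference, the classic dynamic programming form of DTW on two numeric sequences can be sketched in Python as below. The absolute difference is assumed as the local cost and no warping window is applied; the function name `dtw_distance` is hypothetical.

```python
def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two numeric sequences.

    ASSUMPTIONS: local cost is |a - b| and no warping window is used.
    """
    if not seq_a or not seq_b:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    # cost[i][j] = best accumulated cost aligning seq_a[:i] with seq_b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(seq_a[i - 1] - seq_b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in seq_a only
                cost[i][j - 1],      # advance in seq_b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


# Sequences of different length can still be compared.
difference = dtw_distance([1, 2, 3], [1, 2, 2, 3])
```

The algorithm accepts sequences of different lengths, including a sequence with a single element, which is why it remains usable when curve fitting is not possible.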

Meanwhile, the length of a browsing behavior characteristic value curve is determined by the number of commodity browsing behaviors in the shopping process of the corresponding historical shopping order, so the curves obtained from the shopping processes of different historical shopping orders may differ in length. The difference in the number of browsing behaviors between the shopping processes of different historical shopping orders is therefore also used as a weight to correct the standard deviation formula model. This finally gives the discrete difference between the shopping process of the current order and the shopping processes of the historical shopping orders as a whole, namely the second malicious index, and the corresponding calculation formula is as follows:

where the quantity being calculated is the second malicious index of the current order; the formula uses the browsing difference value between the current order and each other historical shopping order, the number of elements in the browsing behavior characteristic value sequence of the current order, the number of elements in the browsing behavior characteristic value sequence of each other historical shopping order, a normalization function, and the absolute value operation; and U is the total number of historical shopping orders of the purchasing user corresponding to the current order, excluding the current order.

In the above calculation formula for the second malicious index of the current order, the browsing difference value between the current order and another historical shopping order represents the difference between their decision models: the larger the value, the more dissimilar the time-sequence distributions of the browsing behaviors in the two shopping processes. The absolute difference between the numbers of elements in the two sequences characterizes the difference in the number of browsing behaviors between the shopping process of the current order and that of the other historical shopping order. The smaller this difference, the higher the reference value of that historical shopping order for judging whether the shopping process of the current order is discrete, and the larger the corresponding browsing difference weight. Finally, the larger the second malicious index, the more the browsing behavior over the whole purchasing process of the current order differs from the decision habits shown in the shopping processes of the historical shopping orders, and the more likely the evaluation data of the current order is a malicious evaluation.
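The behavior described above, a standard-deviation-like aggregate of browsing difference values in which orders with a similar number of browsing behaviors weigh more, can be illustrated with the following Python sketch. The concrete form is assumed: the normalization `g / (1 + g)` of the count difference, the root mean square aggregation, and the function name `second_malicious_index` are hypothetical and are not the calculation formula of this embodiment.

```python
import math


def second_malicious_index(diff_values, n_current, n_others):
    """Illustrative weighted dispersion of browsing difference values.

    ASSUMED form (not from the source):
        sqrt( (1 / U) * sum( ((1 - Norm(|N0 - Nu|)) * Du) ** 2 ) )
    with Norm(g) = g / (1 + g).
    """
    if not diff_values or len(diff_values) != len(n_others):
        raise ValueError("need one element count per browsing difference value")
    total = 0.0
    for diff, n_other in zip(diff_values, n_others):
        gap = abs(n_current - n_other)
        # Smaller gap in the number of browsing behaviors gives larger weight.
        weight = 1.0 - gap / (1.0 + gap)
        total += (weight * diff) ** 2
    return math.sqrt(total / len(diff_values))


# Hypothetical inputs: two other orders with the same sequence length.
index = second_malicious_index([3.0, 4.0], n_current=5, n_others=[5, 5])
```

Under these assumptions, a historical order whose shopping process has a very different number of browsing behaviors contributes little to the index, while orders of comparable length contribute their browsing difference value almost in full.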

Step S25: according to the first malicious index and the second malicious index of the current order, determining the malicious index value of the current order, and correcting the offset vector according to the malicious index value.

Optionally, determining the malicious index value of the current order includes:

Specifically, weights of the first malicious index and the second malicious index are preset, in this embodiment, both weights are set to 0.5, and an implementer can adjust the sizes of the two weights according to actual situations. After the first malicious index and the second malicious index of the current order are obtained, weighting and summing the first malicious index and the second malicious index of the current order according to the preset weights of the first malicious index and the second malicious index, so that the malicious index value of the current order is obtained, and the corresponding calculation formula is as follows:

where the quantity being calculated is the malicious index value of the current order; the formula uses the first malicious index of the current order, the second malicious index of the current order, a normalization function, and the weights of the first malicious index and the second malicious index, each of which is set to 0.5.

For the malicious index value of the current order, the larger the first malicious index and the second malicious index, the more likely the evaluation data of the current order is a malicious evaluation and the larger the corresponding malicious index value. In that case, the order information of the current order should have a weaker offset influence on the cluster center of the whole cluster when it subsequently participates in clustering.
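The weighted summation with both weights set to 0.5 can be sketched in Python as follows. Where the normalization function is applied is not recoverable from the text, so applying `1 - exp(-x)` to the weighted sum is an assumption, as is the function name `malicious_index_value`.

```python
import math


def malicious_index_value(first_index, second_index, w1=0.5, w2=0.5):
    """Weighted sum of the two malicious indexes, normalized to [0, 1).

    ASSUMPTION: the normalization is 1 - exp(-x), applied to the sum.
    """
    if first_index < 0 or second_index < 0:
        raise ValueError("malicious indexes must be non-negative")
    weighted_sum = w1 * first_index + w2 * second_index
    # Increasing in both indexes; 0 maps to 0, large sums approach 1.
    return 1.0 - math.exp(-weighted_sum)


# Hypothetical index values for one order.
value = malicious_index_value(0.8, 1.6)
```

The weights `w1` and `w2` are exposed as parameters because the implementer may adjust them according to the actual situation.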

Based on the malicious index value of the current order, the offset vector of the cluster center of the cluster to which the current order belongs, obtained after the order information of the current order is added to that cluster, is corrected. The corresponding calculation formula is:

where the formula relates the corrected offset vector, the offset vector, and the malicious index value.

In the calculation formula of the corrected offset vector, the offset vector is corrected using the malicious index value of the current order. Only the magnitude of the offset vector is corrected; its direction is not changed. The larger the malicious index value, the more likely the evaluation information of the current order is a malicious evaluation, and the smaller the magnitude of the corrected offset vector. This reduces the influence of evaluation data from malicious evaluations on the cluster center, effectively improves the accuracy of classified storage of order information, and in turn improves the efficiency of subsequent queries for different types of order information.
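One simple correction with these properties, shrinking the magnitude while keeping the direction, is to scale the offset vector by one minus the malicious index value. The Python sketch below assumes exactly that scaling and a malicious index value in [0, 1]; it is an illustration, not the calculation formula of this embodiment, and `correct_offset` is a hypothetical name.

```python
def correct_offset(offset, malicious_value):
    """Scale an offset vector by (1 - malicious_value).

    ASSUMPTION: linear scaling with malicious_value in [0, 1]. A
    non-negative scale factor keeps the direction of the vector.
    """
    if not 0.0 <= malicious_value <= 1.0:
        raise ValueError("malicious_value must lie in [0, 1]")
    scale = 1.0 - malicious_value
    return [scale * component for component in offset]


# Hypothetical cluster-center offset and malicious index value.
corrected = correct_offset([2.0, -4.0], 0.5)
```

With a malicious index value of 0 the offset is applied unchanged, and as the value approaches 1 the order barely moves the cluster center.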

It should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims

1. A business management data optimized storage analysis method, comprising the steps of:

2. The method of claim 1, wherein determining a first malicious indicator of a current order comprises:

3. The method for optimizing storage and analysis of business management data according to claim 2, wherein the first malicious index of the current order is determined, and the corresponding calculation formula is:

4. The method of claim 1, wherein determining a sequence of browsing behavior characteristics corresponding to each of the historical shopping orders of the purchasing user corresponding to the current order comprises:

5. The method for optimizing storage and analysis of business management data according to claim 4, wherein determining a browsing behavior characteristic value corresponding to each of the similar products browsed by the purchasing user corresponding to each of the historical shopping orders comprises:

6. The method for optimizing storage and analysis of business management data according to claim 4, wherein the browsing operation comprises at least: paging through the commodity introduction, paging through the commodity evaluations, and consulting customer service.

7. The method of claim 1, wherein determining a second malicious indicator of a current order comprises:

8. The method of claim 7, wherein determining a dynamic time warping distance between a browsing behavior feature value sequence of a current order and a browsing behavior feature value sequence of each of the other historical shopping orders except the current order in the respective historical shopping orders of the current order corresponding to the purchasing user comprises:

9. The method of claim 1, wherein determining a malicious indicator value for a current order comprises:

10. The method for optimizing, storing and analyzing business management data according to claim 1, wherein the offset vector is corrected according to the malicious index value, and the corresponding calculation formula is: