CN116485449A - Data prediction method and system applied to electronic commerce

Info

Publication number: CN116485449A
Application number: CN202310446184.3A
Authority: CN
Inventors: 马靖航; 石继刚; 杨�远
Original assignee: Guangzhou Pinxing Huayue Trading Co ltd
Current assignee: Guangzhou Pinxing Huayue Trading Co ltd
Priority date: 2023-04-23
Filing date: 2023-04-23
Publication date: 2023-07-25

Abstract

The invention discloses a data prediction method and system applied to electronic commerce. The method comprises: acquiring a flow data set for a period to be predicted, and extracting a first data feature and a first user feature of each piece of flow data in the flow data set; classifying each piece of flow data to obtain current flow data clusters; acquiring a historical flow data set, extracting a second data feature and a second user feature of each piece of flow data in the historical flow data set, and clustering the historical flow data to obtain historical flow data clusters; calculating the order success rate corresponding to each historical flow data cluster; and screening out target success rates whose corresponding similarity is greater than a set threshold, calculating the transaction volume of each current flow data cluster as the product of the target success rate and the corresponding current flow data cluster, and calculating the total transaction volume as the sum of the transaction volumes of the current flow data clusters. By applying embodiments of the invention, the predicted data can be more accurate.

Description

Data prediction method and system applied to electronic commerce

Technical Field

The invention relates to the technical field of electronic commerce, in particular to a data prediction method and system applied to electronic commerce.

Background

Electronic commerce is not limited by geography or time and therefore places high demands on timeliness; for example, inventory data must be updated accurately and quickly.

The prior art discloses a method for generating a reorder quantity suggestion for an online commodity, which comprises the following steps: acquiring the traffic and the conversion rate of the commodity during a preset number of days after it goes online in the online store, and deriving from them the total traffic and the total conversion rate of the commodity over its whole life cycle, wherein the life cycle ends when the traffic of the commodity after going online meets a preset traffic threshold condition or the time the commodity has been online meets a preset time threshold condition; calculating the gross sales volume of the commodity according to the total traffic and the total conversion rate; acquiring the return rate of the commodity, and generating sales volume data of the commodity according to the return rate and the gross sales volume; and generating a reorder quantity suggestion according to the sales volume data.

In the prior art, the total sales volume of the commodity is calculated from the traffic and the conversion rate within the preset number of days after the commodity goes online, so that a reorder quantity suggestion can be generated within that period. However, the conversion rates of different crowds and different channels differ, so the reorder quantity calculated in the prior art is not accurate enough.

Disclosure of Invention

The invention aims to provide a data prediction method and a system applied to electronic commerce.

The invention solves the technical problems through the following technical scheme:

the invention provides a data prediction method applied to electronic commerce, which comprises the following steps:

acquiring a flow data set in a period to be predicted, and extracting first data features and first user features of each flow data in the flow data set, wherein the first data features comprise: one or a combination of an IP address, a login device type, a login port type, a data type; the first user characteristic comprises: one or a combination of user identification information and user behavior information;

classifying each piece of flow data according to the first data characteristics and the first user characteristics to obtain a current flow data cluster;

acquiring a historical flow data set, extracting second data features and second user features of each flow data in the historical flow data set, and classifying the historical flow data according to the second data features and the second user features to obtain a historical flow data cluster;

calculating the order success rate corresponding to each historical flow data cluster; screening out, from the order success rates, target success rates whose corresponding similarity is greater than a set threshold, according to the similarity between the first data feature and the second data feature and the similarity between the first user feature and the second user feature; calculating the transaction volume of each current flow data cluster as the product of the target success rate and the corresponding current flow data cluster; and calculating the total transaction volume as the sum of the transaction volumes of the current flow data clusters.

Optionally, the acquiring the flow data set in the period to be predicted includes:

acquiring flow data in the period to be predicted, and authenticating and cleaning the flow data to obtain the flow data set.

Optionally, the classifying processing is performed on each piece of traffic data according to the first data feature and the first user feature to obtain a current traffic data cluster, including:

for each piece of flow data, marking the flow data by taking the first data characteristic and the first user characteristic as tag values to obtain marked flow data;

and obtaining the category number of the tag value corresponding to the first data characteristic, taking the category number as a K value, and carrying out clustering processing by using a K-means clustering algorithm to obtain the current flow data cluster.

Optionally, the obtaining the number of kinds of tag values corresponding to the first data feature includes:

splicing the first data characteristic and the first user characteristic into a tag value;

and performing de-duplication treatment on the tag value to obtain a de-duplicated tag value, and performing counting treatment on the de-duplicated tag value.

Optionally, the splicing the first data feature and the first user feature into the tag value includes:

splicing the first data characteristic before the first user characteristic into a first label value;

splicing the first data feature after the first user feature into a second tag value;

the first tag value and the second tag value are collected as tag values.

Optionally, the performing deduplication processing on the tag value includes:

adding each tag value into a tag value set, and judging whether a first tag value corresponding to a current tag value is the same as the first tag value of other tag values or not according to the current tag value;

if yes, deleting the current tag value;

if not, judging whether the first label value corresponding to the current label value is the same as the second label value of other label values, if so, deleting the current label value and taking the next label value as the current label value, and returning to execute the step of judging whether the first label value corresponding to the current label value is the same as the first label value of other label values; if not, executing the next step;

judging whether a second label value corresponding to a current label value is the same as a first label value of other label values or not according to the current label value;

if yes, deleting the current tag value;

if not, judging whether the second label value corresponding to the current label value is the same as the second label value of other label values, if so, deleting the current label value and taking the next label value as the current label value, returning to execute the step of judging whether the first label value corresponding to the current label value is the same as the first label value of other label values or not, and if not, reserving the current label value.

Optionally, the classifying the historical traffic data according to the second data feature and the second user feature to obtain a historical traffic data cluster includes:

for each piece of flow data, marking the historical flow data by taking the second data characteristic and the second user characteristic as tag values to obtain marked historical flow data;

and obtaining the category number of the tag value corresponding to the second data characteristic, taking the category number as a K value, and carrying out clustering processing by using a K-means clustering algorithm to obtain the historical flow data cluster.

Optionally, the screening the target success rate with the corresponding similarity greater than the set threshold value from the order success rates according to the similarity between the first data feature and the second data feature and the similarity between the first user feature and the second user feature includes:

screening out a first success rate with the corresponding similarity larger than a set threshold value from the success rates of all orders according to the similarity of the first data characteristic and the second data characteristic;

screening second success rates with the corresponding similarity larger than a set threshold value from the order success rates according to the similarity of the first user characteristics and the second user characteristics;

and taking the set of the first success rate and the second success rate as a target success rate.

Optionally, the setting the set of the first success rate and the second success rate as the target success rate includes:

taking the success rate that the similarity of the first data characteristic and the second data characteristic and the similarity of the first user characteristic and the second user characteristic are larger than a set threshold value as a third success rate;

calculating the target success rate using the formula $t = w_1 a + w_2 b + w_3 c$, wherein

$t$ is the target success rate; $w_1$ is the weight corresponding to the first success rate; $a$ is the average value of the first success rates; $w_2$ is the weight corresponding to the second success rate; $b$ is the average value of the second success rates; $w_3$ is the weight corresponding to the third success rate; $c$ is the average value of the third success rates.

The invention provides a data prediction system applied to electronic commerce, which comprises:

an acquisition module, used for acquiring a flow data set in a period to be predicted, and extracting first data features and first user features of each piece of flow data in the flow data set, wherein the first data features comprise: one or a combination of an IP address, a login device type, a login port type, and a data type; the first user features comprise: one or a combination of user identification information and user behavior information;

the classification module is used for respectively carrying out classification processing on each piece of flow data according to the first data characteristics and the first user characteristics to obtain a current flow data cluster;

a calculation module, used for calculating the order success rate corresponding to each historical flow data cluster; screening out, from the order success rates, target success rates whose corresponding similarity is greater than a set threshold, according to the similarity between the first data feature and the second data feature and the similarity between the first user feature and the second user feature; calculating the transaction volume of each current flow data cluster as the product of the target success rate and the corresponding current flow data cluster; and calculating the total transaction volume as the sum of the transaction volumes of the current flow data clusters.

Compared with the prior art, the invention has the following advantages:

The flow data set in the period to be predicted is clustered to obtain current flow data clusters, and the historical flow data are clustered to obtain historical flow data clusters; the corresponding target success rate is then screened out according to the similarity of the clustering results, the corresponding transaction volume is calculated as the product of the target success rate and the current flow data cluster, and the transaction volume is used as the basis for adjusting inventory data.

Drawings

FIG. 1 is a schematic flow chart of a data prediction method applied to electronic commerce according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a process for de-duplication of a current tag value in a data prediction method applied to electronic commerce according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a data prediction system applied to electronic commerce according to an embodiment of the present invention.

Detailed Description

The following describes in detail the examples of the present invention, which are implemented on the premise of the technical solution of the present invention, and detailed embodiments and specific operation procedures are given, but the scope of protection of the present invention is not limited to the following examples.

Example 1

Fig. 1 is a flow chart of a data prediction method applied to electronic commerce according to an embodiment of the present invention, as shown in fig. 1, the method includes:

S101: acquiring a flow data set in a period to be predicted, and extracting first data features and first user features of each piece of flow data in the flow data set, wherein the first data features comprise: one or a combination of an IP address, a login device type, a login port type, and a data type; the first user features comprise: one or a combination of user identification information and user behavior information;

Flow data in the period to be predicted are obtained, and are then authenticated and cleaned to obtain the flow data set. Specifically, data cleansing refers to detecting, correcting, or deleting erroneous, incomplete, misformatted, or redundant data in a database. Data cleansing not only corrects errors but also improves consistency among data coming from different information systems. Dedicated data cleaning software can automatically inspect data files, correct erroneous data, and consolidate the data into a format that is consistent across the enterprise.
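The cleaning step described above can be illustrated with a minimal sketch. The field names `ip`, `device`, and `user_id`, and the rule that a record missing any of them counts as incomplete, are assumptions made for the example and are not specified by the embodiment.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical required fields; the embodiment does not name them.
REQUIRED = ("ip", "device", "user_id")

def clean(records: Iterable[Dict[str, Optional[str]]]) -> List[Dict[str, str]]:
    """Normalise formatting, drop incomplete records, and drop redundant records."""
    cleaned: List[Dict[str, str]] = []
    seen = set()
    for rec in records:
        # Correct formatting: trim whitespace and lower-case the device type.
        row = {k: (rec.get(k) or "").strip() for k in REQUIRED}
        row["device"] = row["device"].lower()
        if not all(row.values()):
            continue  # incomplete record
        key = tuple(row[k] for k in REQUIRED)
        if key in seen:
            continue  # redundant record
        seen.add(key)
        cleaned.append(row)
    return cleaned

raw = [
    {"ip": " 10.0.0.1", "device": "Mobile", "user_id": "u1"},
    {"ip": "10.0.0.1", "device": "mobile", "user_id": "u1"},  # duplicate once normalised
    {"ip": "10.0.0.2", "device": "pc", "user_id": ""},        # incomplete
]
flow_data_set = clean(raw)
```

Here the second record is treated as redundant only after normalisation, which is why formatting is corrected before duplicates are checked.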

Each piece of traffic data in the traffic data set is processed as follows:

The information contained in each piece of flow data is parsed according to the storage position of each field and the content stored in it. For example, if the 10th to 30th characters of a piece of flow data store the IP address, those characters are extracted and checked to determine whether they form an IP address; once the check passes, they are taken as the IP address.
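As a rough illustration of this fixed-position extraction, the sketch below uses Python's standard `ipaddress` module for the validity check. The record layout (a 9-character header followed by a 21-character IP field) and the function name `extract_ip` are hypothetical.

```python
import ipaddress
from typing import Optional

# Assumed layout: characters 10..30 (1-based, inclusive) hold the IP address.
IP_START, IP_END = 10, 30

def extract_ip(record: str) -> Optional[str]:
    """Return the IP address stored in the fixed-position field, or None if the check fails."""
    raw = record[IP_START - 1:IP_END].strip()
    try:
        ipaddress.ip_address(raw)  # raises ValueError for anything that is not an IPv4/IPv6 address
    except ValueError:
        return None
    return raw

# 9-character header, then the IP padded to 21 characters, then further fields.
record = "HDR000001" + "192.168.1.20".ljust(21) + "mobile"
ip = extract_ip(record)
```

A record whose field fails the check yields `None`, so the caller can decide whether to discard the record or route it back to cleaning.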

The other first data features are extracted in the same way. The login device type may include a computer, a mobile phone, other intelligent terminal devices, and the like; the login port type may include an HTTP proxy port, a SOCKS proxy port, an FTP proxy port, a Telnet proxy port, and the like; the data type may include a numeric type, a string type, and a date-time type.

The first user feature comprises one or a combination of user identification information and user behavior information, wherein the user behavior information may include login data, viewing data, like data, add-to-cart data, payment data, and the like.

S102: classifying each piece of flow data according to the first data characteristics and the first user characteristics to obtain a current flow data cluster;

for each piece of flow data, marking the flow data by taking the first data characteristic and the first user characteristic as tag values to obtain marked flow data:

the flow data 1 corresponds to the first data characteristic 1 and the first user characteristic 1;

the flow data 2 corresponds to the first data characteristic 2 and the first user characteristic 2;

the flow data 3 corresponds to the first data characteristic 1 and the first user characteristic 2;

the flow data 4 corresponds to the first data characteristic 1 and the first user characteristic 2;

the flow data 5 corresponds to the first data characteristic 3 and the first user characteristic 3;

Then, to guard against some data protocols storing the first data feature and the first user feature in reversed order, the first data feature may be spliced before the first user feature to form a first tag value, and spliced after the first user feature to form a second tag value; the set of the first tag value and the second tag value is taken as the tag value, for example,

the first tag value corresponding to the flow data 1 is: first data feature 1-first user feature 1;

the second tag value corresponding to the flow data 1 is: first user feature 1-first data feature 1;

the first tag value corresponding to the flow data 2 is: first data feature 2-first user feature 2;

the second tag value corresponding to the flow data 2 is: first user feature 2-first data feature 2;

the first tag value corresponding to the flow data 3 is: first data feature 1-first user feature 2;

the second tag value corresponding to the flow data 3 is: first user feature 2-first data feature 1;

the first tag value corresponding to the flow data 4 is: first data feature 1-first user feature 2;

the second tag value corresponding to the flow data 4 is: first user feature 2-first data feature 1;

Thus every piece of flow data is tagged, and each piece of flow data has two tag values.
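The splicing in the example above can be sketched as follows. The short identifiers `D1`, `U1`, and so on stand in for "first data feature 1", "first user feature 1", and so on; the `-` separator and the names `TagPair` and `make_tags` are illustrative assumptions.

```python
from typing import Dict, NamedTuple, Tuple

class TagPair(NamedTuple):
    first: str   # data feature spliced before user feature
    second: str  # data feature spliced after user feature

def make_tags(data_feature: str, user_feature: str, sep: str = "-") -> TagPair:
    """Build both splicing orders so that a reversed storage order still matches."""
    return TagPair(f"{data_feature}{sep}{user_feature}",
                   f"{user_feature}{sep}{data_feature}")

# Flow data 1 to 4 from the example above: (data feature, user feature).
flows: Dict[int, Tuple[str, str]] = {
    1: ("D1", "U1"),
    2: ("D2", "U2"),
    3: ("D1", "U2"),
    4: ("D1", "U2"),
}
tags = {flow_id: make_tags(d, u) for flow_id, (d, u) in flows.items()}
```

Flow data 3 and flow data 4 end up with identical tag pairs, which is the overlap that the subsequent de-duplication removes.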

Then, in Embodiment 1 of the present invention, the tag values may be de-duplicated to obtain de-duplicated tag values, and the de-duplicated tag values may be counted. For example, because the tag values of flow data 3 and flow data 4 are identical, they are de-duplicated so that only one copy is kept, and the number of tag values remaining after de-duplication is obtained.

The number of categories is taken as the K value, and clustering is performed with the K-means clustering algorithm to obtain the current flow data clusters.
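A minimal, dependency-free sketch of this clustering step is given below. The embodiment does not state how categorical features are turned into numeric vectors or how centroids are initialised; the one-hot encoding and the deterministic choice of the first K distinct points as initial centroids are assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Flow = Tuple[str, str]  # (data feature, user feature)

def one_hot(flows: Sequence[Flow]) -> List[List[float]]:
    """Encode each categorical feature pair as a 0/1 vector (assumed encoding)."""
    vocab = sorted({f"d:{d}" for d, _ in flows} | {f"u:{u}" for _, u in flows})
    index = {name: i for i, name in enumerate(vocab)}
    vectors = []
    for d, u in flows:
        v = [0.0] * len(vocab)
        v[index[f"d:{d}"]] = 1.0
        v[index[f"u:{u}"]] = 1.0
        vectors.append(v)
    return vectors

def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points: List[List[float]], k: int, max_iter: int = 100) -> List[int]:
    """Plain K-means; returns one cluster label per point."""
    centroids: List[List[float]] = []
    for p in points:  # assumed initialisation: first k distinct points
        if p not in centroids:
            centroids.append(list(p))
        if len(centroids) == k:
            break
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

flows = [("D1", "U1"), ("D2", "U2"), ("D1", "U2"), ("D1", "U2"), ("D3", "U3")]
k = len(set(flows))  # K = number of distinct tag values after de-duplication
labels = kmeans(one_hot(flows), k)
```

With K equal to the number of distinct tag values, records carrying the same tag fall into the same cluster; a production system would more likely rely on an established clustering library than on this hand-written loop.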

Further, in order to reduce the K value and improve clustering efficiency, the number of kinds of tag values corresponding to the first data feature is obtained by de-duplication.

Fig. 2 is a schematic diagram of a process of de-duplication of a current tag value in a data prediction method applied to electronic commerce according to an embodiment of the present invention, where as shown in fig. 2, the method includes:

S201: adding each tag value to a tag value set, and, for the current tag value, judging whether the first tag value corresponding to the current tag value is the same as the first tag value of any other tag value;

S202: if the judgment result of step S201 is yes, deleting the current tag value. For example, if the first tag values of flow data 1 and flow data 3 are the same, the tag value of flow data 1 or of flow data 3 is deleted so that only one tag value is kept; in this way the first data feature is de-duplicated and the number of kinds of the first data feature can be obtained. After the deletion, the next tag value is taken as the current tag value, and step S201 is executed again;

S203: if the judgment result of step S201 is no, judging whether the first tag value corresponding to the current tag value is the same as the second tag value of any other tag value; if so, deleting the current tag value; if not, executing S204;

S204: for the current tag value, judging whether the second tag value corresponding to the current tag value is the same as the first tag value of any other tag value;

S205: if the judgment result of step S204 is yes, deleting the current tag value;

S206: if the judgment result of step S204 is no, judging whether the second tag value corresponding to the current tag value is the same as the second tag value of any other tag value; if so, deleting the current tag value, taking the next tag value as the current tag value, and returning to step S201; otherwise, retaining the current tag value.
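One possible reading of steps S201 to S206 is sketched below. It assumes that "other tag values" means the tag values already retained, so that exactly one copy of each duplicate survives; the four pairwise comparisons (first against first, first against second, second against first, second against second) then collapse into two set-membership checks. The function name `deduplicate` is hypothetical.

```python
from typing import Iterable, List, Set, Tuple

TagPair = Tuple[str, str]  # (first tag value, second tag value)

def deduplicate(tag_pairs: Iterable[TagPair]) -> List[TagPair]:
    """Keep a tag pair only if neither of its two values matches a value already retained."""
    kept: List[TagPair] = []
    seen: Set[str] = set()  # first and second tag values of every retained pair
    for first, second in tag_pairs:
        # S201/S203: first tag value vs. retained first and second tag values.
        # S204/S206: second tag value vs. retained first and second tag values.
        if first in seen or second in seen:
            continue  # delete the current tag value and move on to the next one
        kept.append((first, second))
        seen.update((first, second))
    return kept

pairs = [
    ("D1-U1", "U1-D1"),  # flow data 1
    ("D2-U2", "U2-D2"),  # flow data 2
    ("D1-U2", "U2-D1"),  # flow data 3
    ("D1-U2", "U2-D1"),  # flow data 4, duplicate of flow data 3
    ("U2-D1", "D1-U2"),  # same features stored in reversed order
]
unique_pairs = deduplicate(pairs)
k_value = len(unique_pairs)  # number of kinds of tag values, used as K
```

The last pair shows why both splicing orders are kept: a record whose features were stored in reversed order is still recognised as a duplicate.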

S103: acquiring a historical flow data set, extracting second data features and second user features of each flow data in the historical flow data set, and classifying the historical flow data according to the second data features and the second user features to obtain a historical flow data cluster;

specifically, for each piece of flow data, marking the historical flow data by taking the second data characteristic and the second user characteristic as tag values to obtain marked historical flow data;

It can be understood that the method for marking and clustering the historical traffic data is the same as the process in step S102, and the embodiment of the present invention will not be described herein.

S104: calculating the order success rate corresponding to each historical flow data cluster; screening out, from the order success rates, target success rates whose corresponding similarity is greater than a set threshold, according to the similarity between the first data feature and the second data feature and the similarity between the first user feature and the second user feature; calculating the transaction volume of each current flow data cluster as the product of the target success rate and the corresponding current flow data cluster; and calculating the total transaction volume as the sum of the transaction volumes of the current flow data clusters.

First, for each current flow data cluster, the similarity between the first data feature of each piece of flow data in that cluster and the second data feature of each historical flow data cluster is calculated, giving a plurality of similarity values; first target similarities greater than the set threshold are screened out from these values, and the order success rates of the historical flow data clusters corresponding to the first target similarities are the first success rates.

Similarly, the similarity between the first user feature of each piece of flow data in the current flow data cluster and the second user feature of each historical flow data cluster is calculated, giving a plurality of similarity values; second target similarities greater than the set threshold are screened out from these values, and the order success rates of the historical flow data clusters corresponding to the second target similarities are the second success rates.
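The screening can be sketched as below. The embodiment does not name a similarity measure, so Jaccard similarity over sets of feature values is used purely as an assumption, and the comparison is simplified to one feature set per cluster; the names `jaccard` and `screen_rates` and all numbers are illustrative.

```python
from typing import FrozenSet, List, Tuple

def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity: size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def screen_rates(current: FrozenSet[str],
                 historical: List[Tuple[FrozenSet[str], float]],
                 threshold: float) -> List[float]:
    """Return the order success rates of historical clusters whose similarity exceeds the threshold."""
    return [rate for features, rate in historical if jaccard(current, features) > threshold]

# Data features of one current cluster and of three historical clusters,
# each historical cluster paired with its order success rate.
current_cluster = frozenset({"mobile", "http", "string"})
historical_clusters = [
    (frozenset({"mobile", "http", "string"}), 0.12),  # similarity 1.0
    (frozenset({"mobile", "http", "date"}), 0.08),    # similarity 0.5
    (frozenset({"pc", "ftp", "numeric"}), 0.03),      # similarity 0.0
]
first_success_rates = screen_rates(current_cluster, historical_clusters, threshold=0.4)
```

Running the same function on user features instead of data features would give the second success rates.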

The set of the first success rates and the second success rates is taken as the target success rate. The target success rate is then multiplied by the number of pieces of flow data in the current flow data cluster to obtain the transaction volume corresponding to that cluster. Finally, the transaction volumes of all current flow data clusters are summed to obtain the total transaction volume.
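The arithmetic of this step is shown in the sketch below. The definition of the order success rate as completed orders divided by the number of flow records in a historical cluster is an assumption, since the embodiment does not give a formula for it; all numbers are made up for illustration.

```python
from typing import Iterable, Tuple

def order_success_rate(completed_orders: int, flow_count: int) -> float:
    """Assumed definition: share of a historical cluster's flow records that ended in an order."""
    return completed_orders / flow_count if flow_count else 0.0

def total_volume(clusters: Iterable[Tuple[float, int]]) -> float:
    """Sum over current clusters of (target success rate x number of flow records)."""
    return sum(rate * count for rate, count in clusters)

# (target success rate, number of flow records) for three current clusters.
current_clusters = [(0.10, 200), (0.05, 400), (0.02, 1000)]
per_cluster = [rate * count for rate, count in current_clusters]
predicted_total = total_volume(current_clusters)
```

Each of the three clusters contributes about 20 predicted transactions, so the predicted total is about 60, which is the figure that would feed the inventory adjustment.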

In a specific implementation manner of the embodiment of the present invention, since the number of values included in the first success rate may be more than one, and similarly, the number of values included in the second success rate may be more than one, the step of taking the set of the first success rate and the second success rate as the target success rate may include:

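taking, as a third success rate, each success rate for which both the similarity between the first data feature and the second data feature and the similarity between the first user feature and the second user feature are greater than the set threshold;

calculating the target success rate using the formula $t = w_1 a + w_2 b + w_3 c$, wherein $t$ is the target success rate; $w_1$, $w_2$, and $w_3$ are the weights corresponding to the first, second, and third success rates respectively; and $a$, $b$, and $c$ are the average values of the first, second, and third success rates respectively.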
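The weighted combination can be sketched as follows. The weight values are hypothetical, and treating an empty group of rates as contributing 0 is an assumption, because the embodiment does not say how an empty group is handled.

```python
from statistics import mean
from typing import Sequence

def target_success_rate(first: Sequence[float], second: Sequence[float],
                        third: Sequence[float],
                        w1: float, w2: float, w3: float) -> float:
    """t = w1*a + w2*b + w3*c, where a, b, c are the averages of the three groups of rates."""
    a = mean(first) if first else 0.0   # assumed: empty group contributes 0
    b = mean(second) if second else 0.0
    c = mean(third) if third else 0.0
    return w1 * a + w2 * b + w3 * c

t = target_success_rate(
    first=[0.12, 0.08],   # matched on data features only, average 0.10
    second=[0.10, 0.06],  # matched on user features only, average 0.08
    third=[0.12],         # matched on both, average 0.12
    w1=0.3, w2=0.3, w3=0.4,  # hypothetical weights
)
```

In this example the result is 0.3 × 0.10 + 0.3 × 0.08 + 0.4 × 0.12, approximately 0.102; giving the third group the largest weight reflects the intuition that a match on both feature types is the strongest evidence, though the embodiment leaves the weights open.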

Further, after the period to be predicted has elapsed, the flow data in the period to be predicted may be added to the historical flow data, and then step S101 may be performed for the next period to be predicted.

Example 2

Corresponding to embodiment 1 of the present invention, embodiment 2 of the present invention provides a data prediction system applied to electronic commerce.

Fig. 3 is a schematic structural diagram of a data prediction system applied to electronic commerce according to an embodiment of the present invention, where, as shown in fig. 3, the system includes:

the obtaining module 201 is configured to obtain a flow data set in a period to be predicted, and extract a first data feature and a first user feature of each piece of flow data in the flow data set, where the first data feature includes: one or a combination of an IP address, a login device type, a login port type, a data type; the first user characteristic comprises: one or a combination of user identification information and user behavior information;

the classification module 202 is configured to perform classification processing on each piece of traffic data according to the first data feature and the first user feature, so as to obtain a current traffic data cluster;

the calculating module 203 is configured to calculate the order success rate corresponding to each historical flow data cluster; screen out, from the order success rates, target success rates whose corresponding similarity is greater than a set threshold, according to the similarity between the first data feature and the second data feature and the similarity between the first user feature and the second user feature; calculate the transaction volume of each current flow data cluster as the product of the target success rate and the corresponding current flow data cluster; and calculate the total transaction volume as the sum of the transaction volumes of the current flow data clusters.

The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.

Claims

1. A data prediction method applied to electronic commerce, the method comprising:

2. The method for predicting data applied to electronic commerce according to claim 1, wherein the acquiring the set of traffic data in the period to be predicted comprises:

3. The method for predicting data applied to electronic commerce according to claim 1, wherein the classifying each piece of traffic data according to the first data feature and the first user feature to obtain a current traffic data cluster includes:

4. A data prediction method applied to electronic commerce according to claim 3, wherein the obtaining the number of kinds of tag values corresponding to the first data feature comprises:

5. The method for predicting data applied to electronic commerce according to claim 4, wherein the splicing the first data feature and the first user feature into the tag value comprises:

the first tag value and the second tag value are collected as tag values.

6. The method for predicting data to be applied to electronic commerce according to claim 4, wherein the performing the de-duplication process on the tag value comprises:

if yes, deleting the current tag value;

7. The method for predicting data applied to electronic commerce according to claim 1, wherein the classifying the historical traffic data according to the second data feature and the second user feature to obtain the historical traffic data cluster comprises:

8. The method for predicting data applied to electronic commerce according to claim 1, wherein the screening out, from the order success rates, of target success rates whose corresponding similarity is greater than a set threshold according to the similarity between the first data feature and the second data feature and the similarity between the first user feature and the second user feature comprises:

9. The method for predicting data applied to electronic commerce according to claim 8, wherein the taking the set of the first success rate and the second success rate as the target success rate comprises:

calculating the target success rate using the formula $t = w_1 a + w_2 b + w_3 c$, wherein,

10. A data prediction system for use in electronic commerce, the system comprising: