CN117349344B - Intelligent product sales data acquisition method and system based on big data - Google Patents

Info

Publication number
CN117349344B
Authority
CN
China
Prior art keywords
subset group
training
centers
preset
center
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311369328.6A
Other languages
Chinese (zh)
Other versions
CN117349344A (en
Inventor
Name withheld on request
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Oupai Creative Home Design Co ltd
Original Assignee
Guangzhou Oupai Creative Home Design Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Oupai Creative Home Design Co ltd filed Critical Guangzhou Oupai Creative Home Design Co ltd
Priority to CN202311369328.6A priority Critical patent/CN117349344B/en
Publication of CN117349344A publication Critical patent/CN117349344A/en
Application granted granted Critical
Publication of CN117349344B publication Critical patent/CN117349344B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/10 Pre-processing; Data cleansing
    • G06F18/15 Statistical pre-processing, e.g. techniques for normalisation or restoring missing data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2431 Multiple classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data

Abstract

In the big data-based intelligent product sales data collection method and system, a sales data analysis model analyzes the product sales data of one statistical period to obtain the product statistical type for that period, and the product statistical types of a plurality of statistical periods are then compiled into a product sales report. Because the analysis is performed by the sales data analysis model, accurate product statistical types are obtained efficiently and provide an operational basis for subsequent product sales. In the training stage of the sales data analysis model, the generalization ability of the trained target sales data analysis model on unrestricted disturbance data sets is improved, so that a well-performing model can be obtained without massive training data. This reduces the sample size required for training and the demand for hardware computing power, alleviates the high cost of the model, and increases training speed.

Description

Intelligent product sales data acquisition method and system based on big data
Technical Field
The application relates to the field of data processing, and in particular relates to a method and a system for intelligently collecting product sales data based on big data.
Background
Sales data collection refers to collecting and recording sales-related information and data in order to understand sales activities and trends and to support business decisions and analysis. Analysis performed when sales data is collected helps an enterprise understand the market and learn market demand, which in turn supports sales forecasting and demand planning. By mining historical data, future sales trends and demand changes can be predicted, which helps the enterprise formulate reasonable production plans, inventory management, and supply chain strategies that meet market demand and avoid oversupply or undersupply. Sales data can also provide detailed information about customers' purchasing behavior and consumption habits, helping the enterprise understand customer needs and preferences. By analyzing sales data, an enterprise can identify important customer groups, discover potential cross-selling opportunities, and formulate personalized marketing strategies to improve customer satisfaction and loyalty. With the development of artificial intelligence, sales data can be analyzed automatically by a machine learning model to help enterprises make product sales prediction decisions intelligently. How to ensure the recognition reliability of the machine learning model, which depends on training the model properly, is a technical problem to be considered.
Disclosure of Invention
In view of this, the embodiments of the present application at least provide a method and a system for intelligent collection of product sales data based on big data.
According to an aspect of the embodiments of the present application, there is provided a method for intelligently collecting product sales data based on big data, applied to a server that is communicatively connected with at least one sales terminal. The method includes: obtaining product sales data generated in a target statistical period and sent by the at least one sales terminal, to obtain a product sales data set; loading the product sales data set into a preset sales data analysis model, and analyzing the product sales data set based on the sales data analysis model to obtain the product statistical type of the target statistical period; and compiling the product statistical types respectively corresponding to a plurality of statistical periods to generate a product sales report. The training process of the sales data analysis model includes: extracting, through a target sales data analysis model, data characterization vectors of a plurality of sales training data sets marked with the same product classification indication information in a sales training database, to obtain a training characterization vector cluster under the same product classification indication information, wherein the sales training database includes sales training data sets marked with a plurality of kinds of product classification indication information; pre-deploying a plurality of preset subset group centers for the training characterization vector cluster, and determining the preset subset group center corresponding to each of a plurality of training characterization vectors in the training characterization vector cluster; updating the plurality of preset subset group centers through the commonality measurement results between the plurality of training characterization vectors and their respective corresponding preset subset group centers, to obtain at least one target subset group center under the same product classification indication information; and determining a model training error through the target subset group centers under the plurality of kinds of product classification indication information and the training characterization vectors corresponding to each target subset group center, and training the target sales data analysis model through the model training error.
According to an example of the embodiments of the present application, updating the plurality of preset subset group centers through the commonality measurement results between the plurality of training characterization vectors and their respective corresponding preset subset group centers, to obtain at least one target subset group center under the same product classification indication information, includes: judging, according to the commonality measurement results, whether the plurality of training characterization vectors contain cluster appearance vectors, wherein the commonality measurement result between a cluster appearance vector and the preset subset group center to which it belongs is smaller than a first critical result; and if the plurality of training characterization vectors contain cluster appearance vectors, adding a preset subset group center through the cluster appearance vectors, and updating the cluster appearance vectors to belong to the added preset subset group center; wherein the target subset group centers include the added preset subset group center and the plurality of pre-deployed preset subset group centers.
According to an example of the embodiments of the present application, judging, according to the commonality measurement results between the plurality of training characterization vectors and their respective corresponding preset subset group centers, whether the plurality of training characterization vectors contain cluster appearance vectors includes: for any preset subset group center, determining a first critical result corresponding to the preset subset group center through the mean and the average deviation of the commonality measurement results between the preset subset group center and the training characterization vectors belonging to it; judging whether the training characterization vectors belonging to the preset subset group center include a training characterization vector whose commonality measurement result with the preset subset group center is smaller than the first critical result; and determining any such training characterization vector as a cluster appearance vector belonging to the preset subset group center.
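The outlier check and center addition described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the `commonality` function (Euclidean distance mapped into (0, 1]), the threshold form "mean minus `k` times the average deviation", the factor `k`, and the choice of the mean of the outlying vectors as the added center are all assumptions, since the text only states that the first critical result is derived from the mean and the average deviation.

```python
import math

def commonality(u, v):
    """Assumed commonality measure: maps Euclidean distance into (0, 1]."""
    return 1.0 / (1.0 + math.dist(u, v))

def first_critical_result(scores, k=2.0):
    """Assumed form: mean minus k times the average (mean absolute) deviation."""
    mean = sum(scores) / len(scores)
    avg_dev = sum(abs(s - mean) for s in scores) / len(scores)
    return mean - k * avg_dev

def split_off_outliers(vectors, centers, assignment, k=2.0):
    """Add one new center for each original center that has outlying members."""
    centers, assignment = list(centers), list(assignment)
    for ci in range(len(centers)):  # the range is fixed before any center is added
        members = [i for i, a in enumerate(assignment) if a == ci]
        if not members:
            continue
        scores = [commonality(vectors[i], centers[ci]) for i in members]
        threshold = first_critical_result(scores, k)
        outliers = [i for i, s in zip(members, scores) if s < threshold]
        if outliers:
            pts = [vectors[i] for i in outliers]
            # Assumption: the added center is the mean of the outlying vectors.
            centers.append(tuple(sum(col) / len(pts) for col in zip(*pts)))
            for i in outliers:
                assignment[i] = len(centers) - 1
    return centers, assignment

# Hypothetical 2-D characterization vectors: four near the center, one far away.
vectors = [(0.0, 0.1), (0.1, 0.0), (0.0, -0.1), (-0.1, 0.0), (3.0, 3.0)]
new_centers, new_assignment = split_off_outliers(vectors, [(0.0, 0.0)], [0] * 5)
print(new_centers, new_assignment)
```

In this toy run the distant vector falls below the threshold, so a second center is created at its position and the vector is reassigned to it.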
According to an example of the embodiments of the present application, updating the plurality of preset subset group centers through the commonality measurement results between the plurality of training characterization vectors and their respective corresponding preset subset group centers, to obtain at least one target subset group center under the same product classification indication information, includes: judging, according to the commonality measurement results, whether the plurality of preset subset group centers include a free subset group center, wherein the average value of the commonality measurement results between a free subset group center and the training characterization vectors clustered to it is smaller than or equal to a second critical result; and if the plurality of preset subset group centers include a free subset group center, cleaning out the free subset group center and the training characterization vectors belonging to it, to obtain the remaining preset subset group centers and the training characterization vectors belonging to the remaining preset subset group centers; wherein the target subset group centers include the remaining preset subset group centers.
According to an example of the embodiments of the present application, judging, according to the commonality measurement results between the plurality of training characterization vectors and their respective corresponding preset subset group centers, whether the plurality of preset subset group centers include a free subset group center includes: for any preset subset group center, judging whether the average value of the commonality measurement results between the preset subset group center and the training characterization vectors belonging to it is smaller than or equal to a second critical result; and if so, determining the preset subset group center as a free subset group center.
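A minimal sketch of the free-center cleanup follows. The `commonality` function, the value of `second_critical`, and the treatment of a center with no members as free are assumptions made for illustration; the text specifies only the "average commonality smaller than or equal to a second critical result" test.

```python
import math

def commonality(u, v):
    """Assumed commonality measure: maps Euclidean distance into (0, 1]."""
    return 1.0 / (1.0 + math.dist(u, v))

def remove_free_centers(vectors, centers, assignment, second_critical=0.3):
    """Drop centers whose members are, on average, too dissimilar to them."""
    remap, kept_centers = {}, []
    for ci, center in enumerate(centers):
        scores = [commonality(v, center)
                  for v, a in zip(vectors, assignment) if a == ci]
        # Assumption: a center with no members is also treated as free.
        if scores and sum(scores) / len(scores) > second_critical:
            remap[ci] = len(kept_centers)
            kept_centers.append(center)
    # Vectors that belonged to a removed center are cleaned out with it.
    kept = [(v, remap[a]) for v, a in zip(vectors, assignment) if a in remap]
    return [v for v, _ in kept], kept_centers, [a for _, a in kept]

# Hypothetical data: the second center sits far from both of its members.
vectors = [(0.0, 0.1), (0.1, 0.0), (6.0, 6.0), (14.0, 14.0)]
centers = [(0.0, 0.0), (10.0, 10.0)]
kept_vectors, kept_centers, kept_assignment = remove_free_centers(
    vectors, centers, [0, 0, 1, 1])
print(kept_vectors, kept_centers, kept_assignment)
```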
According to an example of the embodiments of the present application, updating the plurality of preset subset group centers through the commonality measurement results between the plurality of training characterization vectors and their respective corresponding preset subset group centers, to obtain at least one target subset group center under the same product classification indication information, includes: determining, through the commonality measurement results, at least one group of subset group centers to be integrated among the plurality of preset subset group centers, wherein each group of subset group centers to be integrated includes at least two matched preset subset group centers; and integrating each group of subset group centers to be integrated to obtain at least one integrated subset group center, and updating the training characterization vectors corresponding to the subset group centers to be integrated to belong to the corresponding integrated subset group center; wherein the target subset group centers include the integrated subset group centers.
According to an example of the embodiments of the present application, determining, through the commonality measurement results between the plurality of training characterization vectors and their respective corresponding preset subset group centers, at least one group of subset group centers to be integrated among the plurality of preset subset group centers includes: determining a third critical result for each preset subset group center through the mean and the average deviation of the commonality measurement results between that preset subset group center and the training characterization vectors corresponding to it; for any preset subset group center, judging whether the commonality measurement result between the preset subset group center and another preset subset group center is not smaller than a fourth critical result, wherein the fourth critical result is the maximum of the third critical result corresponding to the preset subset group center and the third critical result corresponding to the other preset subset group center; if the commonality measurement result between the preset subset group center and the other preset subset group center is not smaller than that maximum, determining that the other preset subset group center matches the preset subset group center; and determining the preset subset group center and the at least one other preset subset group center matched with it as a group of subset group centers to be integrated.
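The matching and integration of centers can be sketched as below. Several details are assumptions not stated in the text: the `commonality` function, the third critical result taken as "mean minus `k` times the average deviation", a simple greedy grouping, and the integrated center computed as the mean of all member vectors of the group. The sketch also assumes every center has at least one member.

```python
import math

def commonality(u, v):
    """Assumed commonality measure: maps Euclidean distance into (0, 1]."""
    return 1.0 / (1.0 + math.dist(u, v))

def third_critical_result(scores, k=1.0):
    """Assumed form: mean minus k times the average (mean absolute) deviation."""
    mean = sum(scores) / len(scores)
    return mean - k * sum(abs(s - mean) for s in scores) / len(scores)

def merge_matching_centers(vectors, centers, assignment, k=1.0):
    members = {ci: [v for v, a in zip(vectors, assignment) if a == ci]
               for ci in range(len(centers))}
    third = {ci: third_critical_result(
                 [commonality(v, centers[ci]) for v in members[ci]], k)
             for ci in members}
    group_of, groups = {}, []
    for i in range(len(centers)):
        if i in group_of:
            continue
        group = [i]
        for j in range(i + 1, len(centers)):
            if j in group_of:
                continue
            fourth = max(third[i], third[j])  # fourth critical result
            if commonality(centers[i], centers[j]) >= fourth:
                group.append(j)
        for c in group:
            group_of[c] = len(groups)
        groups.append(group)
    new_centers = []
    for group in groups:
        if len(group) == 1:
            new_centers.append(centers[group[0]])  # unmatched center is kept
        else:
            pts = [v for c in group for v in members[c]]
            # Assumption: the integrated center is the mean of all member vectors.
            new_centers.append(tuple(sum(col) / len(pts) for col in zip(*pts)))
    return new_centers, [group_of[a] for a in assignment]

# Hypothetical data: the first two centers are close enough to be integrated.
vectors = [(-1.0, 0.0), (1.0, 0.0), (0.4, 1.0), (0.4, -1.0), (10.0, 11.0), (10.0, 9.0)]
centers = [(0.0, 0.0), (0.4, 0.0), (10.0, 10.0)]
merged_centers, merged_assignment = merge_matching_centers(
    vectors, centers, [0, 0, 1, 1, 2, 2])
print(merged_centers, merged_assignment)
```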
According to an example of the embodiments of the present application, determining the model training error through the target subset group centers under the plurality of kinds of product classification indication information and the training characterization vectors corresponding to each target subset group center includes: for each training characterization vector clustered to any target subset group center, determining a sub-error corresponding to the training characterization vector through the commonality measurement result between the training characterization vector and the target subset group center to which it belongs and the commonality measurement results between the training characterization vector and other target subset group centers, wherein the other target subset group centers include those target subset group centers, among the plurality of target subset group centers under the plurality of kinds of product classification indication information and other than the one to which the training characterization vector belongs, whose commonality measurement result with the training characterization vector is smaller than or equal to a fifth critical result; and determining the model training error through the sub-errors corresponding to the training characterization vectors of the respective target subset group centers.
According to an example of the embodiments of the present application, determining the preset subset group center corresponding to each of a plurality of training characterization vectors in the training characterization vector cluster includes: for any training characterization vector in the training characterization vector cluster, determining the commonality measurement result between the training characterization vector and each preset subset group center, and determining the preset subset group center with the largest commonality measurement result as the preset subset group center to which the training characterization vector belongs.
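The assignment rule above amounts to an arg-max over commonality measurement results. A minimal sketch, assuming a `commonality` function derived from Euclidean distance (the text does not fix the measure) and hypothetical 2-D vectors:

```python
import math

def commonality(u, v):
    """Assumed commonality measure: maps Euclidean distance into (0, 1]."""
    return 1.0 / (1.0 + math.dist(u, v))

def assign_to_centers(vectors, centers):
    """For each vector, return the index of the center with the largest commonality."""
    assignment = []
    for vec in vectors:
        scores = [commonality(vec, c) for c in centers]
        assignment.append(max(range(len(centers)), key=scores.__getitem__))
    return assignment

vectors = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (4.9, 5.0)]
centers = [(0.0, 0.0), (5.0, 5.0)]
print(assign_to_centers(vectors, centers))  # [0, 0, 1, 1]
```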
According to another aspect of the embodiments of the present application, there is provided an intelligent product sales data collection system, including a server and at least one sales terminal communicatively connected to the server, the server including: one or more processors; and one or more memories, wherein the memories have stored therein computer readable code, which when executed by the one or more processors, causes the one or more processors to perform the method described above.
The beneficial effects of the present application include at least the following. After the sales data analysis model analyzes the product sales data of one statistical period to obtain the product statistical type, the product statistical types corresponding to a plurality of statistical periods are compiled to generate a product sales report. Because the sales data analysis model performs the analysis, accurate product statistical types are obtained efficiently, and they provide an operational basis for subsequent product sales. In the training stage of the sales data analysis model, a plurality of preset subset group centers are deployed in advance for the training characterization vector cluster under the same product classification indication information, and the preset subset group center corresponding to each training characterization vector in the cluster is determined. In other words, a plurality of sub-classifications are deployed in advance for the training characterization vectors under the same product classification indication information, so as to accommodate disturbance features that may exist under that product classification indication information. The plurality of preset subset group centers are then updated through the commonality measurement results between the training characterization vectors and their respective corresponding preset subset group centers, so that the resulting at least one target subset group center better fits the spatial distribution of the training characterization vectors under that product classification indication information. That is, the training characterization vectors under the same product classification indication information can be clustered to more accurate target subset group centers, which reduces the confusion of indication information caused by unrestricted disturbance features that may exist under different product classification indication information. The target sales data analysis model is trained through the model training error determined from the target subset group centers and the training characterization vectors belonging to them. As a result, when the target sales data analysis model is trained on a sales training database that may contain disturbance data sets, the generalization ability of the trained model on unrestricted disturbance data sets is improved, a well-performing model can be obtained without massive training data, the sample size required for training and the demand for hardware computing power are reduced, the high cost of the model is alleviated, and training speed is increased.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the aspects of the present application.
Drawings
The foregoing and other objects, features, and advantages of the embodiments of the application will become more apparent from the following more particular description of the embodiments of the application, as illustrated in the accompanying drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the application and are incorporated in and constitute a part of this specification; they serve to illustrate the application and do not constitute a limitation on the application. In the drawings, like reference numerals generally refer to like parts or steps.
Fig. 1 is a schematic diagram of an intelligent sales data acquisition system according to an embodiment of the present application;
Fig. 2 is a schematic implementation flow chart of a big data-based intelligent collection method for product sales data according to an embodiment of the present application;
Fig. 3 is a schematic diagram of a training process of a sales data analysis model according to an embodiment of the present application;
Fig. 4 is a schematic diagram of the composition structure of a sales data acquisition device according to an embodiment of the present application;
Fig. 5 is a schematic hardware entity diagram of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without the need for inventive faculty, are intended to be within the scope of the present application, based on the embodiments herein.
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application are further elaborated below in conjunction with the accompanying drawings and examples, which should not be construed as limiting the present application, and all other embodiments obtained by those skilled in the art without making inventive efforts are within the scope of protection of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is to be understood that "some embodiments" can be the same subset or different subsets of all possible embodiments and can be combined with one another without conflict. The term "first/second/third" is merely to distinguish similar objects and does not represent a specific ordering of objects, it being understood that the "first/second/third" may be interchanged with a specific order or sequence, as permitted, to enable embodiments of the present application described herein to be practiced otherwise than as illustrated or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing the present application only and is not intended to be limiting of the present application.
The intelligent acquisition method for the sales data of the product based on the big data can be applied to an intelligent acquisition system for the sales data shown in fig. 1. The sales data intelligent acquisition system comprises a server and a plurality of sales terminals in communication connection with the server, wherein the sales terminals 102 are in communication with the server 104 through a network. The data storage system may store data that the server 104 needs to process. The data storage system may be integrated on the server 104 or may be located on a cloud or other network server. The sales data may be stored in a local storage of the sales terminal 102, or may be stored in a data storage system or a cloud storage associated with the server 104, and when the sales data analysis is required, the server 104 may obtain the sales data from the local storage of the sales terminal 102, or the data storage system, or the cloud storage. The sales terminal 102 can be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, internet of things devices, portable wearable devices, and the like. The portable wearable device may be a smart watch, smart bracelet, headset, or the like. The server 104 may be implemented as a stand-alone server or as a server cluster of multiple servers.
The big data-based intelligent collection method for product sales data provided by the embodiments of the present application is applied to a server. Referring to Fig. 2, the method specifically includes the following operations. Operation S100: obtain product sales data generated in a target statistical period and sent by at least one sales terminal, to obtain a product sales data set.
In this embodiment of the present application, the sales terminals may be terminals deployed in offline stores, such as computer devices in a store. During product sales, a staff member may record the sales conditions through the sales terminal, and consumer data may also be generated, such as information filled in by a consumer in advance or in real time during member sales. The corresponding sales data is thereby generated and uploaded to the server connected to the sales terminal. Sales conditions include, but are not limited to, sales transaction data, customer data, inventory data, channel data, and market data. Sales transaction data includes, for example, the sales date, sales amount, sales quantity, product category, sales location, and payment method. Customer data, such as a consumer's personal information, purchase history, preferences, and behavior, is obtained in compliance with laws and regulations and with the consumer's consent, and can be collected through a customer relationship management (CRM) system, subscription forms, questionnaires, and the like. Channel data can be the sales, market share, channel cost, and other data of a channel, and can be collected through sales reports, advertising platforms, partner systems, and the like. In the method provided by the application, the sales data sent by at least one sales terminal for the corresponding period is collected periodically according to a preset collection period and is then integrated into a product sales data set. The target statistical period is the statistical period to be analyzed. To facilitate subsequent analysis, discrete data in the sales data may be encoded as vectors by one-hot encoding.
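As an illustration of the one-hot encoding mentioned above, the following sketch turns one sales record into a numeric vector. The field names, the category lists, and the record values are hypothetical and chosen only for the example.

```python
def one_hot(value, categories):
    """Encode a discrete value as a 0/1 vector over a fixed category list."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]

# Hypothetical category vocabularies.
PAYMENT_METHODS = ["cash", "card", "mobile"]
PRODUCT_CATEGORIES = ["solid wood board", "spliced integrated board"]

def encode_record(record):
    """Concatenate numeric fields with one-hot encoded discrete fields."""
    return ([float(record["amount"]), float(record["quantity"])]
            + one_hot(record["payment_method"], PAYMENT_METHODS)
            + one_hot(record["product_category"], PRODUCT_CATEGORIES))

record = {"amount": 1200.0, "quantity": 3,
          "payment_method": "card", "product_category": "solid wood board"}
print(encode_record(record))  # [1200.0, 3.0, 0, 1, 0, 1, 0]
```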
Operation S200: load the product sales data set into a preset sales data analysis model, and analyze the product sales data set based on the sales data analysis model to obtain the product statistical type of the target statistical period.
The sales data analysis model is a pre-trained machine learning model. The collected sales big data (namely, the product sales data set) is analyzed through the sales data analysis model to obtain the product statistical type of the target statistical period. The product statistical type is the type of product in the product sales data set that is popular or shows a good sales trend, such as spliced integrated boards or solid wood boards.
Operation S300: compile the product statistical types respectively corresponding to a plurality of statistical periods, and generate a product sales report.
After the product statistical type is obtained for the product sales data set of each statistical period, the product statistical types are collated to obtain a product sales report. For example, the report can intuitively reflect the changing trend of best-selling products based on the product statistical types, and other contents of the report, such as the concentration of customer profiles, can be customized according to actual needs. The method provided by the embodiments of the present application is concerned only with obtaining the product statistical types. It can be understood that obtaining the product statistical type of a statistical period automatically, quickly, and accurately depends on a sales data analysis model that performs well after training.
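A minimal sketch of how per-period product statistical types might be collated into a report is given below. The report layout, the period labels, and the assumption of one product statistical type per period are illustrative choices, not details taken from the text.

```python
from collections import Counter

def build_sales_report(type_by_period):
    """Collate one product statistical type per period into a simple report."""
    periods = sorted(type_by_period)
    counts = Counter(type_by_period.values())
    return {
        # Chronological trend of the best-selling type.
        "trend": [(p, type_by_period[p]) for p in periods],
        # How many periods each type was identified in.
        "type_counts": dict(counts),
        "most_frequent_type": counts.most_common(1)[0][0] if counts else None,
    }

# Hypothetical model outputs for three statistical periods.
type_by_period = {
    "2023-Q3": "solid wood board",
    "2023-Q1": "spliced integrated board",
    "2023-Q2": "solid wood board",
}
report = build_sales_report(type_by_period)
print(report["trend"])
print(report["most_frequent_type"])  # solid wood board
```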
It can be understood that the training effect of the sales data analysis model directly affects the accuracy with which the product statistical type is identified. In this embodiment of the present application, the training process of the sales data analysis model, shown in Fig. 3, includes the following operations. Operation S10: extract, through a target sales data analysis model, data characterization vectors from a plurality of sales training data sets marked with the same product classification indication information in a sales training database, to obtain a training characterization vector cluster under the same product classification indication information.
The sales training database is a database containing training data. It includes sales training data sets marked with a plurality of kinds of product classification indication information, where the product classification indication information is a label identifying the product classification.
It is easy to understand that in order to identify the product statistics type in the sales data set, the target sales data analysis model sets a plurality of product types in advance, determines the estimated probability of the sales data set corresponding to each product type, and determines the target product type corresponding to the sales data set according to the estimated probability.
As one embodiment, the target sales data analysis model comprises at least a data characterization vector extraction network and a classification mapping network. The data characterization vector extraction network is used for extracting a sales data set characterization vector of the sales data set to be identified, the sales data set characterization vector being a feature vector that characterizes sales data feature information. The classification mapping network is used for determining the probability that the sales data set characterization vector corresponds to each product type, through the commonality measurement result between the sales data set characterization vector and the cluster center characterization vector corresponding to each product type; the product type with the highest probability is taken as the product statistical type of the sales data set. The commonality measurement result measures the degree of similarity between two vectors and can be obtained by calculating the distance (such as the Euclidean distance) between them: the smaller the distance, the larger the commonality measurement result, that is, the higher the similarity and the more suitable the two vectors are for being grouped into one cluster. The cluster center is the center of a cluster. Of course, in other embodiments, the model architecture of the target sales data analysis model may include further networks as required, such as fully-connected networks and convolutional networks. The above classification mapping network is essentially a classifier and may employ a general multi-classification output structure (e.g., the softmax activation function).
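The classification mapping described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the commonality measurement result is the negative Euclidean distance and that the probabilities come from a softmax over those scores; the function names, product type labels and center coordinates are hypothetical and are not taken from the application.

```python
import math

def class_probabilities(vector, centers):
    """Softmax over negative Euclidean distances (assumed commonality measure).

    A smaller distance to a cluster center gives a larger score and
    therefore a larger probability for that product type.
    """
    scores = {label: -math.dist(vector, c) for label, c in centers.items()}
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

def predict(vector, centers):
    """Return the product type with the highest probability."""
    probs = class_probabilities(vector, centers)
    return max(probs, key=probs.get)

# Hypothetical cluster centers, one per product type.
centers = {"sofa": [0.0, 0.0], "wardrobe": [4.0, 0.0], "bed": [0.0, 5.0]}
probs = class_probabilities([3.5, 0.2], centers)
label = predict([3.5, 0.2], centers)
```

Here the vector lies closest to the hypothetical "wardrobe" center, so that product type receives the highest probability and is taken as the product statistical type.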
The training token vector cluster under the same product classification indication information is a set of a plurality of training token vectors of a plurality of sales training data sets under the same product classification indication information. The target sales data analysis model can perform data characterization vector extraction on a plurality of sales training data sets comprising a plurality of product classification indication information to obtain training characterization vector clusters under the plurality of product classification indication information, wherein the training characterization vector clusters are clusters formed by the characteristics of the extracted sales training data sets.
Operation S20: pre-deploying a plurality of preset subset group centers for the training token vector cluster, and determining the preset subset group centers corresponding to the training token vectors in the training token vector cluster.
It will be appreciated that the sales training database includes sales training data sets under multiple kinds of product classification indication information, and the number of preset subset group centers pre-deployed by the target sales data analysis model may be set as a hyperparameter. For example, if there are sales training data sets of 50 product types and 5 preset subset group centers are deployed for each product type, the target sales data analysis model may arbitrarily generate 250 preset subset group centers; in other words, each kind of product classification indication information corresponds to 5 pre-deployed preset subset group centers. A cluster center is the class center of a classification, and the subset group centers are a plurality of class centers arranged within one training characterization vector cluster. A preset subset group center may be an array, for example a one-dimensional array (i.e., a vector), a two-dimensional array (i.e., a matrix) or a multi-dimensional array (i.e., a tensor), which is not specifically limited; it may be arbitrarily generated by the target sales data analysis model and may be a parameter of that model. After a plurality of preset subset group centers are pre-deployed for the training characterization vector cluster, the preset subset group center corresponding to each training characterization vector in the cluster is determined, which achieves a simple classification; the preset subset group centers are then updated to adapt to training data of different disturbance classifications, so that the classification of the product statistical type is more accurate.
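The pre-deployment in the example above (50 product types with 5 preset subset group centers each) can be sketched as follows. The Gaussian initialization, the fixed seed, the vector dimension of 8 and all names are assumptions made only for illustration; the application states only that the centers may be arbitrarily generated.

```python
import random

def deploy_preset_centers(product_types, centers_per_type, dim, seed=0):
    """Generate `centers_per_type` preset centers for every product type.

    Each center is a one-dimensional array (vector) of length `dim`,
    drawn from a standard normal distribution (an assumed choice).
    """
    rng = random.Random(seed)
    return {
        product_type: [
            [rng.gauss(0.0, 1.0) for _ in range(dim)]
            for _ in range(centers_per_type)
        ]
        for product_type in product_types
    }

# 50 hypothetical product types, 5 centers each, 8-dimensional vectors.
product_types = [f"type_{i}" for i in range(50)]
centers = deploy_preset_centers(product_types, centers_per_type=5, dim=8)
total = sum(len(group) for group in centers.values())
```

The number of centers per type is the hyperparameter mentioned above; with the values used here the total is 50 × 5 = 250 centers.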
As an implementation manner, determining the preset subset group center corresponding to each of the plurality of training characterization vectors in the training characterization vector cluster specifically includes: for any training characterization vector in the training characterization vector cluster, determining the commonality measurement result between the training characterization vector and each preset subset group center, and taking the preset subset group center with the largest commonality measurement result as the preset subset group center to which the training characterization vector belongs.
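The assignment rule above can be sketched in a few lines. The commonality measure used here, 1 / (1 + Euclidean distance), is an assumption chosen so that a smaller distance gives a larger result; the function names and the sample data are hypothetical.

```python
import math

def similarity(a, b):
    """Assumed commonality measure: 1 / (1 + Euclidean distance)."""
    return 1.0 / (1.0 + math.dist(a, b))

def assign_to_centers(vectors, centers):
    """For each vector, return the index of the center with the largest
    commonality measurement result."""
    assignment = []
    for vector in vectors:
        sims = [similarity(vector, center) for center in centers]
        assignment.append(max(range(len(centers)), key=sims.__getitem__))
    return assignment

# Four hypothetical training characterization vectors and two preset centers.
vectors = [[0.1, 0.0], [0.9, 1.1], [0.0, 0.2], [1.2, 0.8]]
centers = [[0.0, 0.0], [1.0, 1.0]]
assignment = assign_to_centers(vectors, centers)
```

Each entry of `assignment` is the index of the preset subset group center to which the corresponding training characterization vector belongs.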
Operation S30: and updating the preset subset group centers through the common measurement results between the training characterization vectors and the corresponding preset subset group centers to obtain at least one target subset group center under the same product classification indication information.
Because the disturbed sales data sets in the sales training database may cover various disturbance classifications (i.e., different product statistical types), the preset subset group centers pre-deployed in operation S20 may not fit the disturbance classifications of the sales training database. In that case the preset subset group centers may be updated, for example by generating subset group centers, integrating subset group centers and cleaning subset group centers under each kind of product classification indication information, so as to obtain the target subset group centers under each kind of product classification indication information.
For example, if the current sales training database has sales training data sets of 4 product types, in other words sales training data sets marked with 4 kinds of product classification indication information, then 4 preset subset group centers are added for each product type, and the preset subset group centers under the four product types are respectively updated through operations such as subset group center generation, subset group center integration and subset group center cleaning, so as to obtain N target subset group centers (N is greater than or equal to 1) under each product type, which distribute themselves spontaneously over the whole feature space. The target sales data analysis model is then trained with the following training target: the training characterization vector of each sales training data set is driven toward the target subset group center to which it belongs, other target subset group centers having a higher commonality measurement result with the training characterization vector are no longer focused on, and the training characterization vector is excluded from (pushed away from) other target subset group centers having a lower commonality measurement result with it. In this way the indication information disorder caused by disturbance data sets is handled.
As an implementation manner, updating the plurality of preset subset group centers through the commonality measurement results between the plurality of training characterization vectors and their respective corresponding preset subset group centers, to obtain at least one target subset group center under the same product classification indication information, specifically comprises: determining, according to the commonality measurement results between the training characterization vectors and their corresponding preset subset group centers, (1) whether the sub-training characterization vector cluster corresponding to a preset subset group center contains training characterization vectors that are excluded from (far from) the preset subset group center to which they belong, (2) whether the training characterization vectors belonging to the same preset subset group center are all excluded from that preset subset group center, and (3) whether the plurality of preset subset group centers include at least two preset subset group centers that are close to each other. If training characterization vectors excluded from the preset subset group center to which they belong are included, in other words if the actual number of product classifications in the training characterization vector cluster under the same product classification indication information is greater than the number of preset subset group centers, a new preset subset group center is established for those training characterization vectors, so as to accommodate the training characterization vectors under the actual number of product classifications. If the training characterization vectors belonging to the same preset subset group center are all excluded from that preset subset group center, the preset subset group center and the training characterization vectors belonging to it are cleaned; in other words, if the commonality measurement results between the training characterization vectors belonging to a preset subset group center and that center are all small, the preset subset group center is a free subset group center (i.e., a subset group center without meaning), and the free subset group center and the training characterization vectors belonging to it are cleaned. If preset subset group centers with a large commonality measurement result between them are included, those preset subset group centers are integrated, that is, merged; in other words, a large commonality measurement result between preset subset group centers indicates that they most likely have the same product classification, and the integration clusters their training characterization vectors into the same product type.
The updated target subset group center specifically includes at least one of some or all preset subset group centers pre-deployed in operation S20, preset subset group centers newly constructed in operation S30, and integrated preset subset group centers, and after updating the plurality of preset subset group centers under each product classification indication information, at least one target subset group center can be obtained under each product classification indication information.
Operation S40: determining model training errors through target sub-cluster centers under various product classification indication information and training characterization vectors corresponding to the target sub-cluster centers, and training a target sales data analysis model through the model training errors.
It can be understood that, through operations S10 to S30, a plurality of target subset group centers under the various kinds of product classification indication information and the training characterization vectors corresponding to each target subset group center can be obtained, so that the estimated type output by the target sales data analysis model for each sales training data set can be obtained. Since each sales training data set is marked with product classification indication information, the model training error can be determined by the commonality measurement result between the training characterization vector of the sales training data set and the target subset group center under the product classification indication information corresponding to that data set, together with the commonality measurement results between the training characterization vector and the other target subset group centers under other product classification indication information. As one embodiment, determining the model training error through the plurality of target subset group centers under the various kinds of product classification indication information and the training characterization vectors corresponding to the respective target subset group centers includes: for any training characterization vector belonging to any target subset group center, determining the commonality measurement result between the training characterization vector and the target subset group center to which it belongs and the commonality measurement results between the training characterization vector and the other target subset group centers; then determining the sub-error corresponding to the training characterization vector through these commonality measurement results; and summing the sub-errors corresponding to the training characterization vectors of all target subset group centers to obtain the model training error. The error function (loss function) on which the sub-error depends is determined by the commonality measurement result between the training characterization vector and the target subset group center to which it belongs and the commonality measurement results between the training characterization vector and the other target subset group centers, and may be any of various known error functions, such as a cross-entropy function.
In reality, sales training data sets that have the same product classification may be marked with different product classification indication information, forming disturbance data sets. The other target subset group centers may therefore include centers that have the same product classification as the current training characterization vector. If the training characterization vector were excluded from all other target subset group centers when determining the model training error, it could be pushed away from other target subset group centers of its own product classification. For this reason, other target subset group centers that have a higher commonality measurement result with the current training characterization vector are not focused on when determining the model training error. As one embodiment, determining the model training error through the target subset group centers under the multiple kinds of product classification indication information and their corresponding training characterization vectors includes: for each training characterization vector clustered by (namely belonging to and covered by) any target subset group center, determining the sub-error corresponding to the training characterization vector according to the commonality measurement result between the training characterization vector and the target subset group center to which it belongs and the commonality measurement results between the training characterization vector and other target subset group centers, wherein the other target subset group centers comprise those target subset group centers, among the plurality of target subset group centers under the various kinds of product classification indication information and excluding the target subset group center to which the training characterization vector belongs, whose commonality measurement result with the training characterization vector is smaller than or equal to a fifth critical result; and determining the model training error through the sub-errors corresponding to the training characterization vectors of the target subset group centers.
When the model training error is determined, the training characterization vector is driven toward the target subset group center to which it belongs, other target subset group centers that have a higher commonality measurement result with the current training characterization vector are not focused on, and the training characterization vector is excluded from other target subset group centers that have a lower commonality measurement result with it. This reduces the indication information disorder brought by disturbance data sets and yields a more accurate model. That is, when determining the sub-error corresponding to a training characterization vector, target subset group centers whose commonality measurement result with the training characterization vector is greater than the fifth critical result are no longer focused on; the other target subset group centers that enter the sub-error are only those, apart from the center to which the vector belongs, whose commonality measurement result with the vector is smaller than or equal to the fifth critical result.
If the commonality measurement result between a training characterization vector and one of the other target subset group centers is greater than the fifth critical result, the commonality between them is high and that center is not focused on; if the commonality measurement result is smaller than or equal to the fifth critical result, the commonality between them is low and that center is retained among the other target subset group centers.
As an embodiment, the fifth critical result may be determined by the mean and the average dispersion of the commonality measurement results between each other target subset group center and the training characterization vectors belonging to that center; in other words, each other target subset group center has its own fifth critical result. For example, the fifth critical result may be the sum of the mean and a weighted value of the average dispersion. Determining the model training error by the sub-errors corresponding to the training characterization vectors of the target subset group centers specifically includes: summing, or averaging, the sub-errors corresponding to the training characterization vectors of the target subset group centers to obtain the model training error.
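The sub-error and the fifth critical result described above can be sketched as follows. This is only an illustration under stated assumptions: the commonality measure is taken as 1 / (1 + Euclidean distance), the average dispersion is taken as the population standard deviation, the fifth critical result is taken as mean + weight × dispersion, and the sub-error is a softmax cross-entropy over the retained centers. All function names and numbers are hypothetical.

```python
import math
import statistics

def similarity(a, b):
    """Assumed commonality measure: 1 / (1 + Euclidean distance)."""
    return 1.0 / (1.0 + math.dist(a, b))

def fifth_critical_result(center, members, weight=1.0):
    """Assumed form: mean + weight * population standard deviation of the
    commonality results between a center and its own member vectors."""
    sims = [similarity(v, center) for v in members]
    return statistics.fmean(sims) + weight * statistics.pstdev(sims)

def sub_error(vector, own_center, other_centers, thresholds):
    """Cross-entropy style sub-error. Other centers whose commonality with
    the vector exceeds their threshold are ignored (not focused on)."""
    s_own = similarity(vector, own_center)
    kept = []
    for center, threshold in zip(other_centers, thresholds):
        s = similarity(vector, center)
        if s <= threshold:
            kept.append(s)
    denom = math.exp(s_own) + sum(math.exp(s) for s in kept)
    return -math.log(math.exp(s_own) / denom)

def model_training_error(samples, average=False):
    """Sum (or average) the sub-errors of all (vector, own_center,
    other_centers, thresholds) samples."""
    errors = [sub_error(*sample) for sample in samples]
    return statistics.fmean(errors) if average else sum(errors)

vector, own = [0.0, 0.0], [0.0, 0.0]
others = [[0.1, 0.0], [5.0, 0.0]]
# Threshold 0.5: the nearby center (commonality about 0.91) is ignored.
masked = sub_error(vector, own, others, thresholds=[0.5, 0.5])
# Threshold 1.0: both other centers are retained.
unmasked = sub_error(vector, own, others, thresholds=[1.0, 1.0])
```

In this example the nearby other center is left out of the masked sub-error, so the vector is not pushed away from a center that may share its product classification, and the masked sub-error is smaller than the unmasked one.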
As one embodiment, training the target sales data analysis model by the model training error specifically includes: updating the model-learnable variables (such as weights, biases and learning rates) of the target sales data analysis model through back propagation of the model training error, and repeating operations S10 to S40; the loop stops when the model training error is smaller than a preset error or the number of loop iterations reaches the maximum number.
In the embodiment of the application, a plurality of preset subset group centers are deployed in advance for the training characterization vector cluster under the same product classification indication information, and the preset subset group center corresponding to each training characterization vector in the cluster is determined. This is equivalent to deploying a plurality of sub-classifications in advance for the training characterization vectors under the same product classification indication information, so as to accommodate the disturbance features that may exist under that indication information. The preset subset group centers are then updated through the commonality measurement results between the training characterization vectors and their corresponding preset subset group centers, so that the resulting at least one target subset group center under the same product classification indication information better fits the spatial distribution of the training characterization vectors under that indication information; in other words, the training characterization vectors under the same product classification indication information can be clustered to more accurate target subset group centers, which reduces the indication information disorder caused by the various unrestricted (i.e., unconstrained) disturbance features that may exist under different product classification indication information. The target sales data analysis model is trained with the training error determined by the target subset group centers and the training characterization vectors belonging to them, which improves the generality of the trained model on unrestricted disturbance data sets. An excellent model can thus be obtained without massive training data, which lowers the sample size requirement of the training process, reduces the demand on hardware computing power, relieves the high cost of the model, and increases the training speed.
In the process of updating the plurality of preset subset group centers, if the actual number of product classifications in the training characterization vector cluster under the same product classification indication information is greater than the number of preset subset group centers, a new preset subset group center can be established to accommodate product classifications other than those represented by the pre-deployed preset subset group centers. As an implementation manner, in operation S30, updating the plurality of preset subset group centers through the commonality measurement results between the plurality of training characterization vectors and their respective corresponding preset subset group centers, to obtain at least one target subset group center under the same product classification indication information, includes:
Operation S31: judging, according to the commonality measurement results between the plurality of training characterization vectors and their corresponding preset subset group centers, whether the plurality of training characterization vectors contain cluster appearance vectors, wherein the commonality measurement result between a cluster appearance vector and the preset subset group center to which it belongs is smaller than a first critical result.
Operation S32: if the plurality of training characterization vectors contain cluster appearance vectors, adding a preset subset group center through the cluster appearance vectors, and updating the cluster appearance vectors to belong to the added preset subset group center; the target subset group centers comprise the added preset subset group center and the plurality of preset subset group centers deployed in advance.
In operation S31, the cluster appearance vector is an outlier characterization vector. That the commonality measurement result between the cluster appearance vector and the preset subset group center to which it belongs is smaller than the first critical result indicates that the cluster appearance vector is a training characterization vector excluded from (far from) that preset subset group center. As an embodiment, the first critical result may be determined by the mean and the average dispersion of the commonality measurement results between the preset subset group center and the training characterization vectors belonging to it; in other words, each preset subset group center has its own first critical result, i.e. threshold. A first critical result determined in this way allows the cluster appearance vectors to be determined more accurately.
As an implementation manner, in operation S31, judging whether the plurality of training characterization vectors contain cluster appearance vectors according to the commonality measurement results between the plurality of training characterization vectors and their respective corresponding preset subset group centers includes: for any preset subset group center, determining the first critical result corresponding to the preset subset group center through the mean and the average dispersion of the commonality measurement results between the preset subset group center and the training characterization vectors belonging to it; judging whether the training characterization vectors belonging to the preset subset group center contain training characterization vectors whose commonality measurement result with the center is smaller than the first critical result; and determining any such training characterization vector as a cluster appearance vector excluded from the preset subset group center. The cluster appearance vectors of each preset subset group center are thus accurately determined. Determining the first critical result corresponding to the preset subset group center through the mean and the average dispersion specifically includes, for example: subtracting the weighted value of the average dispersion from the mean corresponding to the preset subset group center to obtain a difference result, and determining the difference result as the first critical result corresponding to the preset subset group center.
If the commonality measurement result between a training characterization vector and the preset subset group center to which it belongs is smaller than the first critical result, the training characterization vector is most likely a disturbance feature whose product classification differs from that of the center. In other words, the cluster appearance vector excluded from the preset subset group center to which it belongs is separated out: it is removed from that preset subset group center and a new preset subset group center is established for it, which reduces in-class confusion.
Operation S31 may determine one or more cluster appearance vectors. If operation S31 determines exactly one cluster appearance vector, that vector is directly cleaned, in other words no preset subset group center is added; if operation S31 determines at least two cluster appearance vectors, then in operation S32 the mean of those cluster appearance vectors is taken as the added preset subset group center.
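Operations S31 and S32 can be sketched as follows, under explicitly assumed details: the commonality measure is 1 / (1 + Euclidean distance), the average dispersion is the population standard deviation, and the first critical result is mean − weight × dispersion. The function names and sample vectors are hypothetical.

```python
import math
import statistics

def similarity(a, b):
    """Assumed commonality measure: 1 / (1 + Euclidean distance)."""
    return 1.0 / (1.0 + math.dist(a, b))

def split_outliers(members, center, weight=1.0):
    """Split the members of one center into inliers and outliers (cluster
    appearance vectors) using the first critical result, assumed here to be
    mean - weight * population standard deviation of the commonality results."""
    sims = [similarity(v, center) for v in members]
    threshold = statistics.fmean(sims) - weight * statistics.pstdev(sims)
    inliers = [v for v, s in zip(members, sims) if s >= threshold]
    outliers = [v for v, s in zip(members, sims) if s < threshold]
    return inliers, outliers

def new_center_from_outliers(outliers):
    """Return the mean of two or more outliers as a new center.
    A single outlier is simply discarded, so None is returned."""
    if len(outliers) < 2:
        return None
    dim = len(outliers[0])
    return [sum(v[i] for v in outliers) / len(outliers) for i in range(dim)]

center = [0.05, 0.05]
members = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
           [0.05, 0.05], [0.0, 0.05], [5.0, 5.0], [5.2, 4.8]]
inliers, outliers = split_outliers(members, center)
added_center = new_center_from_outliers(outliers)
```

With these hypothetical vectors, the two members far from the center fall below the first critical result, and their mean becomes the added preset subset group center.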
In the embodiment of the application, the cluster appearance vectors excluded from the preset subset group centers to which they belong are determined among the training characterization vectors through the commonality measurement results between the training characterization vectors and their corresponding preset subset group centers, and preset subset group centers are added for these cluster appearance vectors. Product classifications other than those represented by the pre-deployed preset subset group centers can thus be accommodated, which reduces errors caused by in-class disturbance and improves the feature characterization capability of the target sales data analysis model.
When the preset subset group centers are updated, if the training characterization vectors belonging to a preset subset group center are far from that center, the preset subset group center is a free subset group center, and the free subset group center and the training characterization vectors belonging to it are cleaned. As one embodiment, operation S30, updating the plurality of preset subset group centers through the commonality measurement results between the plurality of training characterization vectors and their respective corresponding preset subset group centers, to obtain at least one target subset group center under the same product classification indication information, includes:
Operation S33: judging, according to the commonality measurement results between the plurality of training characterization vectors and their corresponding preset subset group centers, whether the plurality of preset subset group centers contain free subset group centers, wherein the mean of the commonality measurement results between a free subset group center and the training characterization vectors clustered by it is smaller than or equal to a second critical result.
Operation S34: under the condition that the plurality of preset subset group centers comprise free subset group centers, cleaning the free subset group centers and training characterization vectors belonging to the free subset group centers to obtain remaining preset subset group centers and training characterization vectors belonging to the remaining preset subset group centers; wherein the target subset group center includes the remaining preset subset group centers.
If operations S31 and S32 have been performed before operation S33, in other words if subset group centers have been generated before subset group centers are cleaned, the plurality of preset subset group centers in operation S33 specifically include the preset subset group centers added in operations S31 and S32 and the plurality of preset subset group centers pre-deployed in operation S20; since the preset subset group centers added in operations S31 and S32 are determined based on cluster appearance vectors, a free subset group center is most likely to originate from them. If operations S31 and S32 are not performed before operation S33, in other words if no subset group center has been established, the plurality of preset subset group centers in operation S33 specifically include the plurality of preset subset group centers pre-deployed in operation S20.
As one embodiment, operation S33, judging whether the plurality of preset subset group centers contain free subset group centers according to the commonality measurement results between the plurality of training characterization vectors and their respective corresponding preset subset group centers, includes: for any preset subset group center, judging whether the mean of the commonality measurement results between the preset subset group center and the training characterization vectors belonging to it is smaller than or equal to a second critical result; and if so, determining the preset subset group center as a free subset group center. The specific value of the second critical result may be set as required. The mean of the commonality measurement results between a preset subset group center and the training characterization vectors belonging to it represents how densely those training characterization vectors are distributed around the center. Training characterization vectors corresponding to the same product classification are generally concentrated in the feature space. If the mean of the commonality measurement results between a preset subset group center and its training characterization vectors is smaller than or equal to the second critical result, it can be determined that the training characterization vectors belonging to that center are not concentrated around it, in other words that they are far from the center: the training characterization vectors of the free subset group center may have different product classifications, or the sales training data sets corresponding to them may be weak, so that the training characterization vectors cannot be accurately matched to a subset group center. A preset subset group center for which this mean is smaller than or equal to the second critical result can therefore be determined as a free subset group center.
Operation S34 cleans the free subset group center and the training characterization vectors belonging to it; in other words, the model training error is determined without using the free subset group center or the training characterization vectors belonging to it. The remaining preset subset group centers are the preset subset group centers other than the free subset group center among the plurality of preset subset group centers. In the embodiment of the application, free subset group centers among the plurality of preset subset group centers are determined from the commonality measurement results between the plurality of training characterization vectors and their respective corresponding preset subset group centers, and the free subset group centers and the training characterization vectors clustered to them are cleaned. The model therefore no longer attends to free disturbance features in the training characterization vectors, which reduces the confusion caused by intra-class disturbance information and improves the feature characterization capability of the target sales data analysis model.
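The detection and cleaning of free subset group centers in operations S33 and S34 can be sketched as follows. This is a minimal illustration, not the implementation of the application: cosine similarity is assumed as the commonality measurement, and the names `cosine_similarity`, `clean_free_centers`, and `second_threshold` are hypothetical. Centers with no assigned vectors are simply kept, since the text does not say how they are handled.

```python
import numpy as np


def cosine_similarity(a, b):
    """Assumed commonality measurement: cosine similarity of two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def clean_free_centers(vectors, centers, assignment, second_threshold):
    """Remove free centers and the vectors assigned to them.

    A center is treated as free when the mean similarity between it and
    the vectors assigned to it is <= second_threshold (operation S33).
    Returns (kept_centers, kept_vectors, new_assignment), where
    new_assignment indexes into kept_centers (operation S34).
    """
    free = set()
    for k, center in enumerate(centers):
        sims = [cosine_similarity(v, center)
                for v, a in zip(vectors, assignment) if a == k]
        if not sims:
            continue  # assumption: empty centers are left untouched
        if np.mean(sims) <= second_threshold:
            free.add(k)

    # Re-index the surviving centers so assignments stay contiguous.
    kept_ids = [k for k in range(len(centers)) if k not in free]
    remap = {old: new for new, old in enumerate(kept_ids)}
    kept_centers = [centers[k] for k in kept_ids]

    kept_vectors, new_assignment = [], []
    for v, a in zip(vectors, assignment):
        if a not in free:
            kept_vectors.append(v)
            new_assignment.append(remap[a])
    return kept_centers, kept_vectors, new_assignment
```

Only the returned centers and vectors would then take part in computing the model training error.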
As one embodiment, updating the plurality of preset subset group centers in operation S30, according to the commonality measurement results between the plurality of training characterization vectors and their respective corresponding preset subset group centers, includes: Operation S35: determining at least one group of subset group centers to be integrated among the plurality of preset subset group centers according to the commonality measurement results between the plurality of training characterization vectors and their respective corresponding preset subset group centers, wherein each group of subset group centers to be integrated comprises at least two matched (i.e. similar) preset subset group centers.
Operation S36: integrating each of the at least one group of subset group centers to be integrated to obtain at least one integrated subset group center, and updating the training characterization vectors corresponding to the subset group centers to be integrated so that they belong to the corresponding integrated subset group center. The target subset group center includes the integrated subset group center.
If two preset subset group centers are similar to each other, in other words, close in distance, they most likely belong to the same product classification and can be integrated. Each piece of product classification indication information may cover sales training data sets of one or more product classifications, while sales training data sets of the same product classification may be labeled with different product classification indication information. The plurality of preset subset group centers may therefore include at least one group of subset group centers to be integrated, and each group may include preset subset group centers under the current product classification indication information as well as preset subset group centers under other product classification indication information.
As an implementation manner, operation S35 may determine a commonality measurement result between every two preset subset group centers among the plurality of preset subset group centers, determine two preset subset group centers whose commonality measurement result is greater than the fourth critical result to be two matched preset subset group centers, and determine at least two matched preset subset group centers to be a group of subset group centers to be integrated.
As one embodiment, determining in operation S35 at least one group of subset group centers to be integrated among the plurality of preset subset group centers, according to the commonality measurement results between the plurality of training characterization vectors and their respective corresponding preset subset group centers, includes: determining a third critical result for each preset subset group center from the average value and the average dispersion of the commonality measurement results between that preset subset group center and the training characterization vectors corresponding to it; for any preset subset group center, judging whether the commonality measurement result between the preset subset group center and another preset subset group center among the plurality of preset subset group centers is not smaller than a fourth critical result, where the fourth critical result is the maximum of the third critical result corresponding to the preset subset group center and the third critical result corresponding to the other preset subset group center; if the commonality measurement result between the preset subset group center and the other preset subset group center is not smaller than the maximum value, determining that the other preset subset group center matches the preset subset group center; and determining the preset subset group center and at least one other preset subset group center matched with it as a group of subset group centers to be integrated.
Determining the third critical result corresponding to each preset subset group center from the average value and the average dispersion of the commonality measurement results between that preset subset group center and the training characterization vectors corresponding to it specifically includes, for example: adding the average value corresponding to the preset subset group center to a weighted value of the average dispersion, and taking the sum as the third critical result corresponding to that preset subset group center. Judging, for any preset subset group center, whether the commonality measurement result between it and another preset subset group center is not smaller than the fourth critical result (the maximum of the third critical results of the two centers) amounts to judging whether the two preset subset group centers are close to each other. If the commonality measurement result between the preset subset group center and another preset subset group center is not smaller than the fourth critical result, the two centers are close, and they are likely to have the same product classification.
The other preset subset group centers are the preset subset group centers, among the plurality of preset subset group centers, other than the one currently under consideration. The above comparison can be performed for any preset subset group center to obtain at least one other preset subset group center matched with it.
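The matching rule of operation S35 can be sketched as follows. This is an illustrative sketch under stated assumptions: cosine similarity stands in for the commonality measurement, the "average dispersion" is read as the mean absolute deviation, and `weight`, `third_thresholds`, and `find_merge_groups` are hypothetical names. A center with no assigned vectors is given an infinite threshold so that it never matches, which is also an assumption.

```python
import numpy as np


def cosine_similarity(a, b):
    """Assumed commonality measurement: cosine similarity of two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def third_thresholds(vectors, centers, assignment, weight=1.0):
    """Third critical result per center: mean + weight * average dispersion."""
    thresholds = []
    for k, center in enumerate(centers):
        sims = np.array([cosine_similarity(v, center)
                         for v, a in zip(vectors, assignment) if a == k])
        if sims.size == 0:
            thresholds.append(float("inf"))  # assumption: never merge empty centers
            continue
        mean = sims.mean()
        dispersion = np.abs(sims - mean).mean()  # mean absolute deviation
        thresholds.append(mean + weight * dispersion)
    return thresholds


def find_merge_groups(vectors, centers, assignment, weight=1.0):
    """Return groups of center indices that match under the fourth critical result."""
    t3 = third_thresholds(vectors, centers, assignment, weight)
    groups = set()
    for k in range(len(centers)):
        matched = []
        for j in range(len(centers)):
            if j == k:
                continue
            fourth = max(t3[k], t3[j])  # fourth critical result for the pair
            if cosine_similarity(centers[k], centers[j]) >= fourth:
                matched.append(j)
        if matched:
            groups.add(tuple(sorted([k] + matched)))
    return sorted(groups)
```

Because the rule is applied per center, the resulting groups may overlap when matching is not transitive; how overlapping groups are reconciled is not specified in the text and is left open here.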
In one embodiment, integrating each of the at least one group of subset group centers to be integrated in operation S36 to obtain at least one integrated subset group center includes: for any group of subset group centers to be integrated, taking the average value of the at least two preset subset group centers included in the group as the integrated subset group center for that group.
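The integration in operation S36 can be sketched as follows, assuming the groups of subset group centers to be integrated are disjoint lists of center indices. The function name `integrate_centers` and the index-remapping scheme are hypothetical; only the averaging of the grouped centers and the reassignment of their vectors come from the description.

```python
import numpy as np


def integrate_centers(centers, assignment, groups):
    """Merge each group of centers into their mean and remap assignments.

    centers:    sequence of center vectors of equal length
    assignment: center index for each training characterization vector
    groups:     disjoint tuples/lists of center indices to merge (assumed)
    Returns (new_centers, new_assignment).
    """
    centers = np.asarray(centers, dtype=float)
    merged_into = {}   # old center index -> new center index
    new_centers = []

    # Each group becomes one integrated center: the mean of its members.
    for group in groups:
        new_index = len(new_centers)
        new_centers.append(centers[list(group)].mean(axis=0))
        for k in group:
            merged_into[k] = new_index

    # Centers that belong to no group are carried over unchanged.
    for k in range(len(centers)):
        if k not in merged_into:
            merged_into[k] = len(new_centers)
            new_centers.append(centers[k])

    # Vectors of merged centers now belong to the integrated center.
    new_assignment = [merged_into[a] for a in assignment]
    return np.array(new_centers), new_assignment
```

A plain arithmetic mean is used because the text says "average value"; whether the integrated center is renormalized afterwards is not stated.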
In the embodiment of the application, preset subset group centers belonging to the same product classification can be integrated. This reduces the inter-class confusion caused by inter-class disturbance data, makes the training characterization vectors under the same product classification more compact, improves the feature characterization capability of the target sales data analysis model, and improves the generality and recognition accuracy of the trained target sales data analysis model.
Based on the same inventive concept, the embodiment of the application also provides a sales data acquisition device for implementing the above intelligent product sales data acquisition method based on big data. The solution provided by the device is implemented in a manner similar to that described for the method, so for the specific limitations of the one or more sales data acquisition device embodiments provided below, reference may be made to the limitations of the intelligent product sales data acquisition method based on big data described above; they are not repeated here.
In one embodiment, as shown in FIG. 4, there is provided a sales data acquisition apparatus 400 comprising: a data obtaining module 410, configured to obtain product sales data generated in a target statistical period and sent by at least one sales terminal, so as to obtain a product sales data set; a model calling module 420, configured to load the product sales data set into a preset sales data analysis model and obtain a product statistical type of the target statistical period based on analysis by the sales data analysis model; a report generating module 430, configured to count the product statistical types corresponding to a plurality of statistical periods respectively and generate a product sales report; and a model training module 440, configured to train the sales data analysis model. The model training module 440 specifically includes: a feature mining module 441, configured to extract, through a target sales data analysis model, data characterization vectors from a plurality of sales training data sets marked with the same product classification indication information in a sales training database, so as to obtain a training characterization vector cluster under the same product classification indication information, where the sales training database includes sales training data sets marked with multiple kinds of product classification indication information; a center deployment module 442, configured to pre-deploy a plurality of preset subset group centers for the training characterization vector cluster and determine the preset subset group centers corresponding to a plurality of training characterization vectors in the training characterization vector cluster; a center updating module 443, configured to update the plurality of preset subset group centers according to the commonality measurement results between the plurality of training characterization vectors and their respective corresponding preset subset group centers, so as to obtain at least one target subset group center under the same product classification indication information; and an error determining module 444, configured to determine a model training error from the target subset group centers under the multiple kinds of product classification indication information and the training characterization vectors corresponding to the target subset group centers, and to train the target sales data analysis model with the model training error.
The respective modules in the sales data acquisition apparatus described above may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory in the computer device in software form, so that the processor can call and execute the operations corresponding to the above modules.
In one embodiment, a server is provided, the internal structure of which may be as shown in FIG. 5. The server includes a processor, a memory, an Input/Output interface (I/O) and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the server is configured to provide computing and control capabilities. The memory of the server includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the server is used for storing data including sales data and the like. The input/output interface of the server is used for exchanging information between the processor and the external device. The communication interface of the server is used for communicating with an external terminal through network connection. The computer program, when executed by the processor, implements a method for intelligent collection of product sales data based on big data.
It will be appreciated by those skilled in the art that the structure shown in fig. 5 is merely a block diagram of a portion of the structure associated with the present application and is not limiting of the server to which the present application applies, and that a particular server may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In one embodiment, there is also provided a server including a memory and a processor, the memory storing a computer program, the processor implementing the steps of the method embodiments described above when executing the computer program.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the method embodiments described above.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
It should be noted that, the object information (including, but not limited to, device information, corresponding personal information, etc. of the object) and the data (including, but not limited to, data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the object or sufficiently authorized by each party, and the collection, use, and processing of the related data are required to comply with the related laws and regulations and standards of the related country and region.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the various embodiments provided herein may include at least one of non-volatile and volatile memory. The nonvolatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical Memory, high density embedded nonvolatile Memory, resistive random access Memory (ReRAM), magnetic random access Memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric Memory (Ferroelectric Random Access Memory, FRAM), phase change Memory (Phase Change Memory, PCM), graphene Memory, and the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory, and the like. By way of illustration, and not limitation, RAM can be in the form of a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), and the like. The databases referred to in the various embodiments provided herein may include at least one of relational databases and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processors referred to in the embodiments provided herein may be general purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, quantum computing-based data processing logic units, etc., without being limited thereto.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of this description.
The above examples represent only a few embodiments of the present application. Although they are described in some detail, they are not to be construed as limiting the scope of the present application. It should be noted that those skilled in the art can make various modifications and improvements without departing from the spirit of the present application, and these all fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims (5)

1. The intelligent product sales data collection method based on big data is characterized by being applied to a server, wherein the server is in communication connection with at least one sales terminal, and the method comprises the following steps:
obtaining product sales data generated in a target statistical period sent by at least one sales terminal to obtain a product sales data set;
loading the product sales data set into a preset sales data analysis model, and analyzing the product sales data set based on the sales data analysis model to obtain the product statistical type of the target statistical period;
Counting the product statistical types corresponding to the plurality of statistical periods respectively, and generating a product sales report;
the training process of the sales data analysis model comprises the following steps:
extracting data characterization vectors of a plurality of sales training data sets marked with the same product classification indication information in a sales training database through a target sales data analysis model to obtain training characterization vector clusters under the same product classification indication information, wherein the sales training database comprises sales training data sets marked with the plurality of product classification indication information;
pre-deploying a plurality of preset subset group centers for the training characterization vector cluster, and determining preset subset group centers corresponding to a plurality of training characterization vectors in the training characterization vector cluster;
updating the preset subset group centers through the commonality measurement results between the training characterization vectors and the corresponding preset subset group centers to obtain at least one target subset group center under the same product classification indication information;
determining model training errors through target sub-cluster centers under various product classification indication information and training characterization vectors corresponding to the target sub-cluster centers, and training the target sales data analysis model through the model training errors;
The updating of the plurality of preset subset group centers to obtain at least one target subset group center under the same product classification indication information through the common measurement result between the plurality of training characterization vectors and the corresponding preset subset group centers comprises the following steps:
judging whether the plurality of training characterization vectors contain cluster appearance vectors according to the commonality measurement results between the plurality of training characterization vectors and their respective corresponding preset subset group centers, wherein the commonality measurement result between a cluster appearance vector and the preset subset group center to which it belongs is smaller than a first critical result;
if the plurality of training characterization vectors contain cluster appearance vectors, adding a preset subset group center through the cluster appearance vectors, and updating the cluster appearance vectors to belong to the added preset subset group center; the target subset group center comprises an added preset subset group center and a plurality of preset subset group centers which are deployed in advance;
updating the plurality of preset subset group centers according to the common measurement result between the plurality of training characterization vectors and the corresponding preset subset group centers to obtain at least one target subset group center under the same product classification indication information, wherein the method comprises the following steps:
Judging whether the preset subset group centers contain free subset group centers or not according to the common measurement results between the training characterization vectors and the corresponding preset subset group centers, wherein the average value of the common measurement results between the training characterization vectors clustered by the free subset group centers and the free subset group centers is smaller than or equal to a second critical result;
if the plurality of preset subset group centers comprise free subset group centers, cleaning the free subset group centers and training characterization vectors belonging to the free subset group centers to obtain remaining preset subset group centers and training characterization vectors belonging to the remaining preset subset group centers; wherein the target subset group center comprises the remaining preset subset group center;
the determining whether the plurality of preset subset group centers include free subset group centers according to the common measurement result between the plurality of training characterization vectors and the corresponding preset subset group centers comprises:
for any preset subset group center, judging whether the average value of the commonality measurement results between the preset subset group center and the training characterization vectors belonging to the preset subset group center is smaller than or equal to a second critical result;
If the average value of the commonality measurement results between the preset subset group center and the training characterization vectors belonging to the preset subset group center is smaller than or equal to the second critical result, determining that the preset subset group center is a free subset group center;
updating the plurality of preset subset group centers according to the common measurement result between the plurality of training characterization vectors and the corresponding preset subset group centers to obtain at least one target subset group center under the same product classification indication information, wherein the method comprises the following steps:
determining at least one group of sub-cluster centers to be integrated in the plurality of preset sub-cluster centers through a commonality measurement result between the plurality of training characterization vectors and the corresponding preset sub-cluster centers, wherein each group of sub-cluster centers to be integrated comprises at least two matched preset sub-cluster centers;
respectively integrating at least one group of sub-cluster centers to be integrated to obtain at least one integrated sub-cluster center, and updating training characterization vectors corresponding to the sub-cluster centers to be integrated to belong to each integrated sub-cluster center after integration; wherein the target subset group center comprises the integrated subset group center;
The determining the preset subset group center corresponding to each of the training characterization vectors in the training characterization vector cluster includes:
for any training characterization vector in the training characterization vector cluster, determining a commonality measurement result between the training characterization vector and each preset subset group center, and determining the preset subset group center corresponding to the maximum commonality measurement result as the preset subset group center to which the training characterization vector belongs.
2. The method of claim 1, wherein the judging whether the plurality of training characterization vectors contain cluster appearance vectors according to the commonality measurement results between the plurality of training characterization vectors and their respective corresponding preset subset group centers comprises:
for any preset subset group center, determining a first critical result corresponding to the preset subset group center through the mean value and the average dispersion of the commonality measurement results between the preset subset group center and the training characterization vectors belonging to the preset subset group center;
judging whether the training characterization vectors belonging to the preset subset group center contain a training characterization vector whose commonality measurement result with the preset subset group center is smaller than the first critical result;
and determining a training characterization vector whose commonality measurement result with the preset subset group center is smaller than the first critical result as a cluster appearance vector belonging to the preset subset group center.
3. The method of claim 1, wherein the determining at least one group of subset group centers to be integrated in the plurality of preset subset group centers through the commonality measurement results between the plurality of training characterization vectors and the corresponding preset subset group centers comprises:
determining a third critical result corresponding to each preset subset group center through the average value and the average dispersion of the commonality measurement results between each preset subset group center and the training characterization vectors corresponding to that preset subset group center;
for any preset subset group center, judging whether a commonality measurement result between the preset subset group center and other preset subset group centers in the preset subset group centers is not smaller than a fourth critical result, wherein the fourth critical result is the maximum value of third critical results corresponding to the preset subset group center and third critical results corresponding to the other preset subset group centers;
If the commonality measurement result between the preset subset group center and the other preset subset group centers is not smaller than the maximum value, determining that the other preset subset group centers are matched with the preset subset group center;
and determining the preset subset group center and at least one other preset subset group center matched with the preset subset group center as a group of subset group centers to be integrated.
4. The method of claim 3, wherein determining model training errors from the target sub-cluster centers under the multiple product classification indication information and training characterization vectors corresponding to the respective target sub-cluster centers comprises:
for each training characterization vector clustered to any target subset group center, determining a sub-error corresponding to the training characterization vector through the commonality measurement result between the training characterization vector and the target subset group center to which it belongs and the commonality measurement results between the training characterization vector and other target subset group centers; the other target subset group centers comprise those target subset group centers, among the plurality of target subset group centers under the plurality of product classification indication information and excluding the target subset group center to which the training characterization vector belongs, whose commonality measurement result with the training characterization vector is smaller than or equal to a fifth critical result;
and determining the model training error through the sub-errors corresponding to the training characterization vectors corresponding to the respective target subset group centers.
5. The intelligent acquisition system for the product sales data is characterized by comprising a server and at least one sales terminal which is in communication connection with the server, wherein the server comprises:
one or more processors;
and one or more memories, wherein the memories have stored therein computer readable code, which, when executed by the one or more processors, causes the one or more processors to perform the method of any of claims 1-4.
CN202311369328.6A 2023-10-23 2023-10-23 Intelligent product sales data acquisition method and system based on big data Active CN117349344B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311369328.6A CN117349344B (en) 2023-10-23 2023-10-23 Intelligent product sales data acquisition method and system based on big data

Publications (2)

Publication Number Publication Date
CN117349344A CN117349344A (en) 2024-01-05
CN117349344B true CN117349344B (en) 2024-03-05

Family

ID=89368876

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311369328.6A Active CN117349344B (en) 2023-10-23 2023-10-23 Intelligent product sales data acquisition method and system based on big data

Country Status (1)

Country Link
CN (1) CN117349344B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109740155A (en) * 2018-12-27 2019-05-10 广州云趣信息科技有限公司 A kind of customer service system artificial intelligence quality inspection rule self concludes the method and system of model
CN111178624A (en) * 2019-12-26 2020-05-19 浙江大学 Method for predicting new product demand
CN113642029A (en) * 2021-10-12 2021-11-12 华中科技大学 Method and system for measuring correlation between data sample and model decision boundary
CN113723985A (en) * 2021-03-04 2021-11-30 京东城市(北京)数字科技有限公司 Training method and device for sales prediction model, electronic equipment and storage medium
CN114048290A (en) * 2021-11-22 2022-02-15 鼎富智能科技有限公司 Text classification method and device
CN114298122A (en) * 2021-10-22 2022-04-08 腾讯科技(深圳)有限公司 Data classification method, device, equipment, storage medium and computer program product
CN115018552A (en) * 2022-06-28 2022-09-06 中国科学技术大学 Method for determining click rate of product
CN115186764A (en) * 2022-08-03 2022-10-14 腾讯科技(北京)有限公司 Data processing method and device, electronic equipment and storage medium
WO2023185539A1 (en) * 2022-03-28 2023-10-05 华为技术有限公司 Machine learning model training method, service data processing method, apparatuses, and systems

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Liu Jinling. Application of data mining technology in commodity sales forecasting. Market Modernization (商场现代化), 2008, (No. 05), full text. *

Also Published As

Publication number Publication date
CN117349344A (en) 2024-01-05

Similar Documents

Publication Publication Date Title
Bi et al. A big data clustering algorithm for mitigating the risk of customer churn
Yang et al. Social media data analytics for business decision making system to competitive analysis
US11574202B1 (en) Data mining technique with distributed novelty search
US11068789B2 (en) Dynamic model data facility and automated operational model building and usage
CN110147882B (en) Neural network model training method, crowd diffusion method, device and equipment
Verdhan Supervised learning with python
Liu et al. An efficient smart data mining framework based cloud internet of things for developing artificial intelligence of marketing information analysis
CN115983900A (en) Method, apparatus, device, medium, and program product for constructing user marketing strategy
CN117349344B (en) Intelligent product sales data acquisition method and system based on big data
Simion-Constantinescu et al. Deep neural pipeline for churn prediction
Saini et al. Customer Segmentation using K-Means Clustering
CN113779116A (en) Object sorting method, related equipment and medium
Shi et al. Customer Churn Analysis for Live Stream E-Commerce Platforms by Using Decision Tree Method
Verdhan et al. Introduction to supervised learning
CN113505369A (en) Method and device for training user risk recognition model based on space-time perception
CN112884028A (en) System resource adjusting method, device and equipment
Khansong et al. Customer Service Improvement based on Electricity Payment Behaviors Analysis using Data Mining Approaches
Siregar et al. Classification data for direct marketing using deep learning
Yun Performance evaluation of intelligent prediction models on the popularity of motion pictures
Russo Adaptive product classification for inventory optimization in multi-echelon networks
El Abbass Implementing a Bank Sales Analytics Solution and a Predictive model for the Next Best Offer
Bakkevig et al. Non-contractual churn prediction using Hierarchical Temporal Memory
Barsotti et al. A Decade of Churn Prediction Techniques in the TelCo Domain: A Survey
Sagming Using topological data analysis and Machine Learning to predict customer churn
Sathiya et al. Analysis of Clustering Effects In Cloud Workload Forecasting

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant