WO2023155426A1 - Data processing method and apparatus - Google Patents

Data processing method and apparatus

Publication number
WO2023155426A1
Authority
WO
WIPO (PCT)
Prior art keywords
time
series data
data
series
time series
Prior art date
Application number
PCT/CN2022/118700
Other languages
French (fr)
Chinese (zh)
Inventor
陈家禹
陈浪
庄晓天
吴盛楠
Original Assignee
北京京东振世信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京京东振世信息技术有限公司
Publication of WO2023155426A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201Market modelling; Market analysis; Collecting market data
    • G06Q30/0202Market predictions or forecasting for commercial activities
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465Query processing support for facilitating data mining operations in structured databases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474Sequence data queries, e.g. querying versioned data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/08Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
    • G06Q10/087Inventory or stock management, e.g. order filling, procurement or balancing against orders
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03Data mining

Abstract

Disclosed in the present disclosure are a data processing method and apparatus. A specific embodiment comprises: determining time series data to be processed; for a case where said time series data has a quantity abrupt change point at a set time point in a historical target time period, segmenting, from said time series data, first time series data before the set time point and second time series data after the set time point; determining a target feature time series cluster to which the second time series data belongs, and using a preset first data processing model to process the first time series data and a second data processing model matching the target feature time series cluster to process the second time series data; and according to the first processing result of the first data processing model and the processing result of the second data processing model, determining a first prediction result of article demands in a prediction time series.

Description

Method and Device for Data Processing
Cross-Reference to Related Applications
This application claims priority to Chinese Patent Application No. 202210144038.0, filed on February 17, 2022 and entitled "A Method and Device for Data Processing", the disclosure of which is incorporated herein by reference in its entirety as part or all of this application.
Technical Field
The present disclosure relates to the technical field of smart supply chains, and in particular to a data processing method and device.
Background
In the retail industry, forecasting item demand is the foundation of supply chain management. It supplies the input data for inventory replenishment, allocation, and simulation, and its quality directly determines the validity of the downstream model results. The forecast data can also show supply chain managers the expected demand, thereby supporting their decisions.
In the existing retail industry, a single FFORMA (Feature-based Forecast Model Averaging) model is often used as the data processing model for all time series data when forecasting item demand. However, a single data processing model cannot distinguish between the many different types of time series data, which results in low forecast accuracy.
Summary of the Invention
In view of this, embodiments of the present disclosure provide a data processing method and device that address the case where the time series data to be processed has an abrupt change point in quantity at a set time point within the historical target time period. The time series data to be processed is split into first time series data located before the set time point and second time series data located after it. The abrupt change point is determined through big data statistics, so that the data at the multiple time points included in the first time series data are smaller than the value of the abrupt change point, and the data at the multiple time points included in the second time series data are equal to or greater than that value. The first and second time series data are then processed by a first data processing model and a second data processing model respectively; that is, different data processing models handle the data of different stages, so that the processing results more faithfully reflect item demand at each stage. Finally, a first prediction result of item demand within the prediction time series is determined from the first processing result of the first data processing model and the processing result of the second data processing model, so that item inventory within the prediction time series can be set on the basis of the first prediction result.
To achieve the above purpose, according to a first aspect of the embodiments of the present disclosure, a data processing method is provided.
The data processing method of the embodiments of the present disclosure includes:
determining time series data to be processed, where the time series data to be processed indicates changes in item demand within a historical target time period;
where the time series data to be processed has an abrupt change point in quantity at a set time point within the historical target time period, splitting the time series data to be processed into first time series data located before the set time point and second time series data located after the set time point; determining the target feature time series cluster to which the second time series data belongs; and processing the first time series data with a preset first data processing model and processing the second time series data with a second data processing model that matches the target feature time series cluster, where the abrupt change point is determined through big data statistics;
determining, from the first processing result of the first data processing model and the processing result of the second data processing model, a first prediction result of item demand within the prediction time series, so as to set item inventory within the prediction time series on the basis of the first prediction result.
In one or more embodiments of the present application, after determining the time series data to be processed, the method further includes:
where the time series data to be processed has no abrupt change point at the set time point within the historical target time period,
processing the time series data to be processed with the preset first data processing model;
determining, from the second processing result of the first data processing model, a second prediction result of item demand within the prediction time series, so as to set item inventory within the prediction time series on the basis of the second prediction result.
In one or more embodiments of the present application, the first processing result of the first data processing model includes the prediction result of item demand before the set time point within the prediction time series, and the processing result of the second data processing model includes the prediction result of item demand after the set time point within the prediction time series;
determining the first prediction result of item demand within the prediction time series includes: concatenating the prediction result of item demand before the set time point with the prediction result of item demand after the set time point to obtain the first prediction result.
In one or more embodiments of the present application, after determining the time series data to be processed, the method further includes:
where analysis shows that the time series data to be processed satisfies the normality assumption, using the Buishand statistical method to determine whether the time series data to be processed has an abrupt change point at the set time point within the historical target time period;
where analysis shows that the time series data to be processed does not satisfy the normality assumption, using the Pettitt test to determine whether the time series data to be processed has an abrupt change point at the set time point within the historical target time period.
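The choice between the two tests can be illustrated with a minimal pure-Python sketch. It is not the applicant's implementation: the function names, the `is_normal` flag (assumed to come from a separate normality check that is not shown), and the sample `sales` series are all hypothetical. The Pettitt p-value uses the usual approximation 2·exp(−6K²/(n³+n²)); for the Buishand statistic the caller would still need to compare Q/√n against a tabulated critical value, which is deliberately left out here.

```python
import math

def _sign(v):
    return (v > 0) - (v < 0)

def pettitt_test(x):
    """Pettitt test: returns (index, K, approximate p-value).
    index is the last position of the segment before the change."""
    n = len(x)
    u, best_k, best_t = 0, 0, 0
    for t in range(n - 1):
        # Recurrence: U_t = U_(t-1) + sum_j sign(x_j - x_t)
        u += sum(_sign(x[j] - x[t]) for j in range(n))
        if abs(u) > best_k:
            best_k, best_t = abs(u), t
    p_value = min(1.0, 2.0 * math.exp(-6.0 * best_k ** 2 / (n ** 3 + n ** 2)))
    return best_t, best_k, p_value

def buishand_q(x):
    """Buishand Q statistic: returns (index, Q / sqrt(n)).
    The statistic must be compared with a tabulated critical value."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    if std == 0:
        return 0, 0.0  # constant series: no change point to locate
    s, best, best_k = 0.0, 0.0, 0
    for k in range(n - 1):
        s += x[k] - mean  # partial sum of deviations from the mean
        if abs(s) > best:
            best, best_k = abs(s), k
    return best_k, best / (std * math.sqrt(n))

def locate_change_point(x, is_normal):
    """Dispatch on the (externally supplied) normality decision."""
    if is_normal:
        k, stat = buishand_q(x)
        return {"method": "buishand", "index": k, "statistic": stat}
    k, big_k, p = pettitt_test(x)
    return {"method": "pettitt", "index": k, "statistic": big_k, "p_value": p}

# Hypothetical daily sales: low level for 10 days, then a surge.
sales = [5, 6, 4, 5, 6, 5, 4, 6, 5, 5, 30, 31, 29, 32, 30, 31, 30, 29, 31, 30]
result = locate_change_point(sales, is_normal=False)
```

Both helpers report the last index of the pre-change segment, so the surge segment would start at `index + 1`.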
In one or more embodiments of the present application, the method further includes: screening out, from all the item time series data of the test cases, target item time series data that has an abrupt change point; and segmenting, from the target item time series data, second time series data for training located after the set time point, where the data at the multiple time points included in the second time series data for training are equal to or greater than the value of the abrupt change point in the target item time series data;
clustering the second time series data for training to obtain one or more feature time series clusters;
determining the target feature time series cluster to which the second time series data belongs includes: selecting, from the one or more feature time series clusters, the target feature time series cluster to which the second time series data belongs.
In one or more embodiments of the present application, selecting the target feature time series cluster to which the second time series data belongs includes: where there are multiple feature time series clusters, matching the second time series data against each feature time series cluster; and, according to the matching result, selecting the target feature time series cluster from the multiple feature time series clusters.
In one or more embodiments of the present application, the method further includes: segmenting, from the target item time series data, first time series data for training located before the set time point, where the data at the multiple time points included in the first time series data for training are smaller than the value of the abrupt change point in the target item time series data; clustering the first time series data for training together with the item time series data in the test cases other than the target item time series data; and training a preset first-type model to be trained with the clustering result to obtain the first data processing model, where the first-type model to be trained is configured from one or more base models, and the base models include one or more of ARIMA, ETS, Croston, simple moving average, FBProphet, Holt-Winters, and first-order exponential smoothing.
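To make "configured from one or more base models" concrete, the sketch below combines two of the listed base models, simple moving average and first-order exponential smoothing, by a weighted average in the spirit of FFORMA. It is only an illustration: the function names, the window, the smoothing factor, and the fixed weights are assumptions, whereas in a trained model the weights would be learned.

```python
def simple_moving_average(history, window=3):
    """One-step forecast: mean of the last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def exponential_smoothing(history, alpha=0.5):
    """One-step forecast from first-order (simple) exponential smoothing."""
    level = history[0]
    for value in history[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

def combine(history, weights):
    """Weighted average of base-model forecasts (weights are hypothetical)."""
    forecasts = {
        "sma": simple_moving_average(history),
        "ses": exponential_smoothing(history),
    }
    total = sum(weights.values())
    return sum(weights[name] * forecasts[name] for name in weights) / total

history = [10, 12, 14, 16]
blended = combine(history, {"sma": 1.0, "ses": 1.0})
```

With equal weights the blend is simply the mean of the two base forecasts; unequal weights shift the result toward whichever base model is trusted more for the series at hand.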
In one or more embodiments of the present application, the method further includes: for each feature time series cluster, training a preset second-type model to be trained with the time series data corresponding to that feature time series cluster, to obtain a second data processing model that matches the feature time series cluster, where the second-type model to be trained is configured from one or more base models and differs from the first-type model to be trained.
In one or more embodiments of the present application, screening out, from all the item time series data of the test cases, the target item time series data that has an abrupt change point includes performing the following operations for the item time series data of each warehouse item identifier in the test cases: determining a data segmentation point; and, where the data segmentation point is an abrupt change point of the item time series data of the warehouse item identifier, determining that the item time series data of the warehouse item identifier is target item time series data.
In one or more embodiments of the present application, clustering the second time series data for training includes: determining a sequence shift function and a shift distance for a preset distance function; and clustering the second time series data for training using the DBSCAN algorithm together with the distance function for which the sequence shift function and shift distance have been determined.
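A minimal sketch of this idea follows, under stated assumptions: the shift function is taken to be a circular shift, the shift distance is a maximum offset `max_shift`, and the base distance is Euclidean. None of these choices is confirmed by the source. DBSCAN is written out in plain Python so the example is self-contained; `eps`, `min_pts`, and the sample series are hypothetical.

```python
import math

def shifted_distance(a, b, max_shift=2):
    """Smallest Euclidean distance between a and circular shifts of b."""
    n = len(a)
    best = math.inf
    for s in range(-max_shift, max_shift + 1):
        rolled = [b[(i - s) % n] for i in range(n)]
        d = math.sqrt(sum((p - q) ** 2 for p, q in zip(a, rolled)))
        best = min(best, d)
    return best

def dbscan(points, eps, min_pts, dist):
    """Plain DBSCAN; returns one label per point, -1 meaning noise."""
    n = len(points)
    labels = [None] * n  # None = not yet visited

    def neighbors(i):
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if len(nb) < min_pts:
            labels[i] = -1  # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = neighbors(j)
            if len(nb_j) >= min_pts:
                queue.extend(nb_j)  # j is a core point, keep expanding
    return labels

series = [
    [0, 0, 10, 0, 0, 0],   # single peak
    [0, 0, 0, 10, 0, 0],   # same peak, one step later
    [0, 10, 0, 0, 0, 0],   # same peak, one step earlier
    [5, 5, 5, 5, 5, 5],    # flat
    [5, 5, 5, 5, 5, 6],    # flat with a small bump
    [5, 5, 5, 5, 6, 5],    # same bump, shifted
    [0, 20, 0, 20, 0, 20], # unlike the rest
]
labels = dbscan(series, eps=1.5, min_pts=2, dist=shifted_distance)
```

Because the distance is minimized over shifts, surges that have the same shape but start a day or two apart fall into the same cluster instead of being split by timing alone.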
In one or more embodiments of the present application, selecting the target feature time series cluster to which the second time series data belongs includes: splicing the second time series data end to end; matching the spliced result against each of the one or more feature time series clusters; and taking the feature time series cluster with the highest matching value as the target feature time series cluster.
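The source does not define the matching value, so the sketch below is only one possible reading: the second time series data is assumed to consist of several segments that are joined end to end, each cluster is assumed to be summarized by a representative profile of the same length, and cosine similarity stands in for the matching value. All names and numbers are hypothetical.

```python
import math

def splice(segments):
    """Join the segments end to end into one sequence."""
    return [value for segment in segments for value in segment]

def cosine_similarity(a, b):
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def select_target_cluster(segments, cluster_profiles):
    """Return the cluster id with the highest matching value, plus all scores."""
    spliced = splice(segments)
    scores = {
        cluster_id: cosine_similarity(spliced, profile)
        for cluster_id, profile in cluster_profiles.items()
    }
    return max(scores, key=scores.get), scores

# Two month-end segments of the same item (hypothetical values).
segments = [[10, 30, 60], [12, 28, 58]]
cluster_profiles = {
    "ramp": [1, 3, 6, 1, 3, 6],
    "flat": [1, 1, 1, 1, 1, 1],
    "drop": [6, 3, 1, 6, 3, 1],
}
target, scores = select_target_cluster(segments, cluster_profiles)
```

Any similarity that grows as the spliced series resembles a cluster could replace the cosine measure here; the selection step itself only needs the maximum.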
In one or more embodiments of the present application, training the preset second-type model to be trained with the time series data corresponding to the feature time series cluster includes: cropping the second time series data for training included in the feature time series cluster and splicing the cropped results to generate new second time series data for training; and training the preset second-type model to be trained with both the second time series data for training included in the feature time series cluster and the new second time series data for training.
To achieve the above purpose, according to a second aspect of one or more embodiments of the present disclosure, a data processing device is provided.
The data processing device of the embodiments of the present disclosure includes:
a determining module, configured to determine time series data to be processed, where the time series data to be processed indicates changes in item demand within a historical target time period;
a prediction module, configured to: where the time series data to be processed has an abrupt change point in quantity at a set time point within the historical target time period, split the time series data to be processed, according to the abrupt change point, into second time series data and first time series data; determine the target feature time series cluster to which the second time series data belongs; and process the first time series data with a preset first data processing model and process the second time series data with a second data processing model that matches the target feature time series cluster, where the abrupt change point is determined through big data statistics;
and to determine, from the first processing result of the first data processing model and the processing result of the second data processing model, a first prediction result of item demand within the prediction time series, so as to set item inventory within the prediction time series on the basis of the first prediction result.
To achieve the above purpose, according to a third aspect of one or more embodiments of the present disclosure, a data processing apparatus is provided.
The data processing apparatus of one or more embodiments of the present disclosure includes: one or more processors; and a storage system for storing one or more programs, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the data processing method of the embodiments of the present disclosure.
To achieve the above purpose, according to a fourth aspect of one or more embodiments of the present disclosure, a computer-readable medium is provided.
The computer-readable medium of one or more embodiments of the present disclosure stores a computer program which, when executed by a processor, implements the data processing method of the embodiments of the present disclosure.
Further effects of the above non-conventional optional implementations are described below in conjunction with specific embodiments.
Brief Description of the Drawings
The accompanying drawings are provided for a better understanding of the present disclosure and do not unduly limit it. In the drawings:
FIG. 1 is a schematic diagram of the main flow of a data processing method in one or more embodiments of the present disclosure;
FIG. 2 is a schematic diagram of data segmentation in one or more embodiments of the present disclosure;
FIG. 3 is a schematic diagram of the main flow, in one or more embodiments of the present disclosure, for the case where the time series data to be processed has no abrupt change point at the set time point within the historical target time period;
FIG. 4 is a schematic diagram of the main flow of determining the abrupt change point in one or more embodiments of the present disclosure;
FIG. 5 is a schematic diagram of the main flow of obtaining feature time series clusters in one or more embodiments of the present disclosure;
FIG. 6 is a schematic diagram of the main flow of clustering the second time series data for training in one or more embodiments of the present disclosure;
FIG. 7 is a schematic diagram of the main flow of one way of selecting the target feature time series cluster to which the second time series data belongs in one or more embodiments of the present disclosure;
FIG. 8 is a schematic diagram of the main flow of determining the first data processing model in one or more embodiments of the present disclosure;
FIG. 9 is a schematic diagram of the main flow of another way of selecting the target feature time series cluster to which the second time series data belongs in one or more embodiments of the present disclosure;
FIG. 10 is a schematic diagram of the data cropping step in one or more embodiments of the present disclosure;
FIG. 11 is a schematic diagram of the data cropping and extraction process in one or more embodiments of the present disclosure;
FIG. 12 is a schematic diagram of the overall flow of forecasting sales volume from time series data in one or more embodiments of the present disclosure;
FIG. 13 is a schematic diagram of the main modules of a data processing device in one or more embodiments of the present disclosure;
FIG. 14 is a diagram of an exemplary system architecture to which one or more embodiments of the present disclosure can be applied;
FIG. 15 is a schematic structural diagram of a computer system suitable for implementing a terminal device or server in one or more embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Likewise, descriptions of well-known functions and structures are omitted below for clarity and conciseness.
According to a first aspect of one or more embodiments of the present disclosure, a data processing method applied to a server is provided.
In merchandise retail, some time series show a marked increase in item demand during particular periods. If demand for such items cannot be forecast accurately, there is a risk of shortages and stock-outs, which lowers service levels and raises costs. The time series of different items are usually identified by SKU (Stock Keeping Unit), the smallest inventory unit. When forecasting for different warehouse item identifiers, the existing FFORMA architecture has the following problems:
(1) A marked increase in item demand often appears in only some warehouse item identifiers. When the model is trained on all warehouse item identifiers, the data without such an increase pull down the predicted values for the data with it, so the forecasts are too low;
(2) A marked increase in item demand takes different forms in different time series. FFORMA cannot extract, separately for the items of each time series, features that match the characteristics of the series during the period of increased demand;
(3) The FFORMA model is relatively hard to optimize, and it is difficult to train and fit, in a short time, a data model suited to marked demand increases across all kinds of items.
FIG. 1 is a schematic diagram of the main flow of a data processing method according to one or more embodiments of the present disclosure. As shown in FIG. 1, the method mainly includes:
Step S101: determine the time series data to be processed, where the time series data to be processed indicates changes in item demand within a historical target time period;
Step S102: where the time series data to be processed has an abrupt change point in quantity at a set time point within the historical target time period, split the time series data to be processed, according to the abrupt change point, into first time series data located before the set time point and second time series data located after the set time point, the abrupt change point being determined through big data statistics;
Step S103: determine the target feature time series cluster to which the second time series data belongs, process the first time series data with a preset first data processing model, and process the second time series data with a second data processing model that matches the target feature time series cluster;
Step S104: determine, from the first processing result of the first data processing model and the processing result of the second data processing model, a first prediction result of item demand within the prediction time series, so as to set item inventory within the prediction time series on the basis of the first prediction result.
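Steps S101 to S104 can be sketched as a small dispatch function. This is a simplified illustration, not the disclosed implementation: the split is reduced to a single index, the models are plain callables that return lists of forecasts, and every name (`forecast_demand`, `assign_cluster`, the `"surge"` cluster id, the toy models) is hypothetical.

```python
def forecast_demand(series, change_index, first_model, second_models, assign_cluster):
    """Route a series through one or two models and join their forecasts.

    change_index is None when no abrupt change point was found; otherwise it
    is the position where the second (surge) segment starts.
    """
    if change_index is None:
        # No abrupt change point: the first model handles the whole series.
        return first_model(series)
    first, second = series[:change_index], series[change_index:]
    cluster_id = assign_cluster(second)           # S103: find the matching cluster
    before = first_model(first)                   # forecast for the calm part
    after = second_models[cluster_id](second)     # forecast for the surge part
    return before + after                         # S104: concatenate the results

# Toy stand-ins for trained models (assumptions for illustration only).
def mean_model(segment):
    return [sum(segment) / len(segment)] * 2

def peak_model(segment):
    return [max(segment)] * 2

series = [5, 6, 5, 30, 32, 31]
with_change = forecast_demand(
    series, 3, mean_model, {"surge": peak_model}, lambda segment: "surge"
)
without_change = forecast_demand(
    series, None, mean_model, {"surge": peak_model}, lambda segment: "surge"
)
```

Keeping the models as interchangeable callables mirrors the point of the method: the calm and surge segments are never forced through the same model.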
The set time point can be set in two ways. One way is to designate a time point directly, for example the 20th of each month. The other is to take the time point of an abrupt change point determined by big data statistical analysis of changes in item demand over a historical period; for example, if the changes in item demand show that the abrupt change point falls on the 10th of each month, the 10th is taken as the set time point. Note that the set time point can be set or adjusted according to actual conditions, so that it can change dynamically to meet different needs.
Here, "determined through big-data statistics" means that existing statistical methods, such as a homogeneity test (the Buishand statistical method) or a change-point test (the Pettitt test), are applied to a large volume of item demand changes over a historical time period, and the quantity mutation point is determined from the statistical results. Where the statistics yield multiple quantity mutation points corresponding to multiple time points, the quantity mutation point with the largest value may be selected as the final quantity mutation point, and the time point corresponding to it is taken as the set time point. Alternatively, the time point that occurs most often may be selected as the set time point, and the value of the quantity mutation point, or its mutation range, is then determined by computing the values of the quantity mutation points at that time point.
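The two selection rules just described can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `choose_set_time_point` and the representation of each detected mutation point as a `(day_of_month, value)` pair are assumptions made for the example.

```python
from collections import Counter

def choose_set_time_point(candidates):
    """candidates: (day_of_month, value) pairs, one per detected mutation point."""
    # Rule 1: take the day of the mutation point with the largest value.
    day_by_value = max(candidates, key=lambda c: c[1])[0]
    # Rule 2: take the day that occurs most often, and report the range of
    # mutation-point values observed on that day.
    day_by_count = Counter(day for day, _ in candidates).most_common(1)[0][0]
    values = [value for day, value in candidates if day == day_by_count]
    return day_by_value, day_by_count, (min(values), max(values))

# Hypothetical detections: three on the 20th, one on the 10th.
detections = [(20, 30), (20, 34), (10, 40), (20, 28)]
by_value, by_count, value_range = choose_set_time_point(detections)
```

With these hypothetical detections the two rules disagree (the 10th has the largest value, the 20th occurs most often), which is why the choice of rule is left to the application.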
In the embodiments of the present disclosure, the second time-series data is the part of the time-series data to be processed in which item demand suddenly increases, and the first time-series data is the part that remains after the second time-series data is removed. In general, the values at the time points in the first time-series data are smaller than the value of the quantity mutation point, and the values at the time points in the second time-series data are equal to or greater than it. A schematic diagram of the data split is shown in Fig. 2. This application is mainly directed at forecasting later sales where sales grow rapidly at the end of the month; accordingly, in the embodiments of this application, all time-series data from the quantity mutation point to the end of that month is treated as second time-series data, and the first time-series data is the item sales data from the beginning of the month up to, but not including, the date of the quantity mutation point. Note that the dividing line in Fig. 2 is drawn at the sales point one time point before the quantity mutation point, so the point immediately after the dividing line is the quantity mutation point; that is, the quantity mutation point belongs to the second time-series data.
For example, suppose the set time point is the 20th of each month (either designated directly or determined from changes in item demand over a historical time period), and the time-series data to be processed covers January 1, 2021 to February 28, 2021. If a quantity mutation point exists at the set time point and sales at that point are 30 items/day, then the second time-series data is the sales data for January 20, 2021 to January 31, 2021 and February 20, 2021 to February 28, 2021, which is generally equal to or greater than 30 items/day, and the first time-series data is the sales data for January 1, 2021 to January 19, 2021 and February 1, 2021 to February 19, 2021, which is generally less than 30 items/day.
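The split in this example can be sketched in a few lines. The sketch assumes daily data held as a `{date: units_sold}` dictionary and a hypothetical helper name `split_at_set_day`; the sales figures are invented for illustration.

```python
from datetime import date, timedelta

def split_at_set_day(series, set_day=20):
    """Split {date: units_sold} into (first, second) time-series data.

    Days before `set_day` of each month go to the first series; `set_day`
    itself (the mutation point) and later days go to the second series.
    """
    first, second = {}, {}
    for day, qty in sorted(series.items()):
        (first if day.day < set_day else second)[day] = qty
    return first, second

# Hypothetical data: 10 items/day before the 20th, 30 items/day from the 20th on.
series = {}
d = date(2021, 1, 1)
while d <= date(2021, 2, 28):
    series[d] = 30 if d.day >= 20 else 10
    d += timedelta(days=1)

first, second = split_at_set_day(series)
```

Here `first` holds January 1 to 19 and February 1 to 19, and `second` holds January 20 to 31 and February 20 to 28, matching the date ranges in the example.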
Note that the second time-series data located after the set time point may include values greater than the value of the quantity mutation point at the set time point.
The first data processing model in step S103 is obtained by training a preset first-type model to be trained with first training time-series data, which corresponds to the first time-series data and is split out of all the item time-series data of the test cases. The second data processing model is obtained by training a preset second-type model to be trained with second training time-series data, which corresponds to the second time-series data and is split out of all the item time-series data of the test cases.
In an optional embodiment, the time period covered by the time-series data to be processed may be determined according to actual needs. For example, if the current date is June 1, 2021, the time-series data from January 1, 2021 to June 1, 2021 may be used as the data to be processed to forecast the data from June 2, 2021 to December 1, 2021; alternatively, the time-series data from January 1, 2021 to March 1, 2021 may be used to forecast the same period. Because some months include merchant promotions or major shopping festivals, such as Double Eleven, forecasting from time-series data for the same period of the year is more accurate; for example, the data from November 1, 2020 to November 30, 2020 is used to forecast the data from November 1, 2021 to November 30, 2021.
Step S103 may specifically include: inputting the first time-series data into the first data processing model and the second time-series data into the second data processing model, to obtain prediction results for the item demand within the forecast time series. For example, in a sales forecasting scenario, the set time point is the 20th of each month and the time-series data to be processed covers January 1, 2021 to February 28, 2021. If a quantity mutation point exists at the set time point, the second time-series data input to the second data processing model covers January 20, 2021 to January 31, 2021 and February 20, 2021 to February 28, 2021, and the first time-series data input to the first data processing model covers January 1, 2021 to January 19, 2021 and February 1, 2021 to February 19, 2021.
In an optional embodiment, after step S101, the time-series data to be processed may instead have no quantity mutation point at the set time point within the historical target time period. In that case, as shown in Fig. 3, the method includes:
Step S301: process the time-series data to be processed using the preset first data processing model;
Step S302: according to a second processing result of the first data processing model, determine a second prediction result for the item demand within the forecast time series, so as to set the item inventory within the forecast time series based on the second prediction result.
Because the time-series data to be processed contains no quantity mutation point, there is no need to process second time-series data and first time-series data separately; the data is processed only with the first data processing model, which is the model used for first time-series data.
In a further optional embodiment, the first processing result of the first data processing model includes a prediction result for the item demand before the set time point within the forecast time series, and the processing result of the second data processing model includes a prediction result for the item demand after the set time point within the forecast time series;
Determining the first prediction result for the item demand within the forecast time series in step S104 further includes: splicing the prediction result for the item demand before the set time point within the forecast time series with the prediction result for the item demand after the set time point, to obtain the first prediction result.
For example, in a sales forecasting scenario, the set time point is the 20th of each month, the time-series data to be processed covers January 1, 2021 to February 28, 2021, and the forecast time series covers January 1, 2022 to February 28, 2022. If a quantity mutation point exists at the set time point in the time-series data to be processed, the second time-series data input to the second data processing model covers January 20, 2021 to January 31, 2021 and February 20, 2021 to February 28, 2021, and the first time-series data input to the first data processing model covers January 1, 2021 to January 19, 2021 and February 1, 2021 to February 19, 2021. The first data processing model outputs the prediction results for the item demand before the set time point, covering January 1, 2022 to January 19, 2022 and February 1, 2022 to February 19, 2022; the second data processing model outputs the prediction results for the item demand after the set time point, covering January 20, 2022 to January 31, 2022 and February 20, 2022 to February 28, 2022. Splicing these prediction results gives the first prediction result for January 1, 2022 to February 28, 2022.
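The splicing step, together with the fallback to the first model when no quantity mutation point exists, might look like the sketch below. The function `forecast_demand` and the stand-in model `same_days_next_year` are hypothetical; a real data processing model would replace the stand-in, and only January is shown to keep the example short.

```python
from datetime import date

def forecast_demand(first_series, second_series, first_model, second_model):
    """Splice the two models' forecasts into one date-ordered forecast."""
    forecast = dict(first_model(first_series))
    if second_series:  # a quantity mutation point was found, so a second segment exists
        forecast.update(second_model(second_series))
    return dict(sorted(forecast.items()))

def same_days_next_year(series):
    # Stand-in "model" (assumption): repeats last year's demand on the same days.
    return {d.replace(year=d.year + 1): q for d, q in series.items()}

first = {date(2021, 1, d): 10 for d in range(1, 20)}    # January 1 to 19
second = {date(2021, 1, d): 30 for d in range(20, 32)}  # January 20 to 31
result = forecast_demand(first, second, same_days_next_year, same_days_next_year)
```

Passing an empty `second_series` reduces the function to the single-model path of steps S301 and S302.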
In this embodiment, a quantity mutation point is a point after which the data is significantly larger than the data before it. Different tests suit data with different properties, so choosing the test according to those properties improves the accuracy of the test. In the embodiments of this application, to test whether the data after the quantity mutation point is significantly larger than the data before it, an optional embodiment, as shown in Fig. 4, includes:
Step S401: where analysis shows that the time-series data to be processed satisfies the normality assumption, use the Buishand statistical method to determine whether a quantity mutation point exists at the set time point within the historical target time period;
Step S402: where analysis shows that the time-series data to be processed does not satisfy the normality assumption, use the Pettitt test to determine whether a quantity mutation point exists at the set time point within the historical target time period.
For the Buishand statistical method, the specific steps are as follows:
The time-series data to be processed is defined as the sequence $X = \{x_1, x_2, \ldots, x_k, \ldots, x_n\}$, where $x_k$ is the item demand at the set time point $k$. If $X$ satisfies the normality assumption, the item demand at each time point in $X$ can be expressed as:
Figure PCTCN2022118700-appb-000001
where $\varepsilon_i$ is the residual at time point $i$, $\mu$ is the mean, $k$ is the set time point, and $\Delta$ is the change value. Here $x_k$ is the value of the quantity mutation point at the set time point $k$, $x_i$ is the item demand at time point $i$, and $1, \ldots, n$ index the $n$ time points in the historical target time period, so it is only necessary to test whether $\Delta > 0$. To test $\Delta > 0$, the pre-mutation-point sum $S_m$ (a partial sum of the time-series data to be processed) is constructed as follows:
Figure PCTCN2022118700-appb-000002
where $\bar{x}$ is the mean of the time-series data $X$ to be processed and $S_0 = 0$. For homogeneous data (that is, the distribution of $X$ does not change significantly over the time range), the statistic $S_k$ at the set time point $k$ is expected to fluctuate around 0, since $X$ then shows no systematic deviation from the mean $\bar{x}$. If $\Delta > 0$, most $S_m$ before the quantity mutation point are expected to satisfy $S_m < 0$, and most $S_m$ after it to satisfy $S_m > 0$. Because the embodiments of the present disclosure only need to test for a quantity mutation point at the set time point $k$, only the statistic $S_k$ at that time point is tested. If the specific quantity mutation point cannot be determined in advance, $k = \arg\min_m S_m$ is selected. To complete the test, the statistic is rescaled; that is, the statistic $Q_k$ is constructed as follows:
Figure PCTCN2022118700-appb-000005
where the quantity shown in Figure PCTCN2022118700-appb-000006 denotes the variance of the time-series data to be processed, and $n$ is the number of data points in the time-series data to be processed. The pre-mutation-point sum $S_m$ is tested by means of $Q_k$, from which the significance is determined.
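The formula images referenced above are not reproduced in this text. As a reading aid, the standard textbook form of the Buishand construction is sketched below. It is an assumption that the patent's formulas take this form; in particular, the index at which the shift $\Delta$ begins, the use of the population variance $\sigma^2$, and the absolute value in $Q_k$ are choices made for this sketch.

```latex
\[
x_i =
\begin{cases}
\mu + \varepsilon_i, & i = 1, \ldots, k \\
\mu + \Delta + \varepsilon_i, & i = k + 1, \ldots, n
\end{cases}
\]
\[
S_m = \sum_{i=1}^{m} \left( x_i - \bar{x} \right), \qquad S_0 = 0, \qquad
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
\]
\[
Q_k = \frac{\lvert S_k \rvert}{\sigma \sqrt{n}}, \qquad
\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2
\]
```

Because the description places the quantity mutation point itself in the second time-series data, the boundary index in an implementation may differ by one from the textbook form shown here.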
In practice, if the set time point is the 10th day from the end of the month, a historical sample of the same length, 10 days, is taken; the critical Q value for that sample at a 95% confidence level is 1.22. In the test, if the computed statistic $Q_k > 1.22$, a quantity mutation point is considered to exist at the set time point.
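A minimal sketch of this check is given below. It assumes the textbook rescaling $Q_k = |S_k| / (\sigma\sqrt{n})$ with the population standard deviation, takes `k` to be the number of observations before the candidate mutation point, and adds a direction check so that only increases are reported; the function names and these conventions are assumptions, and only the threshold 1.22 comes from the description.

```python
import math

def buishand_q(series, k):
    """Rescaled partial-sum statistic for a change after the first k observations."""
    n = len(series)
    mean = sum(series) / n
    s_k = sum(x - mean for x in series[:k])                      # partial sum S_k
    sigma = math.sqrt(sum((x - mean) ** 2 for x in series) / n)  # population std. dev.
    if sigma == 0:
        return 0.0                                               # constant series
    return abs(s_k) / (sigma * math.sqrt(n))

def has_mutation_point(series, k, critical_q=1.22):
    """True if demand rises after the first k points and Q_k exceeds the threshold."""
    before, after = series[:k], series[k:]
    rises = sum(after) / len(after) > sum(before) / len(before)
    return rises and buishand_q(series, k) > critical_q

jump = [10] * 10 + [30] * 10   # hypothetical series with a clear upward shift
flat = [10, 11] * 10           # hypothetical series with no shift
```

The critical value depends on the sample length and confidence level, so `critical_q` should be looked up for the sample actually used rather than fixed at the default.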
For the Pettitt test, the specific steps are as follows:
The time-series data to be processed is defined as the sequence $X = \{x_1, x_2, \ldots, x_k, \ldots, x_n\}$, and the statistic $U_k$ is constructed as:
Figure PCTCN2022118700-appb-000007
where $U_k$ is the sum of sign-function values before the quantity mutation point, $k$ is the set time point, $x_j$ is the item demand at time point $j$, and $\operatorname{sgn}$ is the sign function. As with method 1 (the Buishand statistical method), if the quantity mutation point cannot be determined in advance, $k = \arg\min_m U_m$ is selected. Because only a significant increase in item demand at the set time point is of interest, and a significant decrease is not, $U_k > 0$ is expected to hold significantly. Using the approximation proposed by Pettitt, the p-value of the one-sided test is calculated as follows:
Figure PCTCN2022118700-appb-000008
where $\exp$ is the exponential function with the natural constant $e$ as its base. Taking a 95% confidence level, if $p < 0.05$, a quantity mutation point exists at the set time point $k$.
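The Pettitt check can be sketched as follows. Since the formula images are not reproduced here, the sketch assumes the usual definitions: $U_k$ sums $\operatorname{sgn}(x_j - x_i)$ over every pair with $i$ among the first `k` points and $j$ among the rest (oriented so that an increase gives $U_k > 0$, as the text expects), and the one-sided p-value is approximated by $\exp(-6U_k^2 / (n^3 + n^2))$. Returning 1.0 when $U_k \le 0$ is a convention of this sketch.

```python
import math

def sgn(v):
    """Sign function: -1, 0 or 1."""
    return (v > 0) - (v < 0)

def pettitt_u(series, k):
    """U_k: sum of sgn(x_j - x_i) for i in the first k points and j in the rest."""
    return sum(sgn(xj - xi) for xi in series[:k] for xj in series[k:])

def pettitt_p_one_sided(series, k):
    """Approximate one-sided p-value for an increase after the first k points."""
    n = len(series)
    u_k = pettitt_u(series, k)
    if u_k <= 0:
        return 1.0  # no evidence of an increase (convention of this sketch)
    return math.exp(-6.0 * u_k ** 2 / (n ** 3 + n ** 2))

jump = [10] * 10 + [30] * 10   # hypothetical series with a clear upward shift
p_value = pettitt_p_one_sided(jump, 10)
```

For the hypothetical `jump` series, `p_value` falls well below 0.05, so a quantity mutation point would be reported at that position.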
In an optional embodiment, as shown in Fig. 5, a testing process is performed before the method of one or more embodiments of the present disclosure is carried out, specifically including:
Step S501: from all the item time-series data of the test cases, select the target item time-series data that has a quantity mutation point;
Step S502: split out of the target item time-series data the second training time-series data located after the set time point, where the values at the time points in the second training time-series data are equal to or greater than the value of the quantity mutation point in the target item time-series data;
Step S503: cluster the second training time-series data to obtain one or more characteristic time-series clusters.
Given steps S501 to S503, determining in step S103 the target characteristic time-series cluster to which the second time-series data belongs includes: selecting, from the one or more characteristic time-series clusters, the target characteristic time-series cluster to which the second time-series data belongs.
In the prior art, an FFORMA model determined from all the item time-series data cannot forecast accurately. In the embodiments of this application, therefore, the target item time-series data that has a quantity mutation point is first selected, and the second training time-series data within it is used as model input, which can significantly improve forecasting accuracy. In addition, the second training time-series data is clustered into different characteristic time-series clusters, and the clusters are used as model input. This further divides the target item time-series data that has a quantity mutation point: series with different characteristics are fed to the model by cluster, making the forecast more accurate.
For example, suppose the test cases contain 100 item time series in total, 50 of which have a quantity mutation point; those 50 are the target item time-series data. Assuming each series covers only one month of product sales, second training time-series data can be split out of each target series at its quantity mutation point, giving 50 second training time series, which are then clustered to obtain the characteristic time-series clusters.
For step S501, an optional embodiment specifically includes performing the following for the item time-series data of each warehouse item identifier in the test cases:
Step 1: determine a data split point;
Step 2: determine whether the data split point is a quantity mutation point in the item time-series data of the warehouse item identifier; if so, determine that item time-series data to be target item time-series data.
The method for determining whether a point is a quantity mutation point is the same as in steps S401 and S402 above. The data split point may be obtained from human experience or by data pivoting.
For the clustering process in step S503, an optional embodiment, as shown in Fig. 6, includes:
Step S601: determine a sequence translation function and a translation distance for a preset distance function;
Step S602: cluster the second training time-series data using the DBSCAN algorithm together with the distance function for which the sequence translation function and translation distance have been determined.
In the embodiments of this application, the purpose of clustering is to group the curves of item demand change by similarity. In time-series clustering, series differ in length, may be similar up to a shift, and differ in scale, so the Euclidean distance used in the prior art is poorly suited to describing the similarity between series. In 2015, John Paparrizos proposed K-Shape, a time-series clustering algorithm based on a shape similarity measure; it characterizes the similarity distance between series well and clusters them in a manner similar to K-means. Experiments show, however, that this method is computationally expensive. More importantly, because sufficient prior knowledge about the series is rarely available, the number of clusters k is hard to choose and often has to be determined through repeated experiments. Therefore, in an optional embodiment, the present disclosure performs clustering with the DBSCAN algorithm based on the Shape Based Distance (SBD).
Specifically, the sequence translation function $X_{(s)}$ is determined as follows:
Figure PCTCN2022118700-appb-000009
where $s$ is the translation amount and $x_1, \ldots, x_n$ are the item demands at time points 1 to $n$ in the series $X$ to be translated. For a given translation amount $s$, the cross-correlation $CC_s(X, Y)$ between two series $X$ and $Y$ is obtained:
Figure PCTCN2022118700-appb-000010
where $y_i$ is the item demand at time point $i$ in the series $Y$ to be translated, $y_{s+i}$ is the item demand at time point $s+i$ in $Y$, and $\omega$ is a constant. The translation amount $s$ is considered to be at its optimal position when $CC_s(X, Y)$ reaches its maximum. The cross-correlation $CC_s(X, Y)$ at that position is then substituted into formula (8) to obtain the normalized quantity $NCC(X, Y)$.
Figure PCTCN2022118700-appb-000011
where $\|X\|$ is the norm of series $X$ and $\|Y\|$ is the norm of series $Y$. The normalized quantity $NCC(X, Y)$ is substituted into formula (9) below to obtain the translation distance SBD.
$SBD(X, Y) = 1 - NCC(X, Y)$     Formula (9)
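The formula images for the translation function, the cross-correlation and formula (8) are not reproduced in this text. For orientation, the definitions commonly given for the shape-based distance in the K-Shape literature are sketched below, written for equal-length series with zero padding. Whether the patent's formulas match them exactly, including the role of the constant $\omega$, is an assumption.

```latex
\[
X_{(s)} =
\begin{cases}
(\underbrace{0, \ldots, 0}_{s},\; x_1, x_2, \ldots, x_{n-s}), & s \ge 0 \\
(x_{1-s}, \ldots, x_{n-1}, x_n,\; \underbrace{0, \ldots, 0}_{|s|}), & s < 0
\end{cases}
\]
\[
CC_s(X, Y) = \sum_{i} x_i \, y_{s+i}, \qquad 1 \le i \le n,\quad 1 \le s + i \le n
\]
\[
NCC(X, Y) = \max_{s} \frac{CC_s(X, Y)}{\lVert X \rVert \, \lVert Y \rVert},
\qquad
SBD(X, Y) = 1 - NCC(X, Y)
\]
```

Under these definitions SBD is 0 for two series that differ only by a shift and a positive scale factor, which is the property the clustering relies on.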
Further, the DBSCAN algorithm involves two hyperparameters: the neighborhood radius ε and the threshold MinPts on the number of samples within the neighborhood. They can be determined using the elbow point method and the silhouette score.
With the distance function defined as above, and with its sequence translation function and translation distance determined, the second training time-series data can be clustered more accurately according to the similarity of the shapes of the series, which also resolves the error that arises when series have similar shapes but are shifted in time.
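To make the combination concrete, the sketch below pairs a plain-Python shape-based distance with a textbook DBSCAN loop. It is an illustration under stated assumptions rather than the implementation of the disclosure: the series are equal-length and zero-padded when shifted, the values of `eps` and `min_pts` are picked by hand instead of by the elbow point method or silhouette score, and the six example series are invented.

```python
import math

def sbd(x, y):
    """Shape-based distance for equal-length series: 1 - max over shifts of NCC."""
    n = len(x)
    norm = math.sqrt(sum(v * v for v in x)) * math.sqrt(sum(v * v for v in y))
    if norm == 0:
        return 1.0  # convention for an all-zero series (assumption of this sketch)
    best = max(
        sum(x[i] * y[i + s] for i in range(n) if 0 <= i + s < n)
        for s in range(-(n - 1), n)
    )
    return 1.0 - best / norm

def dbscan(series_list, eps, min_pts, dist=sbd):
    """Return one cluster label per series; -1 marks noise."""
    n = len(series_list)
    labels = [None] * n

    def neighbours(i):
        return [j for j in range(n) if dist(series_list[i], series_list[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1                   # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster          # noise reached from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)   # j is a core point: keep expanding
    return labels

# Hypothetical month-end shapes: ramps and single-day spikes, shifted and rescaled.
data = [
    [0, 1, 2, 3, 4, 5, 0, 0], [0, 0, 0, 9, 0, 0, 0, 0],
    [0, 0, 2, 4, 6, 8, 10, 0], [0, 0, 0, 0, 0, 7, 0, 0],
    [1, 2, 3, 4, 5, 0, 0, 0], [5, 0, 0, 0, 0, 0, 0, 0],
]
labels = dbscan(data, eps=0.1, min_pts=2)
```

The ramps end up in one cluster and the spikes in another even though they peak on different days, which is the shift tolerance that Euclidean distance lacks.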
In an optional embodiment, where there are multiple characteristic time-series clusters, as shown in Fig. 7, the step of selecting the target characteristic time-series cluster to which the second time-series data belongs may further include:
Step S701: match the second time-series data against each characteristic time-series cluster;
Step S702: according to the matching result, select the target characteristic time-series cluster from the multiple characteristic time-series clusters.
How the matching result is used can be set according to the actual application scenario and requirements. In an optional embodiment, the characteristic time-series cluster with the highest matching degree is selected as the target characteristic time-series cluster. In the embodiments of this application, the matching degree may be expressed as the degree of similarity between the second time-series data and each characteristic time-series cluster: the higher the similarity, the higher the matching degree. For example, suppose clustering yields three characteristic time-series clusters, cluster 1, cluster 2 and cluster 3, and the matching degrees between the second time-series data and the three clusters are 0.9, 0.6 and 0.1 respectively; cluster 1 is then the target characteristic time-series cluster. If several clusters have the same matching degree, one of them may be selected at random.
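The selection rule in this example, including the random tie-break, can be sketched as follows. The function name `select_target_cluster` and the dictionary of matching degrees are assumptions; how the matching degree itself is computed is left open, as in the text.

```python
import random

def select_target_cluster(matching_degrees, rng=random):
    """Pick the cluster with the highest matching degree; break ties at random."""
    best = max(matching_degrees.values())
    candidates = [c for c, m in matching_degrees.items() if m == best]
    return rng.choice(candidates)

target = select_target_cluster({"cluster 1": 0.9, "cluster 2": 0.6, "cluster 3": 0.1})
```

Passing a seeded `random.Random` instance as `rng` makes the tie-break reproducible, which can be useful when results need to be audited.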
In the embodiments of this application, second training time-series data is not the only data that can be split out of the target item time-series data. An optional embodiment, as shown in Fig. 8, further includes:
Step S801: split out of the target item time-series data the first training time-series data located before the set time point, where the values at the time points in the first training time-series data are smaller than the value of the quantity mutation point in the target item time-series data;
Step S802: cluster the first training time-series data together with the item time-series data in the test cases other than the target item time-series data;
Step S803: train the preset first-type model to be trained using the clustering result to obtain the first data processing model, where the first-type model to be trained is configured from one or more base models, and the base models include one or more of ARIMA, ETS, Croston, simple moving average, FBProphet, Holt-Winters and first-order exponential smoothing.
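Two of the simpler base models named in step S803 can be written in a few lines, which may help show what a base model contributes. The sketch below gives textbook one-step forecasts for a simple moving average and first-order exponential smoothing; the window size, the smoothing factor `alpha` and the sales figures are illustrative assumptions, and the way base models are combined into a first-type model is not shown.

```python
def simple_moving_average(history, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def exponential_smoothing(history, alpha=0.5):
    """First-order exponential smoothing; returns the next-step forecast."""
    level = history[0]
    for value in history[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

sales = [10, 12, 11, 13, 12]             # hypothetical daily sales
sma_forecast = simple_moving_average(sales)
ses_forecast = exponential_smoothing(sales)
```

A larger `alpha` or a shorter `window` makes the forecast react faster to recent demand, at the cost of more noise.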
For example, in a sales forecasting scenario, suppose the test cases contain 10 item time series in total, 2 of which are target item time-series data, that is, series with a quantity mutation point. The time point of the quantity mutation point is set to the 20th of each month, the target item time-series data covers February 1, 2021 to February 28, 2021 and March 1, 2020 to March 31, 2020, and none of the remaining 8 series has a quantity mutation point. The first training time-series data split out of the 2 target series is then the data for February 1, 2021 to February 19, 2021 and March 1, 2020 to March 19, 2020. This data is clustered together with the remaining 8 complete series, and the preset first-type model to be trained is trained on the clustering result to obtain the first data processing model.
In practical application scenarios, the prediction results for first time-series data are usually the same or similar, so the first time-series data does not need to be classified in the way the second time-series data is. All of the first time-series data only needs to be clustered once, rather than being clustered into multiple different time-series clusters. This simplifies the clustering process, reduces the amount of data to be processed, and improves data processing efficiency without affecting the prediction result. The specific clustering method is the same as in step S601 and step S602.
In an optional embodiment, for each characteristic time-series cluster, the preset second-type model to be trained is trained with the time-series data corresponding to that cluster to obtain a second data processing model matching the cluster, wherein the second-type model to be trained is configured based on one or more base models and is different from the first-type model to be trained. Because the time series analyzed by the second-type model are more complex than those analyzed by the first-type model, more base models are configured for it, and it is usually obtained by fusing multiple base models.
In the process of determining the target characteristic time-series cluster to which the second time-series data belongs, in an optional embodiment, the step of selecting the target characteristic time-series cluster for the second time-series data, as shown in FIG. 9, further includes:
Step S901: splice the second time-series data end to end;
Step S902: match the spliced result against each of the one or more characteristic time-series clusters, and take the characteristic time-series cluster with the highest matching value as the target characteristic time-series cluster.
Exemplarily, assume that a piece of time-series data covers the change in item demand from March 1, 2021 to May 31, 2021, and that the set time point at which the quantity mutation point is located is 10 days before the end of the month. The second time-series data is then March 21, 2021 to March 31, 2021, April 20, 2021 to April 30, 2021, and May 21, 2021 to May 31, 2021, and the time-series data of the remaining periods is the first time-series data. End-to-end splicing joins these three segments into one new, longer time series. Because segments with high quantities usually share common features, feeding each high-quantity segment into the model separately usually yields similar or identical results. Splicing therefore avoids processing a large amount of similar data: the spliced series only needs to be input into the model once to obtain an accurate prediction result.
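The end-to-end splicing in this example can be sketched as below. The helper names `daily` and `splice` and the quantities 40, 45, and 50 are hypothetical, chosen only to make the three segments distinguishable; the disclosure does not prescribe a data layout.

```python
from datetime import date, timedelta

def daily(start, end, value):
    """Build (date, quantity) pairs for every day from start to end inclusive."""
    days = (end - start).days + 1
    return [(start + timedelta(days=i), value) for i in range(days)]

def splice(segments):
    """Join segments end to end in chronological order, keeping only quantities."""
    ordered = sorted(segments, key=lambda seg: seg[0][0])
    return [q for seg in ordered for _, q in seg]

# The three high-quantity segments from the example, with assumed quantities.
segments = [
    daily(date(2021, 3, 21), date(2021, 3, 31), 40),
    daily(date(2021, 4, 20), date(2021, 4, 30), 45),
    daily(date(2021, 5, 21), date(2021, 5, 31), 50),
]
spliced = splice(segments)
print(len(spliced))
```

Each segment spans 11 calendar days, so the spliced series is one sequence of 33 values that can be passed to the matching step or to a model in a single call.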
In the embodiments of the present application, clustering similar time series into groups for training makes it easier to extract the relevant features of each time-series cluster and reduces model bias and optimization difficulty. However, because the number of time series in each group decreases after clustering, the model is prone to overfitting. Therefore, in an optional embodiment, overfitting can be reduced by introducing data augmentation. As shown in FIG. 10, the step of training the preset second-type model to be trained with the time-series data corresponding to the characteristic time-series cluster further includes:
Step S1001: crop the second time-series data for training included in the characteristic time-series cluster, and splice the cropped results to generate new second time-series data for training;
Step S1002: train the preset second-type model to be trained using the second time-series data for training included in the characteristic time-series cluster and the new second time-series data for training.
The cropping process in step S1001 is shown in FIG. 11, and may specifically include:
Step S1001-1: set a length threshold L, a sliding length d, and a cropping length s;
Step S1001-2: select the time series X_l whose length exceeds L, and leave time series that do not exceed the threshold unprocessed;
Step S1001-3: apply sliding-window cropping with window length s to the time series in X_l, each window corresponding to one new sample;
Step S1001-4: combine the cropped data with the time series that do not exceed the threshold to form a new sample set.
Through the above cropping steps, the number of samples in the sample set can be expanded on the basis of the existing time-series data, so that training the preset second-type model to be trained yields a more accurate second data processing model.
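Steps S1001-1 to S1001-4 can be sketched as follows. This is an illustrative reading, assuming the window start advances by d each time and that s does not exceed L; the function name `augment_by_cropping` and the sample values are hypothetical.

```python
def augment_by_cropping(series_list, L, d, s):
    """Build a new sample set: every series longer than L is replaced by
    windows of length s taken every d steps; shorter series are kept as-is."""
    if s > L:
        raise ValueError("cropping length s is assumed not to exceed threshold L")
    samples = []
    for x in series_list:
        if len(x) > L:
            # One new sample per sliding-window position (step S1001-3).
            samples.extend(list(x[i:i + s]) for i in range(0, len(x) - s + 1, d))
        else:
            # Series at or below the threshold are left unprocessed (step S1001-2).
            samples.append(list(x))
    return samples

long_series = list(range(10))   # length 10, exceeds L = 5
short_series = [1, 2, 3]        # length 3, does not exceed L
samples = augment_by_cropping([long_series, short_series], L=5, d=2, s=4)
print(len(samples))
```

With these parameters the long series yields four windows starting at positions 0, 2, 4, and 6, so two original series become five training samples.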
Usually, the first-type model to be trained is trained on the clustering result of the first time-series data for training. Selecting the single best base model from the candidate base models is enough to achieve the training purpose; no complicated model is needed, which improves efficiency. The second-type model to be trained is usually a fusion model, that is, a complex model containing multiple base models; different kinds of base models can be selected for different characteristic time-series clusters to obtain the optimal trained model.
Further, for the multiple base models, a weight can be set for each base model according to the features of the different time-series clusters. The weights are continuously optimized during training, and only the base models with higher weights are retained. Optimizing the base models in this way can also reduce overfitting to the time-series data.
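The disclosure does not state how the weights are optimized, so the following is only one possible sketch: each base model is weighted by the inverse of its mean absolute error on held-out data, models whose weight falls below a threshold are dropped, and the remaining weights are renormalized. The function names, the threshold value, and the prediction values are all assumptions.

```python
def fit_weights(base_preds, actual, keep_threshold=0.2):
    """Weight each base model by inverse mean absolute error, drop models
    whose weight is below keep_threshold, then renormalise the rest.
    Assumes at least one model stays above the threshold."""
    inv_err = {}
    for name, preds in base_preds.items():
        mae = sum(abs(p - a) for p, a in zip(preds, actual)) / len(actual)
        inv_err[name] = 1.0 / (mae + 1e-9)  # small constant avoids division by zero
    total = sum(inv_err.values())
    weights = {n: v / total for n, v in inv_err.items()}
    kept = {n: w for n, w in weights.items() if w >= keep_threshold}
    kept_total = sum(kept.values())
    return {n: w / kept_total for n, w in kept.items()}

def blend(base_preds, weights):
    """Weighted sum of the retained base models' predictions."""
    horizon = len(next(iter(base_preds.values())))
    return [sum(weights[n] * base_preds[n][t] for n in weights)
            for t in range(horizon)]

# Hypothetical held-out demand and base-model predictions.
actual = [10, 12, 14]
base_preds = {
    "moving_average": [10, 12, 15],
    "holt_winters": [11, 12, 14],
    "croston": [20, 20, 20],
}
weights = fit_weights(base_preds, actual)
fused = blend(base_preds, weights)
print(sorted(weights), fused)
```

In this toy case the poorly fitting model is pruned and the two remaining models share the weight equally; a gradient-based or validation-based weight search would be an equally plausible reading of the paragraph above.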
In an optional embodiment, the method of the embodiments of the present disclosure can be used for sales forecasting of commodities. FIG. 12 shows a schematic diagram of the overall process for forecasting sales from time-series data, which specifically includes:
Step 1: determine whether the time-series data contains a mutation point, and determine from the result whether the time-series data exhibits a high-sales phenomenon;
if a high-sales phenomenon exists, go to step 2; if not, go directly to step 7;
Step 2: segment the time-series data to obtain the high-sales period (second time-series data) and the non-high-sales period (first time-series data);
Further, steps 3 to 5 are performed on the time-series data of the high-sales period, as follows:
Step 3: match the time-series data of the high-sales period against the characteristic time-series clusters in the clustering result to obtain the target characteristic time-series cluster;
Step 4: splice the high-sales time-series data in the target characteristic time-series cluster to obtain the splicing result of the high-sales time-series data;
Step 5: take the splicing result of the high-sales time-series data as the input of the first data processing model, and output the target sales volume;
After step 2, steps 6 and 7 are performed on the time-series data of the non-high-sales period, as follows:
Step 6: splice the time series of the non-high-sales period to obtain the splicing result of the non-high-sales time-series data;
Step 7: take the splicing result of the non-high-sales time-series data, or all of the current time-series data, as the input of the second data processing model, and output the target sales volume.
Through the above steps, time-series data with and without a high-sales phenomenon can be processed separately, and within time-series data that exhibits a high-sales phenomenon, the high-sales period and the non-high-sales period can also be processed separately, which improves the accuracy of data processing.
The data processing method of the embodiments of the present disclosure handles the case where the time-series data to be processed has a quantity mutation point at a set time point within the historical target time period. The time-series data to be processed is split into first time-series data located before the set time point and second time-series data located after the set time point, wherein the data at the multiple time points included in the first time-series data is smaller than the value of the quantity mutation point, and the data at the multiple time points included in the second time-series data is equal to or greater than the value of the quantity mutation point. The first time-series data and the second time-series data are then processed by the first data processing model and the second data processing model respectively, so that different data processing models handle the data of different stages and the processing results more truly reflect item demand at each stage. Finally, the first prediction result of item demand within the prediction time series is determined from the first processing result of the first data processing model and the processing result of the second data processing model, so that the item inventory within the prediction time series can be set based on the first prediction result. In this way, different time-series data is processed by different processing models, which improves the accuracy of the processing results.
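The flow summarized above can be sketched as a small function. This is an illustration only: `mean_model` is a placeholder forecaster standing in for the trained data processing models, and `predict_demand`, its parameters, and the demand values are hypothetical names and numbers, not part of the disclosure.

```python
def mean_model(history, horizon):
    """Placeholder forecaster: repeat the historical mean for `horizon` steps."""
    level = sum(history) / len(history)
    return [level] * horizon

def predict_demand(series, split_index, first_model, second_model,
                   horizon_first, horizon_second):
    """Forecast the part before the set time point with the first model and the
    part after it with the second model, then splice the two forecasts.
    If there is no mutation point (split_index is None), the first model
    handles the whole series."""
    if split_index is None:
        return first_model(series, horizon_first + horizon_second)
    first_part, second_part = series[:split_index], series[split_index:]
    return (first_model(first_part, horizon_first)
            + second_model(second_part, horizon_second))

# Assumed history: 19 low-demand days followed by 9 high-demand days.
history = [10] * 19 + [50] * 9
with_jump = predict_demand(history, 19, mean_model, mean_model, 19, 9)
no_jump = predict_demand([10] * 28, None, mean_model, mean_model, 19, 9)
print(with_jump[18], with_jump[19], len(no_jump))
```

In an actual system the second model would be the one matched to the target characteristic time-series cluster rather than the same placeholder used for the first part.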
According to a second aspect of one or more embodiments of the present disclosure, a data processing apparatus applied to a server is provided.
FIG. 13 is a schematic diagram of the main modules of a data processing apparatus 1300 according to the second aspect of one or more embodiments of the present disclosure. As shown in FIG. 13, the apparatus includes:
a determining module 1301, configured to determine time-series data to be processed, wherein the time-series data to be processed indicates changes in item demand within a historical target time period;
a prediction module 1302, configured to: in the case where the time-series data to be processed has a quantity mutation point at a set time point within the historical target time period, split the time-series data to be processed into first time-series data located before the set time point and second time-series data located after the set time point; determine the target characteristic time-series cluster to which the second time-series data belongs; process the first time-series data with a preset first data processing model and process the second time-series data with a second data processing model matching the target characteristic time-series cluster, wherein the quantity mutation point is determined through big-data statistics; and determine, according to the first processing result of the first data processing model and the processing result of the second data processing model, a first prediction result of item demand within the prediction time series, so as to set the item inventory within the prediction time series based on the first prediction result.
In an optional embodiment, the prediction module 1302 is further configured to: after the time-series data to be processed is determined, in the case where the time-series data to be processed has no quantity mutation point at the set time point within the historical target time period, process the time-series data to be processed with the preset first data processing model; and determine, according to the second processing result of the first data processing model, a second prediction result of item demand within the prediction time series, so as to set the item inventory within the prediction time series based on the second prediction result.
In an optional embodiment, the first processing result of the first data processing model includes: a prediction result of item demand before the set time point within the prediction time series;
the processing result of the second data processing model includes: a prediction result of item demand after the set time point within the prediction time series;
the prediction module 1302 is further configured to splice the prediction result of item demand before the set time point within the prediction time series with the prediction result of item demand after the set time point within the prediction time series to obtain the first prediction result.
In an optional embodiment, the apparatus further includes an analysis module, configured to: after the time-series data to be processed is determined, in the case where analysis shows that the time-series data to be processed satisfies the normality assumption, use the Buishand statistical method to determine whether the time-series data to be processed has a quantity mutation point at the set time point within the historical target time period;
and, in the case where analysis shows that the time-series data to be processed does not satisfy the normality assumption, use the Pettitt test to determine whether the time-series data to be processed has a quantity mutation point at the set time point within the historical target time period.
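For the non-normal branch, the Pettitt statistic can be sketched in plain Python as below. This is a minimal illustration using the commonly cited form of the test (rank-based statistic U_t, K = max|U_t|, and the approximate p-value 2·exp(−6K²/(n³+n²))); the function name `pettitt_test` and the sample series are assumptions, and the normality check and the Buishand branch are not shown.

```python
import math

def pettitt_test(x):
    """Return (t, K, p) where x[:t] and x[t:] are the two segments around the
    most likely change point, K = max |U_t|, and p is the approximate p-value."""
    n = len(x)
    u, best_t, best_k = 0, 0, 0
    for t in range(1, n):
        xt = x[t - 1]
        # Recurrence: U_t = U_{t-1} + sum over j of sgn(x_t - x_j).
        u += sum((xt > xj) - (xt < xj) for xj in x)
        if abs(u) > best_k:
            best_t, best_k = t, abs(u)
    p = min(1.0, 2.0 * math.exp(-6.0 * best_k ** 2 / (n ** 3 + n ** 2)))
    return best_t, best_k, p

# Assumed daily demand: 19 low days followed by 9 high days.
demand = [10] * 19 + [50] * 9
t, k, p = pettitt_test(demand)
print(t, k, p < 0.01)
```

A detected change point would then be compared with the set time point (for example the 20th of the month) to decide whether the series counts as having a quantity mutation point there.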
In an optional embodiment, the prediction module 1302 is further configured to screen out, from all the item time-series data of the test case, the target item time-series data that has a quantity mutation point;
the apparatus further includes a clustering module, configured to: segment, from the target item time-series data, the second time-series data for training located after the set time point, wherein the data at the multiple time points included in the second time-series data for training is equal to or greater than the value of the quantity mutation point included in the target item time-series data; and cluster the second time-series data for training to obtain one or more characteristic time-series clusters;
the prediction module 1302 is further configured to select, from the one or more characteristic time-series clusters, the target characteristic time-series cluster to which the second time-series data belongs.
In an optional embodiment, the prediction module 1302 is further configured to: in the case where there are multiple characteristic time-series clusters, match the second time-series data against each characteristic time-series cluster; and screen out the target characteristic time-series cluster from the multiple characteristic time-series clusters according to the matching result.
In an optional embodiment, the clustering module is further configured to: segment, from the target item time-series data, the first time-series data for training located before the set time point, wherein the data at the multiple time points included in the first time-series data for training is smaller than the value of the quantity mutation point included in the target item time-series data; cluster the time-series data of the items in the test case other than the target item time-series data together with the first time-series data for training; and train a preset first-type model to be trained using the clustering result to obtain the first data processing model, wherein the first-type model to be trained is configured based on one or more base models, and the base models include one or more of ARIMA, ETS, Croston, simple moving average, FBProphet, Holt-Winters, and first-order exponential smoothing.
In an optional embodiment, the clustering module is further configured to: for each characteristic time-series cluster, train the preset second-type model to be trained with the time-series data corresponding to that cluster to obtain a second data processing model matching the cluster, wherein the second-type model to be trained is configured based on one or more base models and is different from the first-type model to be trained.
In an optional embodiment, screening out, from all the item time-series data of the test case, the target item time-series data that has a quantity mutation point includes:
performing the following operations on the item time-series data of each warehouse item identifier in the test case: determining a data split point; and, in the case where the data split point is a quantity mutation point of the item time-series data of the warehouse item identifier, determining that item time-series data to be the target item time-series data.
In an optional embodiment, the clustering module is further configured to: determine a sequence translation function and a translation distance for a preset distance function; and cluster the second time-series data for training using the DBSCAN algorithm and the distance function for which the sequence translation function and the translation distance have been determined.
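The exact distance function is defined elsewhere in the disclosure and is not reproduced here, so the following is only a sketch of one shift-tolerant distance: the Euclidean distance minimized over integer shifts within a maximum translation distance, with zero padding. The names `shift`, `shifted_distance`, and `max_shift`, the zero-padding choice, and the sample series are all assumptions.

```python
import math

def shift(y, q):
    """Shift y by q steps (to the right if q > 0), padding with zeros
    and keeping the original length."""
    n = len(y)
    if q >= 0:
        return [0.0] * min(q, n) + list(y[:max(n - q, 0)])
    return list(y[min(-q, n):]) + [0.0] * min(-q, n)

def shifted_distance(x, y, max_shift):
    """Smallest Euclidean distance between x and any shift of y
    within +/- max_shift steps. x and y must have the same length."""
    return min(math.dist(x, shift(y, q))
               for q in range(-max_shift, max_shift + 1))

# Two series with the same peak shape, one step apart (assumed values).
x = [0, 0, 5, 9, 5, 0]
y = [0, 5, 9, 5, 0, 0]
aligned = shifted_distance(x, y, max_shift=2)
unaligned = shifted_distance(x, y, max_shift=0)

# Pairwise distance matrix that a density-based clusterer could consume.
series = [x, y, [9, 9, 9, 9, 9, 9]]
matrix = [[shifted_distance(a, b, 2) for b in series] for a in series]
print(aligned, round(unaligned, 3))
```

A precomputed matrix of this kind can be handed to a DBSCAN implementation that accepts precomputed distances, for instance scikit-learn's `DBSCAN` with `metric="precomputed"`, so that series with the same shape at slightly different offsets fall into the same cluster.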
In an optional embodiment, the clustering module is further configured to: splice the second time-series data end to end; match the spliced result against each of the one or more characteristic time-series clusters; and take the characteristic time-series cluster with the highest matching value as the target characteristic time-series cluster.
In an optional embodiment, the clustering module is further configured to: crop the second time-series data for training included in the characteristic time-series cluster, and splice the cropped results to generate new second time-series data for training; and train the preset second-type model to be trained using the second time-series data for training included in the characteristic time-series cluster and the new second time-series data for training.
FIG. 14 shows an exemplary system architecture 1400 to which the data processing method or data processing apparatus of one or more embodiments of the present disclosure can be applied.
As shown in FIG. 14, the system architecture 1400 may include terminal devices 1401, 1402, and 1403, a network 1404, and a server 1405. The network 1404 is the medium that provides communication links between the terminal devices 1401, 1402, 1403 and the server 1405. The network 1404 may include various connection types, such as wired links, wireless communication links, or fiber optic cables.
Users can use the terminal devices 1401, 1402, and 1403 to interact with the server 1405 through the network 1404, for example to send task execution requests or receive responses to requests. Various communication client applications may be installed on the terminal devices 1401, 1402, and 1403, such as online service applications, web browser applications, search applications, instant messaging tools, email clients, and social platform software.
The terminal devices 1401, 1402, and 1403 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers.
The server 1405 may be a server that provides various services, such as a background management server that supports online service requests sent by users through the terminal devices 1401, 1402, and 1403, or a server that processes data. The background management server can analyze and otherwise process received data such as time-series data, and feed the processing result (for example, the output prediction result of item demand) back to the terminal device.
It should be noted that the data processing method provided by the first aspect of the embodiments of the present disclosure is generally executed by the server 1405, and correspondingly, the data processing apparatus provided by the second aspect of the embodiments of the present disclosure is generally arranged in the server 1405.
It should be understood that the numbers of terminal devices, networks, and servers in FIG. 14 are merely illustrative. There may be any number of terminal devices, networks, and servers, as required by the implementation.
Referring now to FIG. 15, it shows a schematic structural diagram of a computer system 1500 suitable for implementing a terminal device of the embodiments of the present disclosure. The terminal device shown in FIG. 15 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in FIG. 15, the computer system 1500 includes a central processing unit (CPU) 1501, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 1502 or a program loaded from a storage section 1508 into a random access memory (RAM) 1503. The RAM 1503 also stores various programs and data required for the operation of the system 1500. The CPU 1501, the ROM 1502, and the RAM 1503 are connected to one another through a bus 1504. An input/output (I/O) interface 1505 is also connected to the bus 1504.
The following components are connected to the I/O interface 1505: an input section 1506 including a keyboard, a mouse, and the like; an output section 1507 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage section 1508 including a hard disk and the like; and a communication section 1509 including a network interface card such as a LAN card or a modem. The communication section 1509 performs communication processing via a network such as the Internet. A drive 1510 is also connected to the I/O interface 1505 as needed. A removable medium 1511, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1510 as needed, so that a computer program read from it can be installed into the storage section 1508 as needed.
In particular, according to the embodiments disclosed herein, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, the embodiments disclosed herein include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the methods shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 1509, and/or installed from the removable medium 1511. When the computer program is executed by the central processing unit (CPU) 1501, the above-mentioned functions defined in the system of the present disclosure are performed.
It should be noted that the computer-readable medium described in the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program, and the program can be used by or in combination with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted over any appropriate medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.
附图中的流程图和框图,图示了按照本公开各种实施例的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段、或代码的一部分,上述模块、程序段、或代码的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。也应当注意,在有些作为替换的实现中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个接连地表示的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图或流程图中的每个方框、以及框图或流程图中的方框的组合,可以用执行规定的功能或操作的专用的基于硬件的系统来实现,或者可以用专用硬件与计算机指令的组合来实现。The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code that includes one or more logical functions for implementing specified executable instructions. It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved. It should also be noted that each block in the block diagrams or flowchart illustrations, and combinations of blocks in the block diagrams or flowchart illustrations, can be implemented by a dedicated hardware-based system that performs the specified function or operation, or can be implemented by a A combination of dedicated hardware and computer instructions.
The modules described in the embodiments of the present disclosure may be implemented in software or in hardware. The described modules may also be provided in a processor; for example, a processor may be described as including a determination module and a prediction module. In some cases, the names of these modules do not limit the modules themselves; for example, the determination module may also be described as "a module for determining the time-series data to be processed".
As another aspect, the present disclosure also provides a computer-readable medium, which may be included in the device described in the above embodiments, or may exist independently without being assembled into the device. The computer-readable medium carries one or more programs which, when executed by the device, cause the device to perform the following:
determining time-series data to be processed, wherein the time-series data to be processed indicates changes in item demand within a historical target time period;
in the case where the time-series data to be processed has a quantity mutation point at a set time point within the historical target time period, splitting the time-series data to be processed, according to the quantity mutation point, into first time-series data and second time-series data; determining the target characteristic time-series cluster to which the second time-series data belongs; processing the first time-series data with a preset first data processing model; and processing the second time-series data with a second data processing model that matches the target characteristic time-series cluster;
determining, according to a first processing result of the first data processing model and a processing result of the second data processing model, a first prediction result of item demand within a forecast time series, so as to set the item inventory within the forecast time series based on the first prediction result.
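The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the disclosed implementation: the function names, the `split_index` argument, and the two stand-in forecasters (a simple moving average for the first data processing model and first-order exponential smoothing for the second) are hypothetical choices, and the cluster-matching step that selects the second model is omitted.

```python
from statistics import mean


def moving_average_forecast(history, horizon, window=3):
    """Hypothetical stand-in for the first data processing model:
    a flat forecast equal to the mean of the last `window` observations."""
    level = mean(history[-window:])
    return [level] * horizon


def exp_smoothing_forecast(history, horizon, alpha=0.5):
    """Hypothetical stand-in for the second data processing model:
    a flat forecast from first-order exponential smoothing."""
    level = history[0]
    for value in history[1:]:
        level = alpha * value + (1 - alpha) * level
    return [level] * horizon


def forecast_demand(series, split_index, horizon_before, horizon_after):
    """Split at the mutation point, forecast each part, and concatenate.

    `split_index` is the position of the first observation after the
    quantity mutation point, or None when no mutation point was found.
    `horizon_before` / `horizon_after` are the numbers of forecast steps
    before and after the set time point in the forecast period.
    """
    if split_index is None:
        # No mutation point: the first model handles the whole series.
        return moving_average_forecast(series, horizon_before + horizon_after)
    first, second = series[:split_index], series[split_index:]
    before = moving_average_forecast(first, horizon_before)
    after = exp_smoothing_forecast(second, horizon_after)
    return before + after
```

For example, `forecast_demand([10, 12, 11, 10, 12, 11, 30, 34, 30, 34], 6, 2, 2)` forecasts the first two steps from the pre-mutation segment and the last two steps from the post-mutation segment, then joins them into one forecast from which inventory could be set.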
The data processing method and apparatus of the embodiments of the present disclosure address the case where the time-series data to be processed has a quantity mutation point at a set time point within the historical target time period. The first time-series data and the second time-series data split from the time-series data to be processed are processed by the first data processing model and the second data processing model, respectively. Finally, a first prediction result of item demand within the forecast time series is determined from the first processing result of the first data processing model and the processing result of the second data processing model, so that the item inventory within the forecast time series can be set based on the first prediction result. Processing different time-series data with different processing models in this way improves the accuracy of the processing results.
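Claim 4 names the Pettitt test as the way to decide whether a quantity mutation point exists when the series does not satisfy the normality assumption. The sketch below shows one way such a check might look. It uses the standard Pettitt statistic and its usual approximate p-value; the function name, the return layout, and any significance threshold applied to the result are assumptions. The normality check and the Buishand test used for the normal case are not shown.

```python
import math


def _sign(value):
    """Return -1, 0, or 1 according to the sign of `value`."""
    return (value > 0) - (value < 0)


def pettitt_test(series):
    """Pettitt change-point test (rank-based, no normality assumption).

    Returns (split_index, statistic, p_value), where `split_index` is the
    index of the first observation after the most likely change point
    (None if the statistic is zero everywhere).
    """
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")
    best_stat, split_index, u = 0, None, 0
    for t in range(n - 1):
        # Recurrence U_t = U_{t-1} + sum_j sign(x_t - x_j).
        u += sum(_sign(series[t] - other) for other in series)
        if abs(u) > best_stat:
            best_stat, split_index = abs(u), t + 1
    # Approximate two-sided p-value for K = max |U_t|.
    p_value = min(1.0, 2.0 * math.exp(-6.0 * best_stat ** 2 / (n ** 3 + n ** 2)))
    return split_index, best_stat, p_value
```

A caller would typically treat the series as having a quantity mutation point only when `p_value` falls below a chosen significance level (0.05 is a common choice, not one stated in the disclosure) and, following claim 9, when `split_index` coincides with the set time point.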
The specific implementations described above do not limit the scope of protection of the present disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may occur depending on design requirements and other factors. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present disclosure shall fall within the scope of protection of the present disclosure.

Claims (15)

  1. A data processing method, comprising:
    determining time-series data to be processed, wherein the time-series data to be processed indicates changes in item demand within a historical target time period;
    in the case where the time-series data to be processed has a quantity mutation point at a set time point within the historical target time period, splitting the time-series data to be processed into first time-series data located before the set time point and second time-series data located after the set time point; determining the target characteristic time-series cluster to which the second time-series data belongs; processing the first time-series data with a preset first data processing model; and processing the second time-series data with a second data processing model that matches the target characteristic time-series cluster; wherein the quantity mutation point is determined through big-data statistics;
    determining, according to a first processing result of the first data processing model and a processing result of the second data processing model, a first prediction result of item demand within a forecast time series, so as to set the item inventory within the forecast time series based on the first prediction result.
  2. The method according to claim 1, further comprising, after determining the time-series data to be processed:
    in the case where the time-series data to be processed has no quantity mutation point at the set time point within the historical target time period,
    processing the time-series data to be processed with the preset first data processing model;
    determining, according to a second processing result of the first data processing model, a second prediction result of item demand within the forecast time series, so as to set the item inventory within the forecast time series based on the second prediction result.
  3. The method according to claim 1, wherein
    the first processing result of the first data processing model comprises a prediction result of item demand before the set time point within the forecast time series;
    the processing result of the second data processing model comprises a prediction result of item demand after the set time point within the forecast time series;
    determining the first prediction result of item demand within the forecast time series comprises:
    concatenating the prediction result of item demand before the set time point within the forecast time series with the prediction result of item demand after the set time point within the forecast time series to obtain the first prediction result.
  4. The method according to claim 2, further comprising, after determining the time-series data to be processed:
    in the case where analysis shows that the time-series data to be processed satisfies the normality assumption, using the Buishand statistical method to determine whether or not the time-series data to be processed has a quantity mutation point at the set time point within the historical target time period;
    in the case where analysis shows that the time-series data to be processed does not satisfy the normality assumption,
    using the Pettitt test to determine whether or not the time-series data to be processed has a quantity mutation point at the set time point within the historical target time period.
  5. The method according to claim 1, further comprising:
    screening out, from all item time-series data of a test case, target item time-series data that has a quantity mutation point;
    splitting out, from the target item time-series data, second time-series data for training located after the set time point, wherein the data at the plurality of time points included in the second time-series data for training is equal to or greater than the value of the quantity mutation point included in the target item time-series data;
    clustering the second time-series data for training to obtain one or more characteristic time-series clusters;
    wherein determining the target characteristic time-series cluster to which the second time-series data belongs comprises:
    selecting, from the one or more characteristic time-series clusters, the target characteristic time-series cluster to which the second time-series data belongs.
  6. The method according to claim 5, wherein selecting the target characteristic time-series cluster to which the second time-series data belongs comprises:
    in the case where there are multiple characteristic time-series clusters,
    matching the second time-series data against each of the characteristic time-series clusters;
    screening out the target characteristic time-series cluster from the multiple characteristic time-series clusters according to the matching result.
  7. The method according to claim 5, further comprising:
    splitting out, from the target item time-series data, first time-series data for training located before the set time point, wherein the data at the plurality of time points included in the first time-series data for training is less than the value of the quantity mutation point included in the target item time-series data;
    clustering the first time-series data for training together with the item time-series data in the test case other than the target item time-series data;
    training a preset first type of model to be trained with the clustering result to obtain the first data processing model, wherein the first type of model to be trained is configured from one or more base models, and the base models include one or more of ARIMA, ETS, Croston, simple moving average, FBProphet, Holt-Winters, and first-order exponential smoothing.
  8. The method according to claim 7, further comprising:
    for each characteristic time-series cluster, training a preset second type of model to be trained with the time-series data corresponding to the characteristic time-series cluster to obtain a second data processing model that matches the characteristic time-series cluster, wherein the second type of model to be trained is configured from one or more base models, and the second type of model to be trained differs from the first type of model to be trained.
  9. The method according to claim 5, wherein screening out, from all item time-series data of the test case, the target item time-series data that has a quantity mutation point comprises:
    for the item time-series data of each warehouse item identifier in the test case, performing the following operations:
    determining a data split point;
    in the case where the data split point is a quantity mutation point of the item time-series data of the warehouse item identifier, determining the item time-series data of the warehouse item identifier to be the target item time-series data.
  10. The method according to claim 5, wherein clustering the second time-series data for training comprises:
    determining a sequence translation function and a translation distance for a preset distance function;
    clustering the second time-series data for training using the DBSCAN algorithm and the distance function for which the sequence translation function and the translation distance have been determined.
  11. The method according to claim 5, wherein selecting the target characteristic time-series cluster to which the second time-series data belongs comprises:
    splicing the second time-series data end to end;
    matching the spliced result against each of the one or more characteristic time-series clusters, and taking the characteristic time-series cluster with the highest matching value as the target characteristic time-series cluster.
  12. The method according to claim 8, wherein training the preset second type of model to be trained with the time-series data corresponding to the characteristic time-series cluster comprises:
    clipping the second time-series data for training included in the characteristic time-series cluster, and splicing the clipped results to generate new second time-series data for training;
    training the preset second type of model to be trained with the second time-series data for training included in the characteristic time-series cluster and the new second time-series data for training.
  13. A data processing apparatus, comprising:
    a determination module, configured to determine time-series data to be processed, wherein the time-series data to be processed indicates changes in item demand within a historical target time period;
    a prediction module, configured to: in the case where the time-series data to be processed has a quantity mutation point at a set time point within the historical target time period, split the time-series data to be processed, according to the quantity mutation point, into first time-series data located before the set time point and second time-series data located after the set time point; determine the target characteristic time-series cluster to which the second time-series data belongs; process the first time-series data with a preset first data processing model; and process the second time-series data with a second data processing model that matches the target characteristic time-series cluster; wherein the quantity mutation point is determined through big-data statistics; and
    determine, according to a first processing result of the first data processing model and a processing result of the second data processing model, a first prediction result of item demand within a forecast time series, so as to set the item inventory within the forecast time series based on the first prediction result.
  14. A data processing device, comprising: one or more processors;
    a storage system, configured to store one or more programs,
    wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-12.
  15. A computer-readable medium storing a computer program which, when executed by a processor, implements the method according to any one of claims 1-12.
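Claim 10 clusters the second time-series data for training with DBSCAN and a distance function that has a sequence translation function and a translation distance. The claims do not define either, so the sketch below rests on assumptions: the translation function is taken to be a circular shift, the translation distance is a maximum shift `max_shift`, and the base distance is Euclidean. All names and parameters (`shifted_distance`, `dbscan`, `eps`, `min_pts`) are hypothetical, and the series are assumed to have equal length.

```python
import math
from collections import deque


def shifted_distance(a, b, max_shift=2):
    """Smallest Euclidean distance between `a` and any circular shift of
    `b` by at most `max_shift` positions (assumed translation function)."""
    best = math.inf
    for s in range(-max_shift, max_shift + 1):
        shifted = b[-s:] + b[:-s]  # s == 0 leaves b unchanged
        d = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, shifted)))
        best = min(best, d)
    return best


def dbscan(items, eps, min_pts, dist):
    """Minimal DBSCAN. Returns one label per item; -1 marks noise.
    A point counts itself among its neighbours."""
    n = len(items)
    labels = [None] * n  # None means "not visited yet"

    def neighbours(i):
        return [j for j in range(n) if dist(items[i], items[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = neighbours(j)
            if len(reach) >= min_pts:
                queue.extend(reach)  # j is a core point: keep expanding
    return labels
```

With this distance, two post-mutation demand curves that have the same shape but peak a day or two apart fall into the same characteristic time-series cluster, which appears to be the purpose of adding a translation to the distance function.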
PCT/CN2022/118700 2022-02-17 2022-09-14 Data processing method and apparatus WO2023155426A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210144038.0 2022-02-17
CN202210144038.0A CN114219545B (en) 2022-02-17 2022-02-17 Data processing method and device

Publications (1)

Publication Number Publication Date
WO2023155426A1 true WO2023155426A1 (en) 2023-08-24

Family

ID=80709274

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/118700 WO2023155426A1 (en) 2022-02-17 2022-09-14 Data processing method and apparatus

Country Status (2)

Country Link
CN (1) CN114219545B (en)
WO (1) WO2023155426A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117196121A (en) * 2023-10-26 2023-12-08 广东省信息网络有限公司 Data analysis method and system based on prediction system
CN117807411A (en) * 2024-02-29 2024-04-02 济南浪潮数据技术有限公司 Server performance index prediction method and device and electronic equipment

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114219545B (en) * 2022-02-17 2022-07-05 北京京东振世信息技术有限公司 Data processing method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200098055A1 (en) * 2018-09-25 2020-03-26 Business Objects Software Ltd. Multi-step day sales outstanding forecasting
CN113743971A (en) * 2020-06-17 2021-12-03 北京沃东天骏信息技术有限公司 Data processing method and device
CN113780611A (en) * 2020-12-10 2021-12-10 北京沃东天骏信息技术有限公司 Inventory management method and device
CN114219545A (en) * 2022-02-17 2022-03-22 北京京东振世信息技术有限公司 Data processing method and device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111401609B (en) * 2020-03-04 2023-01-17 平安科技(深圳)有限公司 Prediction method and prediction device for traffic flow time series
CN111860865B (en) * 2020-07-23 2022-07-19 中国工商银行股份有限公司 Model construction and analysis method, device, electronic equipment and medium
CN113128932B (en) * 2021-04-16 2024-04-16 北京京东振世信息技术有限公司 Warehouse stock processing method and device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN114219545B (en) 2022-07-05
CN114219545A (en) 2022-03-22

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22926731

Country of ref document: EP

Kind code of ref document: A1