WO2020093718A1 - Training data re-sampling method and apparatus, and storage medium and electronic device - Google Patents


Info

Publication number
WO2020093718A1
Authority
WO
WIPO (PCT)
Prior art keywords
preset
sorting result
sorting
categories
ratio
Prior art date
Application number
PCT/CN2019/094741
Other languages
French (fr)
Chinese (zh)
Inventor
李伟健
王长虎
Original Assignee
北京字节跳动网络技术有限公司
Application filed by 北京字节跳动网络技术有限公司

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/26 Visual data mining; Browsing structured data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Disclosed are a training data re-sampling method and apparatus, and a storage medium and an electronic device. The method comprises: acquiring first original data within a first time period (S101); calculating respective first proportions of multiple pre-set classifications in the first original data (S102); sorting the multiple pre-set classifications according to a size relationship of the first proportions and a pre-set rule so as to obtain a first sorting result (S103); determining, according to the first sorting result of the pre-set classifications and a pre-set correlation, a sampling proportion corresponding to each pre-set classification (S104), wherein the pre-set correlation is a correlation between the first sorting result and the sampling proportion; and re-sampling training data for modeling according to the sampling proportions respectively corresponding to the multiple pre-set classifications (S105). The problem of a classification model being unfriendly to a small category can be solved, and the classification accuracy of a classification model, obtained through training by means of training data, for different applications can be improved, thereby improving user experience.

Description

Training data resampling method and apparatus, storage medium, and electronic device
Cross-reference to related applications
This application claims priority to Chinese Patent Application No. 201811327417.3, filed with the China Intellectual Property Office on November 8, 2018, the entire disclosure of which is incorporated herein by reference.
Technical field
The present disclosure relates to the field of data mining, and in particular to a training data resampling method and apparatus, a storage medium, and an electronic device.
Background
In machine learning, the number of training samples may differ greatly across the categories of a classification model. For example, among N training samples, the number of samples in the first category may differ enormously from the numbers in the second and third categories (the first category may account for 90% of the N samples, while the second and third categories together account for only 10%). When a classification model is trained directly on such imbalanced data, the machine learning algorithm tends to produce an unsatisfactory model: for example, the model may underfit the categories with few samples in the training data and overfit the categories with many samples. In practice, once the imbalance ratio exceeds 4:1, the classification model tends to favor the large categories and ignore the small ones. A classification model trained on unprocessed, imbalanced training data may therefore classify real data poorly. At present, imbalanced training data is usually handled by resampling the training data.
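As a non-limiting illustration, the imbalance ratio mentioned above can be computed as the sample count of the largest category divided by that of the smallest. The function name `imbalance_ratio` and the 90/6/4 split are assumptions made for this sketch; the text only states that the second and third categories together account for 10%.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return (largest category count) / (smallest category count)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Hypothetical label set: 90% first category, 10% shared by the other two.
labels = ["first"] * 90 + ["second"] * 6 + ["third"] * 4
ratio = imbalance_ratio(labels)
needs_resampling = ratio > 4  # the 4:1 threshold cited in the background
```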
Summary of the invention
An object of the present disclosure is to provide a training data resampling method and apparatus, a storage medium, and an electronic device that, when the training data is imbalanced, resample the training data according to the proportions of the different categories in the actual raw data, thereby solving the problem that classification models are unfriendly to small categories.
To achieve the above object, the present disclosure provides a training data resampling method, the method comprising:
acquiring first raw data within a first time period;
calculating first proportions respectively occupied by multiple preset categories in the first raw data;
sorting the multiple preset categories by the magnitude of their first proportions according to a preset rule, to obtain a first sorting result;
determining a sampling ratio for each preset category according to the first sorting result of each preset category and a preset correspondence, the preset correspondence being a correspondence between the first sorting result and the sampling ratio; and
resampling the training data used for modeling according to the sampling ratios respectively corresponding to the multiple preset categories.
Optionally, the method further comprises, after the first sorting result is obtained:
acquiring second raw data within a second time period;
calculating second proportions respectively occupied by the multiple preset categories in the second raw data;
sorting the multiple preset categories by the magnitude of their second proportions according to the preset rule, to obtain a second sorting result; and
when the first sorting result is consistent with the second sorting result, performing the step of determining the sampling ratio for each preset category.
Optionally, when the first sorting result and the second sorting result are inconsistent, the second time period is re-determined, the second sorting result is taken as the first sorting result, and the method returns to the step of acquiring the second raw data within the second time period.
Optionally, when at least two preset categories have the same proportion, the order of those preset categories is determined according to their priorities.
The present disclosure also provides a training data resampling apparatus, the apparatus comprising:
a first acquisition module, configured to acquire first raw data within a first time period;
a first calculation module, configured to calculate first proportions respectively occupied by multiple preset categories in the first raw data;
a first sorting module, configured to sort the multiple preset categories by the magnitude of their first proportions according to a preset rule, to obtain a first sorting result;
a ratio acquisition module, configured to determine a sampling ratio for each preset category according to the first sorting result of each preset category and a preset correspondence, the preset correspondence being a correspondence between the first sorting result and the sampling ratio; and
a resampling module, configured to resample the training data used for modeling according to the sampling ratios respectively corresponding to the multiple preset categories.
Optionally, the apparatus further comprises:
a second acquisition module, configured to acquire second raw data within a second time period after the first sorting result is obtained;
a second calculation module, configured to calculate second proportions respectively occupied by the multiple preset categories in the second raw data;
a second sorting module, configured to sort the multiple preset categories by the magnitude of their second proportions according to the preset rule, to obtain a second sorting result; and
a ranking comparison module, configured to trigger the ratio acquisition module to determine the sampling ratio for each preset category when the first sorting result is consistent with the second sorting result.
Optionally, the ranking comparison module is further configured to:
when the first sorting result and the second sorting result are inconsistent, re-determine the second time period and take the second sorting result as the first sorting result; and
trigger the second acquisition module to acquire the second raw data within the second time period.
Optionally, when at least two preset categories have the same proportion, the order of those preset categories is determined according to their priorities.
The present disclosure also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the above method.
The present disclosure also provides an electronic device, comprising:
a memory storing a computer program; and
a processor configured to execute the computer program in the memory to implement the above method.
According to embodiments of the present disclosure, when the training data is imbalanced, the training data can be resampled according to the proportions of the different categories in the actual raw data. This solves the problem that classification models are unfriendly to small categories and improves the classification accuracy, across different applications, of the model trained on that data, thereby improving user experience.
Brief description of the drawings
The accompanying drawings are provided to aid further understanding of the present disclosure and form part of the specification. Together with the detailed description below, they serve to explain the present disclosure but do not limit it. In the drawings:
FIG. 1 is a flowchart of a training data resampling method according to an exemplary embodiment of the present disclosure.
FIG. 2 is a flowchart of another training data resampling method according to an exemplary embodiment of the present disclosure.
FIG. 3 is a structural block diagram of a training data resampling apparatus according to an exemplary embodiment of the present disclosure.
FIG. 4 is a structural block diagram of another training data resampling apparatus according to an exemplary embodiment of the present disclosure.
FIG. 5 is a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
Detailed description
Specific embodiments of the present disclosure are described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described here serve only to illustrate and explain the present disclosure and are not intended to limit it.
Training data is often imbalanced when a classification model is built, and a model trained on imbalanced data tends to favor the categories with larger shares: it may overfit those categories and underfit the categories with smaller shares. When building a classification model, those skilled in the art therefore usually adopt a strategy such as improving the classification algorithm or balancing the classes of the training data (data preprocessing). The latter is used more often because it applies more widely; it amounts to resampling the imbalanced training data.
By principle, resampling is usually divided into over-sampling and under-sampling. Over-sampling increases the amount of training data in the categories with smaller shares, which increases the representation of the minority classes. Under-sampling reduces the amount of training data in the categories with larger shares, which reduces the representation of the majority classes. Either way, the amounts of data in categories of different shares are balanced, which addresses the training data imbalance. The methods commonly used for resampling are the nearest-neighbor method, bilinear interpolation, and cubic convolution interpolation.
It follows that, because different fields place different demands on training data resampling, the same resampling method does not process training data equally well everywhere. How to resample imbalanced training data according to the actual category distribution in a given field is therefore an important problem in urgent need of a solution.
The present disclosure therefore proposes a training data resampling method.
FIG. 1 is a flowchart of a training data resampling method according to an exemplary embodiment of the present disclosure. As shown in FIG. 1, the method may include steps S101 to S105.
In step S101, first raw data within a first time period is acquired. Raw data is the actual data generated in the real application that the classification model serves, for example the data actually generated in an online short-video application whose videos the classification model classifies. The first raw data within the first time period may be, for example, the short-video data that users actually uploaded between 0:00 on October 15, 2018 and 0:00 on October 16, 2018.
In step S102, the first proportions respectively occupied by multiple preset categories in the first raw data are calculated. The preset categories are categories defined in advance by people, such as scenery, pets, dance, technology, and comedy. Step S102 thus yields the first proportion of each preset category in the first raw data acquired within the first time period.
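By way of example only, step S102 can be sketched as a simple count over category labels. The sketch assumes each item of raw data already carries one label from the preset categories; the function name `category_proportions` and the sample labels are hypothetical.

```python
from collections import Counter


def category_proportions(labels, categories):
    """Return the share of each preset category among the given labels."""
    counts = Counter(labels)
    total = sum(counts[c] for c in categories)
    if total == 0:
        raise ValueError("no samples belong to the preset categories")
    return {c: counts[c] / total for c in categories}


# Hypothetical labels of the videos uploaded during the first time period.
labels = ["funny"] * 5 + ["pets"] * 3 + ["dance"] * 2
proportions = category_proportions(labels, ["funny", "pets", "dance", "scenery"])
```

A category with no samples in the period (here `scenery`) receives a proportion of zero rather than being dropped, so it still takes part in the later sorting step.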
In step S103, the multiple preset categories are sorted by the magnitude of their first proportions according to a preset rule, to obtain a first sorting result (that is, the rank of each preset category).
In step S104, the sampling ratio for each preset category is determined from the rank of each preset category and a preset correspondence, the preset correspondence being a correspondence between ranks and sampling ratios. The preset rule may be descending order, ascending order, and so on; it should be matched to the preset correspondence so that the resulting sampling ratios reflect the actual distribution of the preset categories in the first raw data. For example, in step S103 the preset categories may be sorted in descending order of first proportion, and the preset correspondence may be as shown in Table 1:
Table 1
Rank            1     2     3     4     5
Sampling ratio  50%   20%   10%   10%   10%
The preset categories are not limited to the five shown in Table 1; there may be any other number of them, provided that the sampling ratios of all preset categories sum to 100%. Thus, with Table 1, each of the five preset categories obtains its corresponding sampling ratio.
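Steps S103 and S104 with the correspondence of Table 1 can be sketched as follows. This is a minimal illustration that assumes a descending preset rule; the names `RANK_TO_RATIO` and `sampling_ratios` and the example proportions are hypothetical.

```python
# Table 1: rank (1 = largest proportion) -> sampling ratio.
RANK_TO_RATIO = {1: 0.50, 2: 0.20, 3: 0.10, 4: 0.10, 5: 0.10}


def sampling_ratios(proportions, rank_to_ratio=RANK_TO_RATIO):
    """Sort categories by descending proportion and look up each rank's ratio."""
    if len(proportions) != len(rank_to_ratio):
        raise ValueError("the correspondence must cover every preset category")
    if abs(sum(rank_to_ratio.values()) - 1.0) > 1e-9:
        raise ValueError("sampling ratios must sum to 100%")
    ranked = sorted(proportions, key=proportions.get, reverse=True)  # S103
    return {cat: rank_to_ratio[rank]                                 # S104
            for rank, cat in enumerate(ranked, start=1)}


# Hypothetical first proportions for five preset categories.
first_proportions = {"scenery": 0.05, "pets": 0.15, "dance": 0.10,
                     "tech": 0.08, "funny": 0.62}
ratios = sampling_ratios(first_proportions)
```

Note that only the rank, not the measured proportion itself, decides the sampling ratio: `funny` receives 50% here even though it makes up 62% of the raw data.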
In step S105, the training data used for modeling is resampled according to the sampling ratios respectively corresponding to the multiple preset categories. Once steps S101 to S104 have produced the sampling ratio for each preset category in the first raw data of the first time period, the training data can be resampled according to those ratios. The present disclosure places no restriction on whether over-sampling or under-sampling is used or on the specific sampling method, provided that resampling follows the per-category sampling ratios obtained through steps S101 to S104 shown in FIG. 1.
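Because the disclosure leaves the sampling technique open, the following is only one possible sketch of step S105. It assumes random under-sampling without replacement when a category has more samples than its target and random duplication when it has fewer; the function name `resample`, the `total` parameter, and the toy data are all hypothetical.

```python
import random
from collections import Counter


def resample(samples_by_category, ratios, total, seed=0):
    """Draw about `total` samples so each category matches its sampling ratio."""
    rng = random.Random(seed)
    resampled = []
    for category, ratio in ratios.items():
        pool = list(samples_by_category[category])
        if not pool:
            raise ValueError(f"no training samples for category {category!r}")
        target = round(total * ratio)
        if target <= len(pool):
            # Under-sample: pick a subset without replacement.
            picked = rng.sample(pool, target)
        else:
            # Over-sample: keep every sample and duplicate random ones.
            picked = pool + rng.choices(pool, k=target - len(pool))
        resampled.extend((sample, category) for sample in picked)
    rng.shuffle(resampled)
    return resampled


# Hypothetical imbalanced training data: 900 / 60 / 40 samples.
training = {"a": list(range(900)), "b": list(range(60)), "c": list(range(40))}
balanced = resample(training, {"a": 0.5, "b": 0.3, "c": 0.2}, total=400)
label_counts = Counter(category for _, category in balanced)
```

In this run category `a` is under-sampled from 900 to 200 samples, while `b` and `c` are over-sampled to 120 and 80.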
According to embodiments of the present disclosure, when the training data is imbalanced, the training data can be resampled according to the proportions of the different categories in the actual raw data. This solves the problem that classification models are unfriendly to small categories and improves the classification accuracy, across different applications, of the model trained on that data, thereby improving user experience.
FIG. 2 is a flowchart of another training data resampling method according to an exemplary embodiment of the present disclosure. As shown in FIG. 2, in addition to steps S101 to S105 shown in FIG. 1, the method may include steps S201 to S205.
In step S201, second raw data within a second time period is acquired. In the present disclosure the second time period differs from the first time period, but the two may overlap, and there is no fixed order between them: the first time period may come before or after the second. However, the interval between the end of the earlier period and the start of the later period must be less than a preset threshold, for example 24 hours or 48 hours. This prevents the two periods from being so far apart that the content of the first raw data and the second raw data changes too much and distorts the proportions of the preset categories. Raw data is defined for the second raw data in the same way as for the first raw data.
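The constraint on the two time periods can be illustrated with a short check. This sketch represents each period as a hypothetical `(start, end)` pair of `datetime` values and treats the period with the earlier start as the earlier one; the function name `periods_compatible` and the dates are assumptions.

```python
from datetime import datetime, timedelta


def periods_compatible(first, second, max_gap=timedelta(hours=24)):
    """Check that two (start, end) periods differ and lie close enough together."""
    if first == second:
        return False  # the second period must differ from the first
    earlier, later = sorted([first, second])  # order by start time
    gap = later[0] - earlier[1]  # negative when the periods overlap
    return gap < max_gap


first_period = (datetime(2018, 10, 15), datetime(2018, 10, 16))
near_period = (datetime(2018, 10, 16, 12), datetime(2018, 10, 17, 12))
far_period = (datetime(2018, 10, 18), datetime(2018, 10, 19))

near_ok = periods_compatible(first_period, near_period)  # gap of 12 hours
far_ok = periods_compatible(first_period, far_period)    # gap of 48 hours
```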
In step S202, the second proportions respectively occupied by the multiple preset categories in the second raw data are calculated.
In step S203, the multiple preset categories are sorted by the magnitude of their second proportions according to the preset rule, to obtain a second sorting result. Steps S202 and S203 are similar to steps S102 and S103 shown in FIG. 1. Steps S102 and S103 operate on the proportions of the preset categories in the first raw data, and steps S202 and S203 operate on the proportions of the preset categories in the second raw data; they yield the first sorting result and the second sorting result, respectively.
In step S204, the first sorting result obtained in step S103 is compared with the second sorting result obtained in step S203. If the two are consistent, the first and second sorting results are taken to be accurate and can be used to determine the sampling ratios of the preset categories. The method therefore proceeds to step S104 to determine the sampling ratio for each preset category, and finally performs step S105 to resample the training data used for modeling according to those sampling ratios, which addresses the poor classification-model performance caused by imbalanced training data. If the two are inconsistent, the method proceeds to step S205.
In step S205, the second time period is re-determined and the second sorting result is taken as the first sorting result. The re-determined second time period differs from the second time period used in step S201. Its relationship to that period is constrained in the same way as the relationship between the second time period used in step S201 and the first time period: the two may overlap, and there is no fixed order between them. In addition, the re-determined second time period must not be the same as the first time period or any period previously used as the second time period. This ensures that each round of acquiring raw data, calculating proportions, and sorting uses content that truly reflects the proportions of the preset categories in the real application, without repetition. After the second time period is re-determined, the second sorting result is taken as the first sorting result and the method returns to step S201 to obtain a new second sorting result. The new second sorting result is then compared with the first sorting result (the second sorting result of the previous round) to decide whether the training data can be resampled according to a consistent set of sampling ratios.
In summary, in this embodiment a first sorting result is obtained from the proportions of the preset categories in the first raw data of the first time period, a second sorting result is obtained from the proportions of the preset categories in the second raw data of the second time period, and the two are compared. If they are consistent, the sorting result is taken to be free of error; the sampling ratio of each preset category can be determined directly from the ranking, and the training data resampled accordingly. If they are inconsistent, at least one of the two is in error, so new raw data from a new time period must be selected, a new sorting result obtained from it, and that result compared with the most recently obtained sorting result, until the comparison is consistent. Once the comparison is consistent, possibly after several rounds, the sampling ratio for each preset category is determined from the consistent sorting result, and the training data is resampled according to those ratios.
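The comparison loop of steps S201 to S205 can be sketched as follows. The sketch assumes the raw data of successive time periods is supplied as a sequence of label lists (at least one), uses a descending preset rule, and stops when the supplied periods run out; the names `rank_categories` and `find_stable_ranking` are hypothetical.

```python
from collections import Counter


def rank_categories(labels, categories):
    """Sort categories by descending count; ties keep the order of `categories`."""
    counts = Counter(labels)
    return sorted(categories, key=lambda c: -counts[c])


def find_stable_ranking(label_batches, categories):
    """Return the first ranking on which two consecutive periods agree, else None."""
    batches = iter(label_batches)
    first = rank_categories(next(batches), categories)   # S101 to S103
    for labels in batches:                               # S201 (new period)
        second = rank_categories(labels, categories)     # S202 and S203
        if second == first:                              # S204: consistent
            return first
        first = second                                   # S205: second becomes first
    return None  # no two consecutive periods agreed


# Hypothetical labels from three successive time periods.
batches = [["a", "a", "b"], ["b", "b", "a"], ["b", "b", "b", "a"]]
stable = find_stable_ranking(batches, ["a", "b"])
```

Here the first two periods disagree, so the second ranking replaces the first; the third period confirms it, and that ranking is the one used to look up sampling ratios.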
According to embodiments of the present disclosure, comparing the proportion rankings of the preset categories in raw data from different time periods at least twice makes it possible to determine a ranking that truly reflects the proportions of the preset categories in the raw data. This avoids the case where an unusual time period or a one-off event makes the measured proportions unrepresentative of the actual raw data. It in turn ensures the accuracy of resampling with the sampling ratios determined from the ranking, further improving the classification model trained on the resampled data.
In one possible implementation, when at least two preset categories have the same proportion, their order is determined according to their priorities. When the preset categories are sorted by proportion according to the preset rule in step S103 shown in FIG. 1 and step S203 shown in FIG. 2, categories with equal proportions can be ordered by priority. For example, suppose preset categories A, B, and C each account for 2% of the raw data in a time period, with priorities A > B > C. Under a descending rule, A is ranked first among the three, followed by B and then C. Under an ascending rule the order is exactly reversed: C first, then B, then A.
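The tie-breaking rule can be expressed as a two-part sort key. In this sketch a smaller priority number means a higher priority, and the ascending order is taken to be the exact reverse of the descending order, as in the A, B, C example; the function name `rank_with_priority` and category D are assumptions.

```python
def rank_with_priority(proportions, priority, descending=True):
    """Sort by proportion; break ties so higher priority ranks toward the large end."""
    ordered = sorted(proportions, key=lambda c: (-proportions[c], priority[c]))
    return ordered if descending else ordered[::-1]


# A, B and C tie at 2%; D is a hypothetical fourth category holding the rest.
proportions = {"C": 0.02, "A": 0.02, "B": 0.02, "D": 0.94}
priority = {"A": 0, "B": 1, "C": 2, "D": 3}  # smaller number = higher priority

descending_order = rank_with_priority(proportions, priority)
ascending_order = rank_with_priority(proportions, priority, descending=False)
```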
图3是根据本公开的示例性实施例的一种训练数据重采样装置的结构框图,如图3所示,该装置可以包括:第一获取模块10,用于获取第一时段内的第一原始数据;第一计算模块20,用于计算第一原始数据中多个预设分类分别所占的第一比例;第一排序模块30,用于根据第一比例的大小关系按照预设规则对多个预设分类进行排序,获得第一排序结果;比例获取模块40,用于根据各预设分类的第一排序结果和预设对应关系,确定各预设分类对应的采样比例,该预设对应关系为第一排序结果与采样比例之间的对应关系;重 采样模块50,用于根据多个预设分类分别对应的采样比例对用于建模的训练数据进行重采样。FIG. 3 is a structural block diagram of a training data resampling device according to an exemplary embodiment of the present disclosure. As shown in FIG. 3, the device may include: a first obtaining module 10, configured to obtain a first The original data; the first calculation module 20 is used to calculate the first proportions respectively occupied by a plurality of preset categories in the first raw data; the first sorting module 30 is used to compare the first proportion according to the preset rule Multiple preset categories are sorted to obtain a first sorting result; the ratio obtaining module 40 is used to determine a sampling ratio corresponding to each preset classification according to the first sorting result of each preset classification and a preset correspondence relationship, the preset The correspondence relationship is the correspondence relationship between the first sorting result and the sampling ratio; the re-sampling module 50 is used to re-sample the training data used for modeling according to the sampling ratios corresponding to multiple preset categories, respectively.
FIG. 4 is a structural block diagram of another training data resampling apparatus according to an exemplary embodiment of the present disclosure. Compared with the apparatus shown in FIG. 3, the apparatus shown in FIG. 4 may further include: a second obtaining module 60, configured to obtain second raw data within a second period after the first sorting module 30 obtains the first sorting result; a second calculation module 70, configured to calculate second proportions respectively occupied by the plurality of preset categories in the second raw data; a second sorting module 80, configured to sort the plurality of preset categories under the preset rule according to the relative sizes of the second proportions, to obtain a second sorting result; and a ranking comparison module 90, configured to trigger the ratio obtaining module 40 to determine the sampling ratio corresponding to each preset category when the first sorting result and the second sorting result are consistent.
In one possible implementation, the ranking comparison module 90 may be further configured to: when the first sorting result and the second sorting result are inconsistent, re-determine the second period and take the second sorting result as the first sorting result; and trigger the second obtaining module 60 to obtain second raw data within the re-determined second period.
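The comparison loop carried out by the ranking comparison module 90 can be sketched as below. This is a hedged illustration under several assumptions: periods are identified by consecutive integers, `fetch_labels` is a hypothetical callback returning the category labels of the raw data for a period, `rank_by_share` is a simplified stand-in for the sorting step that ranks only the categories present in the period, and the `max_periods` bound is added for safety even though the source text describes no limit.

```python
from collections import Counter
from typing import Callable, List, Optional, Sequence


def rank_by_share(labels: Sequence[str]) -> List[str]:
    """Rank categories from largest to smallest share; ties broken by name."""
    counts = Counter(labels)
    return sorted(counts, key=lambda c: (-counts[c], c))


def stable_ranking(
    fetch_labels: Callable[[int], Sequence[str]],
    max_periods: int = 10,
) -> Optional[List[str]]:
    """Return the first sorting result that is repeated in the next period.

    Returns None if no two consecutive periods agree within `max_periods`.
    """
    first = rank_by_share(fetch_labels(0))
    for period in range(1, max_periods):
        second = rank_by_share(fetch_labels(period))
        if second == first:
            # Consistent: proceed to determine the sampling ratios.
            return first
        # Inconsistent: the second result becomes the new first result,
        # and a new second period is examined on the next iteration.
        first = second
    return None
```

Only when two consecutive periods yield the same order is the result passed on to the ratio lookup, so a transient change in the category distribution does not immediately alter the sampling ratios.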
In one possible implementation, when at least two preset categories have the same proportion, the order of the at least two preset categories is determined according to their priorities.
With regard to the apparatus in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method and is not elaborated here.
FIG. 5 is a block diagram of an electronic device 500 according to an exemplary embodiment of the present disclosure. For example, the electronic device 500 may be provided as a server. Referring to FIG. 5, the electronic device 500 includes one or more processors 522 and a memory 532 for storing a computer program executable by the processor 522. The computer program stored in the memory 532 may include one or more modules, each corresponding to a set of instructions. In addition, the processor 522 may be configured to execute the computer program to perform the training data resampling method described above.
In addition, the electronic device 500 may further include a power supply component 526 and a communication component 550. The power supply component 526 may be configured to perform power management of the electronic device 500, and the communication component 550 may be configured to implement communication of the electronic device 500, for example wired or wireless communication. The electronic device 500 may also include an input/output (I/O) interface 558. The electronic device 500 may operate based on an operating system stored in the memory 532, such as Windows Server™, Mac OS X™, Unix™, or Linux™.
In another exemplary embodiment, a computer-readable storage medium including program instructions is also provided. When the program instructions are executed by a processor, the training data resampling method described above is implemented. For example, the computer-readable storage medium may be the above-mentioned memory 532 including program instructions, and the program instructions may be executed by the processor 522 of the electronic device 500 to carry out the training data resampling method described above.
The exemplary embodiments of the present disclosure have been described in detail above with reference to the drawings. However, the present disclosure is not limited to the specific details of the above embodiments. Within the scope of the technical concept of the present disclosure, various simple modifications can be made to its technical solutions, and these simple modifications all fall within the scope of the present disclosure.
It should also be noted that the specific technical features described in the above embodiments may be combined in any suitable manner, provided there is no contradiction. To avoid unnecessary repetition, the present disclosure does not separately describe the various possible combinations.
In addition, the various embodiments of the present disclosure may also be combined with one another in any manner. As long as a combination does not depart from the concept of the present disclosure, it should likewise be regarded as within the scope of the present disclosure.

Claims (10)

  1. A training data resampling method, the method comprising:
    obtaining first raw data within a first period;
    calculating first proportions respectively occupied by a plurality of preset categories in the first raw data;
    sorting the plurality of preset categories under a preset rule according to the relative sizes of the first proportions, to obtain a first sorting result;
    determining a sampling ratio corresponding to each preset category according to the first sorting result of each preset category and a preset correspondence, the preset correspondence being a correspondence between the first sorting result and the sampling ratio; and
    resampling training data used for modeling according to the sampling ratios respectively corresponding to the plurality of preset categories.
  2. The method according to claim 1, further comprising, after obtaining the first sorting result:
    obtaining second raw data within a second period;
    calculating second proportions respectively occupied by the plurality of preset categories in the second raw data;
    sorting the plurality of preset categories under the preset rule according to the relative sizes of the second proportions, to obtain a second sorting result; and
    when the first sorting result and the second sorting result are consistent, performing the step of determining the sampling ratio corresponding to each preset category.
  3. The method according to claim 2, wherein:
    when the first sorting result and the second sorting result are inconsistent, the second period is re-determined and the second sorting result is taken as the first sorting result; and
    the method returns to the step of obtaining the second raw data within the second period.
  4. The method according to any one of claims 1 to 3, wherein, when at least two preset categories have the same proportion, the order of the at least two preset categories is determined according to priorities of the at least two preset categories.
  5. A training data resampling apparatus, the apparatus comprising:
    a first obtaining module, configured to obtain first raw data within a first period;
    a first calculation module, configured to calculate first proportions respectively occupied by a plurality of preset categories in the first raw data;
    a first sorting module, configured to sort the plurality of preset categories under a preset rule according to the relative sizes of the first proportions, to obtain a first sorting result;
    a ratio obtaining module, configured to determine a sampling ratio corresponding to each preset category according to the first sorting result of each preset category and a preset correspondence, the preset correspondence being a correspondence between the first sorting result and the sampling ratio; and
    a resampling module, configured to resample training data used for modeling according to the sampling ratios respectively corresponding to the plurality of preset categories.
  6. The apparatus according to claim 5, further comprising:
    a second obtaining module, configured to obtain second raw data within a second period after the first sorting module obtains the first sorting result;
    a second calculation module, configured to calculate second proportions respectively occupied by the plurality of preset categories in the second raw data;
    a second sorting module, configured to sort the plurality of preset categories under the preset rule according to the relative sizes of the second proportions, to obtain a second sorting result; and
    a ranking comparison module, configured to trigger the ratio obtaining module to determine the sampling ratio corresponding to each preset category when the first sorting result and the second sorting result are consistent.
  7. The apparatus according to claim 6, wherein the ranking comparison module is further configured to:
    when the first sorting result and the second sorting result are inconsistent, re-determine the second period and take the second sorting result as the first sorting result; and
    trigger the second obtaining module to obtain the second raw data within the second period.
  8. The apparatus according to any one of claims 5 to 7, wherein, when at least two preset categories have the same proportion, the order of the at least two preset categories is determined according to priorities of the at least two preset categories.
  9. A computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method according to any one of claims 1 to 4.
  10. An electronic device, comprising:
    a memory having a computer program stored thereon; and
    a processor, configured to execute the computer program in the memory to implement the method according to any one of claims 1 to 4.
PCT/CN2019/094741 2018-11-08 2019-07-04 Training data re-sampling method and apparatus, and storage medium and electronic device WO2020093718A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811327417.3A CN109635034B (en) 2018-11-08 2018-11-08 Training data resampling method and device, storage medium and electronic equipment
CN201811327417.3 2018-11-08

Publications (1)

Publication Number Publication Date
WO2020093718A1 true WO2020093718A1 (en) 2020-05-14

Family

ID=66067584

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/094741 WO2020093718A1 (en) 2018-11-08 2019-07-04 Training data re-sampling method and apparatus, and storage medium and electronic device

Country Status (2)

Country Link
CN (1) CN109635034B (en)
WO (1) WO2020093718A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109635034B (en) * 2018-11-08 2020-03-03 北京字节跳动网络技术有限公司 Training data resampling method and device, storage medium and electronic equipment
CN111582315B (en) * 2020-04-09 2023-11-14 上海淇毓信息科技有限公司 Sample data processing method and device and electronic equipment
TWI756967B (en) * 2020-12-04 2022-03-01 萬里雲互聯網路有限公司 Device and method for predicting content clicks

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130097103A1 (en) * 2011-10-14 2013-04-18 International Business Machines Corporation Techniques for Generating Balanced and Class-Independent Training Data From Unlabeled Data Set
CN107545275A (en) * 2017-07-27 2018-01-05 华南理工大学 The unbalanced data Ensemble classifier method that resampling is merged with cost sensitive learning
CN107563435A (en) * 2017-08-30 2018-01-09 哈尔滨工业大学深圳研究生院 Higher-dimension unbalanced data sorting technique based on SVM
CN108717547A (en) * 2018-03-30 2018-10-30 国信优易数据有限公司 The method and device of sample data generation method and device, training pattern
CN109635034A (en) * 2018-11-08 2019-04-16 北京字节跳动网络技术有限公司 Training data method for resampling, device, storage medium and electronic equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106649205B (en) * 2016-10-18 2019-08-23 一诺仪器(中国)有限公司 The resampling device of statistical information
JP2018072968A (en) * 2016-10-26 2018-05-10 セイコーエプソン株式会社 Data processor and data processing method
CN107169518A (en) * 2017-05-18 2017-09-15 北京京东金融科技控股有限公司 Data classification method, device, electronic installation and computer-readable medium

Also Published As

Publication number Publication date
CN109635034A (en) 2019-04-16
CN109635034B (en) 2020-03-03

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19881376

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 29/06/2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19881376

Country of ref document: EP

Kind code of ref document: A1