WO2017167095A1 - Model training method and apparatus - Google Patents

Model training method and apparatus

Info

Publication number
WO2017167095A1
Authority
WO
WIPO (PCT)
Prior art keywords
sample
feature component
sample data
subset
model
Prior art date
Application number
PCT/CN2017/077696
Other languages
English (en)
French (fr)
Inventor
丁轶
余晋
熊怀东
陈绪
Original Assignee
阿里巴巴集团控股有限公司
丁轶
余晋
熊怀东
陈绪
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司, 丁轶, 余晋, 熊怀东, 陈绪 filed Critical 阿里巴巴集团控股有限公司
Publication of WO2017167095A1 publication Critical patent/WO2017167095A1/zh
Priority to US16/146,642 priority Critical patent/US11580441B2/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Definitions

  • the present application relates to the technical field of computer processing, and in particular to a training method for a model and a training device for a model.
  • distributed machine learning provides a method for performing machine learning and training models on large-scale computer clusters; it is usually built on a computer cluster composed of a large number of computers, with cluster scheduling, resource management, and task control performed by a distributed operating system.
  • model parameters are updated due to the calculation of the training algorithm and are constantly changing.
  • the training algorithm often needs multiple vectors of different lengths to participate in the calculation; since the number of model parameters in the training process is usually in the hundreds of millions or even tens of billions of floating point numbers, these model parameters need to be stored using the storage resources of the computer cluster.
  • the amount of sample data often directly affects the effect of a machine learning algorithm; without a large number of samples, the required model training effect cannot be reached, and obtaining a reasonable model may require up to tens of billions of sample data records.
  • if the sample data is stored in external storage, the slow access speed and latency of the storage medium make it impossible to guarantee that the training work proceeds at high speed and high efficiency.
  • embodiments of the present application have been made in order to provide a training method of a model and a corresponding model training apparatus that overcomes the above problems or at least partially solves the above problems.
  • a training method for a model including:
  • the model is trained based on sample data having the portion of the second feature component.
  • the step of reading part of the sample data in the full set of samples and combining the sample subsets comprises:
  • the step of mapping the model parameters related to the partial sample data from the first feature component for the sample corpus to the second feature component for the sample subset comprises:
  • the model parameters related to the partial sample data are mapped from the first feature component for the sample corpus to the second feature component for the sample subset according to the mapping vector relationship.
  • the step of training the model according to the sample data having the partial second feature component comprises:
  • the step of reading the sample data in the subset of samples comprises:
  • the step of mapping the model parameters related to the partial sample data from the second feature component for the sample subset to the first feature component for the sample corpus includes:
  • the model parameters related to the partial sample data are mapped from the second feature component for the subset of samples to the first feature component for the full set of samples according to the mapping relationship vector.
  • the step of sending the training result corresponding to the first feature component to a vector computer comprises:
  • adding a character sequence, the character sequence including an update identifier for the first feature component and a forbidden update identifier for other feature components, the other feature components being the feature components of the sample full set other than the first feature component;
  • the sequence of characters and the training result are sent to a vector computer.
  • the embodiment of the present application further discloses a training device for a model, including:
  • a sample subset reading module configured to read part of the sample data in the sample set, and combine into a sample subset
  • a feature component mapping module configured to map a model parameter related to the partial sample data from a first feature component for the sample corpus to a second feature component for the sample subset;
  • a model training module for training a model based on sample data having the portion of the second feature component.
  • the sample subset reading module comprises:
  • a first partial sample data reading submodule, configured to read part of the sample data in the sample full set from the file storage system;
  • a partial sample data writing submodule, configured to write the partial sample data to a designated area in the file storage system to be combined into a sample subset.
  • the feature component mapping module comprises:
  • mapping relationship vector establishing submodule configured to establish, for a model parameter related to the partial sample data, a mapping relationship vector between a first feature component for the sample corpus and a second feature component for the sample subset;
  • a sample subset mapping submodule configured to map a model parameter related to the partial sample data from a first feature component for the sample corpus to a second feature component for the sample subset according to the mapping vector relationship .
  • the model training module comprises:
  • a training sub-module configured to perform training by using the part of sample data to obtain a training result
  • a sample corpus mapping sub-module configured to map model parameters related to the partial sample data from a second feature component for the sample subset to a first feature component for the sample full set;
  • a communication submodule configured to send the training result corresponding to the first feature component to a vector computer, so as to update the model parameter corresponding to the first feature component in the model.
  • the second partial sample data reading submodule comprises:
  • a first reading unit configured to read sample data of a subset of samples stored in a current sample computer
  • the second reading unit is configured to read the sample data of the sample subset stored by the other sample computer before receiving the sample transfer message of the other sample computer.
  • the sample corpus mapping sub-module comprises:
  • mapping relationship vector reading unit configured to read a preset mapping relationship vector
  • mapping relationship mapping unit configured to map the model parameters related to the partial sample data from the second feature component for the subset of samples to the first feature component for the full set of samples according to the mapping relationship vector.
  • the communication submodule comprises:
  • a character sequence adding unit for adding a character sequence, the character sequence including an update identifier for the first feature component and a forbidden update identifier for other feature components, the other feature components being the feature components of the sample full set other than the first feature component;
  • a sending unit configured to send the sequence of characters and the training result to a vector computer.
  • the embodiments of the present application make use of the locality of the sample data carried by a single sample computer: part of the sample data in the sample full set is read and combined into a sample subset, the model parameters related to the partial sample data are mapped from the first feature components for the sample full set to the second feature components for the sample subset, and the model is trained based on the sample data having the partial second feature components:
  • first, after the mapping, the copy size of the model parameters on each sample computer is reduced, the amount of training data is greatly reduced, and the computer memory footprint is minimized; the sample computer memory is used to hold vectors and load samples, so that machine learning and large-scale model training can be performed with relatively low resource overhead and as little efficiency loss as possible;
  • mapping has no effect on the computational performance of the model training process, and is transparent to the training algorithm.
  • the original training algorithm can be used without modification.
  • the flexible handling of the sample data in the embodiments of the present application can effectively distribute the sample data load across different sample computers in parallel, avoiding the efficiency drop caused by the “long tail”, and makes it easy to increase the model size or the amount of sample data by adding hardware devices.
  • the embodiment of the present application communicates by character sequence, and the number of bytes used is smaller than the number of bytes directly transferring floating point data, which reduces the consumption of cluster communication resources.
  • FIG. 1 is a flow chart showing the steps of an embodiment of a training method of a model of the present application
  • FIG. 2 is a block diagram showing the structure of an embodiment of a training apparatus of the present application.
  • FIG. 1 a flow chart of steps of a method for training a model of the present application is shown, which may specifically include the following steps:
  • Step 101 Read part of the sample data in the sample set to be combined into a sample subset
  • the original sample data can be collected by means of a website log or the like.
  • for example, suppose the original sample data is user behavior information, used to train a classification model and recommend relevant information;
  • a typical website log can record the IP (Internet Protocol) address of the user's computer, at what time, with what operating system, browser, and display the user accessed which page of the website, and whether the access succeeded;
  • however, what is needed as user behavior information is usually not machine data such as the IP address, operating system, or browser of the user's computer, but behavioral information that can characterize the user's interests, such as what information the user browsed and behaviors expressing his or her preferences.
  • sample data is only used as an example. In the implementation of the embodiment of the present application, other sample data may be set according to actual conditions, which is not limited by the embodiment of the present application.
  • before the sample data is used to train the model, the original sample data can be preprocessed, for example by removing dirty words and high-frequency words, removing robot data, and removing noise (such as data with very little access information or random behavior), turning it into standardized sample data.
  • Embodiments of the present application can be applied to a computer cluster, such as a distributed system, including a file storage system, one or more sample computers, and one or more vector computers.
  • the file storage system can store a complete set of samples, that is, a set of all sample data.
  • An example of the file storage system is a distributed file system, for example NFS (Network File System), Coda, AFS (Andrew File System), Sprite File System, HDFS (Hadoop Distributed File System), the Pangu system, and so on, which is readable by all the sample computers.
  • the sample computer can read part of the sample data from the file storage system to train the model and record the model parameters related to the sample data it has read; this part of the sample data can be called a sample subset, and the sample data in the sample subset is used for model training according to the training algorithm of the model.
  • the vector computer is used to store the vector of the model, perform vector calculations and vector output.
  • a so-called model can usually be represented as one vector or a set of vectors; each dimension in a vector is called a model parameter.
  • Sample data is usually expressed as weights for one or a set of model parameters.
  • for example, the vector [0.1, 0.2, 0.3, 0, 0, 0.4, 0.5, 0.6, 0.7, 0.8] represents a 10-dimensional linear model with a total of 10 model parameters, such as 0.1, 0.2, and so on.
  • the value of the model parameters is generally obtained by training the sample data.
  • a user can generate a sample of data on the Internet through a single click of a browser page, and each sample data may contain model parameters involved in the sample data.
  • for example, the sample data (1:0.01, 3:0.02, 8:0.03) represents a weight of 0.01 for the model parameter with ID 1, a weight of 0.02 for the model parameter with ID 3, and a weight of 0.03 for the model parameter with ID 8; that is, this sample data affects three of the model parameters in the model.
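  • As a concrete (and purely illustrative) restatement of the representation above, the following minimal Python sketch shows a dense 10-dimensional model vector and one sparse sample stored as {parameter ID: weight} pairs; the dot product and the variable names are assumptions for illustration, not part of the patent.

```python
# A 10-dimensional linear model: each position is one model parameter.
model = [0.1, 0.2, 0.3, 0.0, 0.0, 0.4, 0.5, 0.6, 0.7, 0.8]

# A sample stores only the parameters it touches, as {parameter ID: weight}.
# This mirrors the example (1:0.01, 3:0.02, 8:0.03): the sample affects
# exactly three of the ten model parameters.
sample = {1: 0.01, 3: 0.02, 8: 0.03}

# A simple dot product between the sparse sample and the dense model,
# using the 1-based parameter IDs of the text.
score = sum(weight * model[param_id - 1] for param_id, weight in sample.items())
print(score)  # 0.01*0.1 + 0.02*0.3 + 0.03*0.6 = 0.025
```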
  • step 101 may include the following sub-steps:
  • Sub-step S11 reading part of the sample data in the sample collection from the file storage system
  • Sub-step S12 the partial sample data is written into a designated area in the file storage system to be combined into a sample subset.
  • in the embodiments of the present application, the sample full set may be stored in an area of the file storage system that is readable by all sample computers; each sample computer may read part of the samples from the sample full set, for example in a random manner, and write them back to an area of the file storage system readable by that sample computer, quickly realizing the segmentation and distribution of the sample full set.
  • the reading manner of the sample data is only an example.
  • the reading manner of other sample data may be set according to actual conditions, for example, the sample complete set is divided and then read by the sample computer, etc.
  • the embodiment of the present application does not limit this.
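  • Purely as an illustration (the patent does not prescribe a concrete partitioning routine, and in the described system the split happens through reads and writes against the shared file storage system rather than inside one process), a random split of an in-memory sample corpus into subsets might look like the following Python sketch; all names are assumptions.

```python
import random

def split_corpus(sample_corpus, num_subsets, seed=0):
    """Randomly shuffle the full sample set and deal it into subsets,
    standing in for sample computers each reading a random part of the
    corpus and writing it back to a designated area."""
    samples = list(sample_corpus)
    random.Random(seed).shuffle(samples)
    # Deal samples round-robin so the subset sizes stay balanced.
    return [samples[i::num_subsets] for i in range(num_subsets)]

# Toy corpus of sparse samples ({parameter ID: weight} dictionaries).
corpus = [{1: 0.01, 3: 0.02, 8: 0.03}, {6: 0.02}, {2: 0.05}, {1: 0.04, 9: 0.01}]
for i, subset in enumerate(split_corpus(corpus, num_subsets=2)):
    print(f"subset {i}: {subset}")
```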
  • Step 102 Map a model parameter related to the partial sample data from a first feature component for the sample corpus to a second feature component for the sample subset;
  • for many machine-learning training algorithms, the sample computers carrying the sample data generally do not need to exchange data with each other during model training; the main network communication work of these sample computers is to frequently upload and download the model parameters stored on non-sample computers and to keep local copies of these model parameters.
  • the individual components of the model parameters are usually independent of each other from a mathematical point of view, and the calculations on the sample computer usually focus only on the components it needs.
  • model parameters are generally independent of each other. For a certain sample subset, it is only possible to use some of the model parameters. Therefore, the model parameters used by it can be numbered separately.
  • a mapping relationship vector between the first feature component for the sample corpus and the second feature component for the sample subset may be established for the model parameters related to the partial sample data.
  • the length of the mapping relationship vector is the number of model parameters involved in this sample subset.
  • mapping relationship vector is usually saved as a hash table or a sorted linear table, which is stored in the file storage system.
  • the model parameters related to the partial sample data may be mapped from the first feature component for the sample corpus to the second feature component for the sample subset according to the mapping vector relationship.
  • for example, for the above 10-dimensional model, suppose the current sample subset contains two sample data records: sample data 1 (1:0.01, 3:0.02, 8:0.03) and sample data 2 (6:0.02);
  • this sample subset is related to the four model parameters whose first feature components in the sample full set are 1, 3, 6, and 8, and in the mapping these four model parameters can be represented in order by the second feature components 1, 2, 3, 4;
  • the mapping relationship vector of this sample subset is therefore [1:1, 2:3, 3:6, 4:8];
  • after the mapping, the sample subset contains 4 model parameters instead of 10: sample data 1 becomes (1:0.01, 2:0.02, 4:0.03) and sample data 2 becomes (3:0.02).
  • the first feature component of the same model parameter may correspond to a different second feature component due to the different mapping relationship vectors.
  • for example, if the first feature component of a model parameter for the sample full set is 100 and the second feature component obtained from the mapping relationship vector is 50, then the first feature component 100 can be converted in the sample subset into the new second feature component 50.
  • since each sample computer reads only part of the sample data, the total number of related model parameters is much smaller than the dimension of the model; the data amount of the second feature components for the sample subset is therefore much smaller than that of the first feature components for the sample full set, and the mapping operation can greatly compress the amount of sample data.
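  • A minimal Python sketch of this mapping follows; dictionaries stand in for the hash table or sorted linear table mentioned above, and all function names are illustrative assumptions rather than the patent's own.

```python
def build_mapping(sample_subset):
    """Build the mapping relationship vector for one sample subset:
    second (local) feature component -> first (global) feature component.
    Its length equals the number of model parameters the subset touches."""
    global_ids = sorted({pid for sample in sample_subset for pid in sample})
    # Local IDs are numbered in order starting at 1, as in [1:1, 2:3, 3:6, 4:8].
    return {local: global_ for local, global_ in enumerate(global_ids, start=1)}

def remap_to_local(sample_subset, mapping):
    """Rewrite every sample from global parameter IDs to local ones."""
    inverse = {global_: local for local, global_ in mapping.items()}
    return [{inverse[pid]: w for pid, w in sample.items()} for sample in sample_subset]

subset = [{1: 0.01, 3: 0.02, 8: 0.03}, {6: 0.02}]
mapping = build_mapping(subset)
print(mapping)                          # {1: 1, 2: 3, 3: 6, 4: 8}
print(remap_to_local(subset, mapping))  # [{1: 0.01, 2: 0.02, 4: 0.03}, {3: 0.02}]
```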
  • Step 103 Train the model according to the sample data having the partial second feature component.
  • after the mapping is completed, the originally huge sample full set has been divided into multiple triples of the form (sample subset, model parameters, mapping relationship vector);
  • for each triple, the sample computer can read the sample data and the mapping relationship vector from the file storage system and save them into its own memory, and model training can be performed for each triple held in memory.
  • taking gradient descent optimization as an example, in this training method the vector of the model is saved in shards on the vector computers; each sample computer calculates the gradient and loss values (i.e., the optimization objective function) for the sample data it is responsible for, pushes the calculation results to the vector computer, and obtains the latest gradient values from the vector computer for the next iteration.
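  • As a toy, single-process illustration of how one sample computer might iterate over a triple under a gradient-descent-style objective, the sketch below uses a squared-error loss, a fixed learning rate, and plain dictionary accesses in place of network pulls and pushes; all of these are assumptions for illustration and are not fixed by the patent.

```python
def compute_gradient(local_samples, local_params, labels):
    """Gradient and loss of a squared-error objective for a linear model
    over samples keyed by local (second) feature components."""
    grad = {local_id: 0.0 for local_id in local_params}
    loss = 0.0
    for sample, y in zip(local_samples, labels):
        err = sum(w * local_params[i] for i, w in sample.items()) - y
        loss += 0.5 * err * err
        for i, w in sample.items():
            grad[i] += err * w
    return grad, loss

def train_triple(local_samples, mapping, vector_store, labels, lr=0.1, iterations=5):
    """One sample computer's loop over a (sample subset, model parameters,
    mapping relationship vector) triple; `vector_store` stands in for the
    vector computer."""
    for _ in range(iterations):
        # Pull only the components this triple needs, addressed via the
        # mapping (local ID -> global ID).
        local_params = {loc: vector_store[glob] for loc, glob in mapping.items()}
        grad, _ = compute_gradient(local_samples, local_params, labels)
        # Push the update back under global IDs.
        for loc, g in grad.items():
            vector_store[mapping[loc]] -= lr * g

vector_store = {1: 0.0, 3: 0.0, 6: 0.0, 8: 0.0}   # shard held by the vector computer
mapping = {1: 1, 2: 3, 3: 6, 4: 8}
samples = [{1: 0.01, 2: 0.02, 4: 0.03}, {3: 0.02}]
train_triple(samples, mapping, vector_store, labels=[1.0, 0.0])
print(vector_store)
```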
  • the embodiments of the present application make use of the locality of the sample data carried by a single sample computer: part of the sample data in the sample full set is read and combined into a sample subset, the model parameters related to the partial sample data are mapped from the first feature components for the sample full set to the second feature components for the sample subset, and the model is trained based on the sample data having the partial second feature components:
  • first, after the mapping, the copy size of the model parameters on each sample computer is reduced, the amount of training data is greatly reduced, and the computer memory footprint is minimized; the sample computer memory is used to hold vectors and load samples, so that machine learning and large-scale model training can be performed with relatively low resource overhead and as little efficiency loss as possible;
  • mapping has no effect on the computational performance of the model training process, and is transparent to the training algorithm.
  • the original training algorithm can be used without modification.
  • step 103 may include the following sub-steps:
  • Sub-step S21, in each round of iteration, the partial sample data is read;
  • in one case, the current sample computer can read sample data from a sample subset previously stored by the current sample computer.
  • in another case, since the number of sample computers is limited and the amount of sample data is huge, each sample computer may read multiple sample subsets;
  • in the embodiments of the present application, sample subsets can be dynamically transferred: when receiving a sample transfer message from another sample computer, the current sample computer can read the sample data of a sample subset previously stored by that other sample computer, for example by acquiring read permission or transferring the storage area of the sample subset;
  • it should be noted that because the sample data has already been converted at this point, its storage footprint has been reduced and reading is faster.
  • a local copy of the model parameters typically contains only the components that the sample computer really needs, rather than all components, which greatly saves local space and allows each triple to load its sample data only when actually used, ensuring that the entire training process runs in the memory of the sample computers or vector computers and achieving efficient large-scale model training.
  • if each sample computer were responsible only for the sample subsets it has read, then when one sample computer has many sample subsets remaining while the other sample computers are idle, the training efficiency degrades due to the “long tail” of the data.
  • the flexible handling of the sample data in the embodiments of the present application can effectively distribute the sample data load across different sample computers in parallel, avoiding the efficiency drop caused by the “long tail”, and makes it easy to increase the model size or the amount of sample data by adding hardware devices.
  • Sub-step S22 performing training by using the partial sample data to obtain a training result
  • for different models, the training methods are generally different, and so are the training results obtained;
  • for example, for a convex optimization model obtained by gradient descent optimization, the training result is the weights of a polynomial;
  • as another example, for a random forest, the training result is decision trees;
  • the training result obtained by a sample computer is specific to its triple, that is, its subscripts are not equal to the subscript values on the vector computer; since each triple contains only the subscripts involved in that triple, the amount of data is greatly reduced, which greatly saves memory usage and improves training speed in sparse model training.
  • Sub-step S23, the model parameters related to the partial sample data are mapped from the second feature components for the sample subset to the first feature components for the sample full set;
  • the sample computer pushes the training results to the vector computer.
  • mapping relationship vector in the triple can be used to convert the subscript of the training result into a subscript on the vector computer.
  • This process is performed by the sample computer itself before communication.
  • the training algorithm does not perceive the process. That is, the mapping conversion of the subscript is transparent to the training algorithm, and is independent of the training algorithm.
  • the training algorithm itself is responsible for the calculation.
  • in a specific implementation, the preset mapping relationship vector may be read from a location such as the file storage system, and the model parameters related to the partial sample data are mapped, according to the mapping relationship vector, from the second feature components for the sample subset to the first feature components for the sample full set.
  • for example, for sample 1 above, the sample data in the sample subset is (1:0.01, 2:0.02, 4:0.03); assume the weights obtained from training are [1:0.05, 2:0.06, 3:0, 4:0.07] (the third value is 0 because this sample data does not affect the model parameter with local ID 3);
  • given the mapping relationship vector [1:1, 2:3, 3:6, 4:8], the sample data can be mapped and, combined with the gradient values, yields the vector [1:0.05, 3:0.06, 6:0, 8:0.07]; all model parameters have been restored to the first feature components for the sample full set rather than the second feature components for the sample subset.
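  • A minimal sketch of this reverse mapping (illustrative names; the mapping and values reuse the example above):

```python
def remap_to_global(local_result, mapping):
    """Convert a training result keyed by second (local) feature components
    back to first (global) feature components before pushing it to the
    vector computer."""
    return {mapping[local_id]: value for local_id, value in local_result.items()}

mapping = {1: 1, 2: 3, 3: 6, 4: 8}
local_weights = {1: 0.05, 2: 0.06, 3: 0.0, 4: 0.07}
print(remap_to_global(local_weights, mapping))
# {1: 0.05, 3: 0.06, 6: 0.0, 8: 0.07}
```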
  • Sub-step S24, the training result corresponding to the first feature components is sent to a vector computer, so as to update the model parameters corresponding to the first feature components in the model.
  • to reduce the amount of data communicated, a character sequence can be added to the data packet being sent, and the character sequence and the training result are sent to the vector computer together; the bits of the character sequence include an update identifier for the first feature components and a forbidden update identifier for the other feature components, the other feature components being the feature components of the sample full set other than the first feature components.
  • if a communication needs to update the model parameters whose first feature components range from 1000 to 5000 on the vector computer, then by applying the embodiments of the present application a character sequence of (5000-1000)/8, i.e. about 500 bytes, can be transmitted along with the update, in which each bit indicates whether the current communication updates the corresponding model parameter (e.g., 0 means no update, 1 means update), while the corresponding new model parameter values are transmitted in order, avoiding transmitting a large number of 0 values over the network.
  • for example, the vector to be pushed in the example above is [1:0.05, 3:0.06, 6:0, 8:0.07]; the character sequence added in the communication is [1010010100], indicating that the model parameters with first feature components 1, 3, 6, and 8 need to be updated, while the real data is represented by the three values [0.05, 0.06, 0.07];
  • the whole communication then transfers 10 bits + 3 × 32 bits = 106 bits (assuming a floating point number is represented with 4 bytes), far fewer than the 10 × 32 bits = 320 bits needed to transfer the floating point numbers directly.
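  • The sketch below shows one plausible encoding of such an update message in Python; the byte layout, the MSB-first bit order, and the choice to send one float per set bit (the patent's example additionally drops zero values from the payload) are assumptions made for illustration.

```python
import struct

def encode_update(updates, id_range):
    """Encode an update as (bitmask bytes, packed floats). `updates` maps
    global feature component IDs to new values; `id_range` is the inclusive
    (low, high) range of IDs covered by this message."""
    low, high = id_range
    n = high - low + 1
    bits = bytearray((n + 7) // 8)
    values = []
    for pid in range(low, high + 1):
        if pid in updates:
            offset = pid - low
            bits[offset // 8] |= 1 << (7 - offset % 8)  # MSB-first within each byte
            values.append(updates[pid])
    return bytes(bits), struct.pack(f"<{len(values)}f", *values)

mask, payload = encode_update({1: 0.05, 3: 0.06, 6: 0.0, 8: 0.07}, (1, 10))
print(format(int.from_bytes(mask, "big"), "016b")[:10])  # 1010010100
print(len(mask) + len(payload))  # 18 bytes, versus 40 bytes for ten 4-byte floats
```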
  • in gradient descent optimization, when the vector computer receives a new batch of gradients it updates the gradient vector of the model; after receiving the results of all triples, it determines the gradient value of the current round, then updates the model vector and returns it to each sample computer, which continues with the next round of iterative training.
  • after the iterative model training ends, the vector computer writes the model vector it holds, in a “key-value” form, to an externally stored file storage system or a database table, and outputs the trained model.
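  • As a small illustration of that key-value export (the file name, separator, and text format are assumptions, not specified by the patent):

```python
def export_model(model_vector, path="model_output.txt"):
    """Write the trained model vector as one 'key<TAB>value' line per model
    parameter, a simple stand-in for the key-value export performed by the
    vector computer to a file storage system or database table."""
    with open(path, "w") as f:
        for param_id, value in sorted(model_vector.items()):
            f.write(f"{param_id}\t{value}\n")

export_model({1: 0.05, 3: 0.06, 6: 0.0, 8: 0.07})
```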
  • the embodiment of the present application communicates by character sequence, and the number of bytes used is smaller than the number of bytes directly transferring floating point data, which reduces the consumption of cluster communication resources.
  • FIG. 2 a structural block diagram of an embodiment of a training apparatus of a model of the present application is shown, which may specifically include the following modules:
  • the sample subset reading module 201 is configured to read part of the sample data in the sample set and combine into a sample subset;
  • the feature component mapping module 202 is configured to map the model parameters related to the partial sample data from a first feature component for the sample corpus to a second feature component for the sample subset;
  • the model training module 203 is configured to train the model according to the sample data having the partial second feature component.
  • the sample subset reading module 201 may include the following sub-modules:
  • a first partial sample data reading submodule, configured to read part of the sample data in the sample full set from the file storage system;
  • a partial sample data writing submodule, configured to write the partial sample data to a designated area in the file storage system to be combined into a sample subset.
  • the feature component mapping module 202 may include the following sub-modules:
  • mapping relationship vector establishing submodule configured to establish, for a model parameter related to the partial sample data, a mapping relationship vector between a first feature component for the sample corpus and a second feature component for the sample subset;
  • a sample subset mapping sub-module configured to map the model parameters related to the partial sample data, according to the mapping relationship vector, from the first feature component for the sample full set to the second feature component for the sample subset.
  • the model training module 203 may include the following sub-modules:
  • a training sub-module configured to perform training by using the part of sample data to obtain a training result
  • a sample corpus mapping sub-module configured to map model parameters related to the partial sample data from a second feature component for the sample subset to a first feature component for the sample full set;
  • a communication submodule configured to send the training result corresponding to the first feature component to a vector computer, so as to update the model parameter corresponding to the first feature component in the model.
  • the second partial sample data reading submodule may include the following units:
  • a first reading unit configured to read sample data of a subset of samples stored in a current sample computer
  • the second reading unit is configured to read the sample data of the sample subset stored by the other sample computer before receiving the sample transfer message of the other sample computer.
  • sample corpus mapping sub-module may include the following units:
  • mapping relationship vector reading unit configured to read a preset mapping relationship vector
  • mapping relationship mapping unit configured to map the model parameters related to the partial sample data from the second feature component for the subset of samples to the first feature component for the full set of samples according to the mapping relationship vector.
  • the communication submodule may include the following units:
  • a character sequence adding unit configured to add a character sequence, the character sequence including an update identifier for the first feature component and a forbidden update identifier for other feature components, wherein the other feature components are the feature components of the sample full set other than the first feature component;
  • a sending unit configured to send the sequence of characters and the training result to a vector computer.
  • since the apparatus embodiment is basically similar to the method embodiment, its description is relatively brief; for relevant details, refer to the description of the method embodiment.
  • embodiments of the embodiments of the present application can be provided as a method, apparatus, or computer program product. Therefore, the embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, embodiments of the present application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.
  • the computer device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • the memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer readable medium, such as read only memory (ROM) or flash memory.
  • Memory is an example of a computer readable medium.
  • Computer readable media includes both permanent and non-persistent, removable and non-removable media.
  • Information storage can be implemented by any method or technology. The information can be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory. (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disk read only memory (CD-ROM), digital versatile disk (DVD) or other optical storage, Magnetic tape cartridges, magnetic tape storage or other magnetic storage devices or any other non-transportable media can be used to store information that can be accessed by a computing device.
  • as defined herein, computer readable media do not include transitory media, such as modulated data signals and carrier waves.
  • Embodiments of the present application are described with reference to flowcharts and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the present application. It will be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions.
  • These computer program instructions can be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device produce an apparatus for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • the computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing terminal device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture comprising the instruction device.
  • the instruction device implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Machine Translation (AREA)

Abstract

A model training method and apparatus. The method includes: reading part of the sample data in a sample full set and combining it into a sample subset (101); mapping the model parameters related to the partial sample data from first feature components for the sample full set to second feature components for the sample subset (102); and training the model based on the sample data having the partial second feature components (103). After the mapping, the copy size of the model parameters on each sample computer is reduced, the amount of training data is greatly reduced, and the computer memory footprint is minimized; sample computer memory is used to hold vectors and load samples, so that machine learning and large-scale model training can be performed with relatively low resource overhead and as little efficiency loss as possible. The mapping has no effect on the computational performance of the model training process and is transparent to the training algorithm; the original training algorithm can be used directly without modification.

Description

一种模型的训练方法和装置 技术领域
本申请涉及计算机处理的技术领域,特别是涉及一种模型的训练方法和一种模型的训练装置。
背景技术
随着互联网的快速发展,人们生活的方方面面都与互联网产生了联系,在人们使用互联网的相关功能时,产生了海量的数据。
目前,经常使用机器学习中的模型训练对这些海量的数据进行挖掘处理,从而进行分类、推荐等操作。
在模型学习中,由于参与训练的样本数据量巨大,使得模型巨大,动辄数亿甚至数十亿的浮点数组成模型,加大了训练的存储难度和计算时间,造成训练困难。
目前,分布式机器学习提供了通过大规模计算机集群进行机器学习、训练模型的方法,其通常构建在由数量庞大的计算机组成的计算机集群之上,通过分布式操作系统进行集群调度、资源管理和任务控制。
通常情况下,大规模的机器学习需要面对两个重要的参数:
1、模型参数。
在机器学习的过程中,模型参数会由于训练算法的计算而更新,不断发生变化。同时,为了得到最后的模型参数结果,训练算法往往需要多个长度不一的向量参与计算,由于训练过程中的模型参数的数量通常上亿甚至上百亿个浮点数,这些模型参数都需要使用计算机集群的存储资源进行存储。
2、样本数据。
样本数据的多少往往直接影响到机器学习算法的效果,没有大量的样 本数据达不到需要的模型训练效果,为了得到合理的模型,可能需要多达数百亿个样本数据。
由于机器学习的训练过程需要经历次数繁多的迭代过程,所有参与模型训练的样本数据都会反复的被使用,为最小化训练得到模型的时间,一般将样本数据存储到计算机内存中,这样就需要庞大的计算机内存。
然而,存储资源在计算机集群中是不可能无限增长的,如果将这些样本数据都放入内部存储,访问效率可以得到保证,但是,单独一台计算机的内存无疑是有限的,在面对海量的样本数据时,往往需要大大数量的计算机主机,这又带来网络、集群管理等多方面的问题。
如果将样本数据存放在外部存储器中,由于存储介质较慢的访问速度和时间延迟,训练工作无法保证高速、高效率地行。
发明内容
鉴于上述问题,提出了本申请实施例以便提供一种克服上述问题或者至少部分地解决上述问题的一种模型的训练方法和相应的一种模型的训练装置。
为了解决上述问题,本申请公开了一种模型的训练方法,包括:
读取样本全集中的部分样本数据,组合成样本子集;
将所述部分样本数据相关的模型参数,从针对所述样本全集的第一特征分量映射为针对所述样本子集的第二特征分量;
根据具有所述部分第二特征分量的样本数据训练模型。
优选地,所述读取样本全集中的部分样本数据,组合成样本子集的步骤包括:
从文件存储系统中读取样本全集中的部分样本数据;
将所述部分样本数据写入所述文件存储系统中指定的区域,以组合成 样本子集。
优选地,所述将所述部分样本数据相关的模型参数,从针对所述样本全集的第一特征分量映射为针对所述样本子集的第二特征分量的步骤包括:
对所述部分样本数据相关的模型参数,建立针对所述样本全集的第一特征分量与针对所述样本子集的第二特征分量之间的映射关系向量;
将所述部分样本数据相关的模型参数,按照所述映射向量关系从针对所述样本全集的第一特征分量映射为针对所述样本子集的第二特征分量。
优选地,所述根据具有所述部分第二特征分量的样本数据训练模型的步骤包括:
在每一轮迭代中,读取所述部分样本数据;
采用所述部分样本数据进行训练,获得训练结果;
将所述部分样本数据相关的模型参数,从针对所述样本子集的第二特征分量映射为针对所述样本全集的第一特征分量;
将所述第一特征分量对应的训练结果发送至向量计算机,以更模型中所述第一特征分量对应的模型参数。
优选地,所述读取所述样本子集中的样本数据的步骤包括:
读取当前样本计算机在先存储的样本子集中的样本数据;
或者,
当接收到其他样本计算机的样本转移消息时,读取其他样本计算机在先存储的样本子集中的样本数据。
优选地,所述将所述部分样本数据相关的模型参数,从针对所述样本子集的第二特征分量映射为针对所述样本全集的第一特征分量的步骤包括:
读取预设的映射关系向量;
将所述部分样本数据相关的模型参数,按照所述映射关系向量从针对所述样本子集的第二特征分量映射为针对所述样本全集的第一特征分量。
优选地,所述将所述第一特征分量对应的训练结果发送至向量计算机的步骤包括:
添加字符序列,所述字符序列包括针对所述第一特征分量的更新标识和针对其他特征分量的禁止更新标识,所述其他特征分量为所述样本全集中除所述第一特征分量的特征分量;
将所述字符序列和所述训练结果发送至向量计算机。
本申请实施例还公开了一种模型的训练装置,包括:
样本子集读取模块,用于读取样本全集中的部分样本数据,组合成样本子集;
特征分量映射模块,用于将所述部分样本数据相关的模型参数,从针对所述样本全集的第一特征分量映射为针对所述样本子集的第二特征分量;
模型训练模块,用于根据具有所述部分第二特征分量的样本数据训练模型。
优选地,所述样本子集读取模块包括:
第一部分样本数据读取子模块,用于从文件存储系统中读取样本全集中的部分样本数据;
部分样本数据写入子模块,用于将所述部分样本数据写入所述文件存储系统中指定的区域,以组合成样本子集。
优选地,所述特征分量映射模块包括:
映射关系向量建立子模块,用于对所述部分样本数据相关的模型参数,建立针对所述样本全集的第一特征分量与针对所述样本子集的第二特征分量之间的映射关系向量;
样本子集映射子模块,用于将所述部分样本数据相关的模型参数,按照所述映射向量关系从针对所述样本全集的第一特征分量映射为针对所述样本子集的第二特征分量。
优选地,所述模型训练模块包括:
第二部分样本数据读取子模块,用于在每一轮迭代中,读取所述部分样本数据;
训练子模块,用于采用所述部分样本数据进行训练,获得训练结果;
样本全集映射子模块,用于将所述部分样本数据相关的模型参数,从针对所述样本子集的第二特征分量映射为针对所述样本全集的第一特征分量;
通信子模块,用于将所述第一特征分量对应的训练结果发送至向量计算机,以更模型中所述第一特征分量对应的模型参数。
优选地,所述第二部分样本数据读取子模块包括:
第一读取单元,用于读取当前样本计算机在先存储的样本子集中的样本数据;
或者,
第二读取单元,用于在接收到其他样本计算机的样本转移消息时,读取其他样本计算机在先存储的样本子集中的样本数据。
优选地,所述样本全集映射子模块包括:
映射关系向量读取单元,用于读取预设的映射关系向量;
映射关系映射单元,用于将所述部分样本数据相关的模型参数,按照所述映射关系向量从针对所述样本子集的第二特征分量映射为针对所述样本全集的第一特征分量。
优选地,所述通信子模块包括:
字符序列添加单元,用于添加字符序列,所述字符序列包括针对所述 第一特征分量的更新标识和针对其他特征分量的禁止更新标识,所述其他特征分量为所述样本全集中除所述第一特征分量的特征分量;
发送单元,用于将所述字符序列和所述训练结果发送至向量计算机。
本申请实施例包括以下优点:
本申请实施例利用单一样本计算机所承载的样本数据的局部性,读取样本全集中的部分样本数据,组合成样本子集,将部分样本数据相关的模型参数,针对样本全集的第一特征分量,映射为针对样本子集的第二特征分量,并根据具有部分第二特征分量的样本数据训练模型:
首先,映射之后可以减少模型参数在样本计算机上的副本大小,大大减少了训练的数据量,尽可能减少了计算机内存占用,使用样本计算机内存放置向量和装载样本,从而在尽可能少的效率损失的前提下,以相对低的资源开销进行机器学习、训练大规模的模型;
其次,映射对模型训练过程的计算性能没有影响,对训练算法透明,原有的训练算法无需修改直接可以使用。
本申请实施例灵活地处理样本数据,可以有效的将样本数据的负载分布到不同的样本计算机上并行,规避“长尾”带来的效率下降,容易通过增加硬件设备来提高模型规模或增加样本数据的数量。
本申请实施例通过字符序列进行通信,所使用的字节数小于直接传递浮点数据的字节数,降低了对集群通信资源的消耗。
附图说明
图1是本申请的一种模型的训练方法实施例的步骤流程图;
图2是本申请的一种模型的训练装置实施例的结构框图。
具体实施方式
为使本申请的上述目的、特征和优点能够更加明显易懂,下面结合附图和具体实施方式对本申请作进一步详细的说明。
参照图1,示出了本申请的一种模型的训练方法实施例的步骤流程图,具体可以包括如下步骤:
步骤101,读取样本全集中的部分样本数据,组合成样本子集;
在具体实现中,可以通过网站日志等方式收集原始的样本数据。
例如,假设原始的样本数据为用户行为信息,用于训练分类模型、推荐相关的信息,而一般的网站日志可以记录用户电脑的IP(Internet Protocol,网络之间互连的协议)地址是什么、在什么时间、用什么操作系统、什么浏览器、什么显示器的情况下访问了网站的哪个页面、是否访问成功。
但是,对于用户行为信息的需求而言,通常不是用户电脑的IP地址、操作系统、浏览器等机器人数据,而是用户浏览了什么信息、对其喜爱程度的表现行为等可以表征用户兴趣爱好的行为信息。
当然,上述样本数据只是作为示例,在实施本申请实施例时,可以根据实际情况设置其他样本数据,本申请实施例对此不加以限制。
在采用样本数据训练模型之前,可以对原始的样本数据进行预处理,例如去除脏词和高频词、去除机器人数据、去除噪音(如访问信息很少的数据或者随机行为)等等,使之成为规范化的样本数据。
本申请实施例可以应用在一计算机集群中,如分布式系统,该计算机集群包括文件存储系统、一台或多台样本计算机、一台或多台向量计算机。
其中,文件存储系统可以存储样本全集,即所有样本数据组成的集合,该文件存储系统的一个示例为分布式文件系统,例如,NFS(Network File System)、Coda、AFS(Andrew File System)、Sprite File System,HDFS(Hadoop Distributed File System)、盘古系统等,所有的样本计算机可读。
样本计算机可以从文件存储系统中读取部分样本数据进行模型的训练,并记录它所读取到的样本数据相关的模型参数,该部分样本数据可以称之为样本子集,并对样本子集中的样本数据按照模型的训练算法进行模型训练。
向量计算机用于保存模型的向量,进行向量的计算和向量的输出。
所谓模型,通常可以表示为一个或一组向量,向量中的每一个维度,称为模型参数。
样本数据,通常表示为针对一个或一组模型参数的权重。
某个模型的示例如下:
[0.1,0.2,0.3,0,0,0.4,0.5,0.6,0.7,0.8]
在该示例中,表示一个10维的线性的模型,该模型一共有10个模型参数,如0.1、0.2等等。
模型参数的值,一般通过样本数据进行训练来得到。
例如,用户在互联网上通过浏览器页面的一次点击都可以产生一个样本数据,每一个样本数据都可能包含这个样本数据所涉及的模型参数。
某个样本数据的示例如下:
(1:0.01,3:0.02,8:0.03)
在该示例中,表示样本数据对ID为1的模型参数产生的权重为0.01,对ID为3的模型参数产生的权重为0.02,对ID为8的模型参数产生的权重为0.03,即这个样本数据会影响模型中的3个模型参数。
一般来讲,不同的样本数据,会影响不同的模型参数,利用机器学习中的训练算法,可以通过大量样本数据的训练,得到基于这个样本集的模型。
在本申请的一个实施例中,步骤101可以包括如下子步骤:
子步骤S11,从文件存储系统中读取样本全集中的部分样本数据;
子步骤S12,将所述部分样本数据写入所述文件存储系统中指定的区域,以组合成样本子集。
在本申请实施例中,样本全集可以存储在文件存储系统中、所有样本计算机可读的区域,样本计算机可以并行采用随机等方式从样本全集中读取部分样本,并写回文件存储系统中、该样本计算机可读的区域,快速实现样本全集的切分、分配。
一般而言,因为每台样本计算机读取一部分的样本数据,因此,其相关的模型参数的总数是远小于这个模型的维度。
当然,上述样本数据的读取方式只是作为示例,在实施本申请实施例时,可以根据实际情况设置其他样本数据的读取方式,例如,将样本全集切分之后再由样本计算机读取,等等,本申请实施例对此不加以限制。
步骤102,将所述部分样本数据相关的模型参数,从针对所述样本全集的第一特征分量映射为针对所述样本子集的第二特征分量;
很多的机器学习的模型训练算法,在整个模型训练的过程中,承载样本数据的样本计算机之间一般是不需要交换数据的。
这些样本计算机的主要网络通信工作是经常性地上传、下载存储在非 样本计算机上的模型参数,并在自己的本地存储这些模型参数的副本。
模型参数的各个分量从数学的角度看通常是互相独立的,样本计算机上的计算通常只关注它需要的分量。
由于每个样本计算机上装载的样本数据,是整个参与训练的样本的一小部分,所以在一个空间很大的模型的训练过程中,并不是所有的模型参数都在每一个样本计算机上被使用到,换而言之,真正被关心的模型参数在不同的样本计算机上可能是不同的。
由此可见,模型参数一般是彼此独立的,对某一个样本子集来说,它只可能使用到所有模型参数中的一部分,因此,可以将它使用的模型参数单独编号。
即在每一个样本子集中,可以对部分样本数据相关的模型参数,建立针对样本全集的第一特征分量与针对样本子集的第二特征分量之间的映射关系向量。
其中,映射关系向量的长度就是这个样本子集中所涉及的模型参数个数。
为了处理方便,映射关系向量通常保存为一个哈希表或者是排序的线性表,存储在文件存储系统中。
进一步地,针对存储在文件存储系统中的部分样本数据,可以将部分样本数据相关的模型参数,按照映射向量关系从针对样本全集的第一特征分量映射为针对样本子集的第二特征分量。
例如,对于上述的10维的模型,假设当前样本子集包含如下两个样本数据:
样本数据1:(1:0.01,3:0.02,8:0.03)
样本数据2:(6:0.02)
对这个样本子集来说,与样本全集中第一特征分量为1、3、6、8的 四个模型参数相关,在映射中,可以将这四个模型参数按照顺序使用第二特征分量1、2、3、4来表示。
那么,对这个样本子集来说,它的映射关系向量为:[1:1,2:3,3:6:4:8]。
映射之后,在样本子集中包含的模型参数是4个,而不是10个:
样本数据1:(1:0.01,2:0.02,4:0.03)
样本数据2:(3:0.02)
对不同的模型参数,由于所处的映射关系向量不同,相同的模型参数的第一特征分量可能对应不同的第二特征分量。
例如,对一个数量众多的网上售卖集市,可能存在很多家商店都在销售同一件商品(有相同的第一特征分量),而用户通过多种来源(搜索引擎、推荐等)来点击这些网上商店的这件商品。
如果将这些点击记录作为样本数据,由于每个样本计算机分到的样本数据是不同的,那么在不同的样本计算机上,很有可能得到不同的映射关系向量,于是这件商品在不同的映射关系向量里面很有可能得到不同的第二特征分量。
例如,在上例中原始的商品,对于样本全集的第一特征分量为100,在映射关系向量中得到的第二特征分量是50,那么,可以将样本子集中的第一特征分量100转换为新的第二特征分量50。
由于每台样本计算机读取一部分的样本数据,其相关的模型参数的总数是远小于这个模型的维度,因此,针对样本子集的第二特征分量的数据量远小于针对样本全集的第一特征分量,映射操作可以大大压缩样本数据的数据量。
因为每一个样本子集相对数量比较少,而且,样本子集之间没有相互关系,这个映射操作可以在不同的样本计算机上并行完成。
步骤103,根据具有所述部分第二特征分量的样本数据训练模型。
在映射完成之后,原始的数据巨大的样本全集被分成了多个三元组:
(样本子集、模型参数、映射关系向量)
对于每一个三元组,样本计算机可以从文件存储系统中读取样本数据和映射关系向量,并保存到自己的内存中。
对于保存在内存中的每一个三元组,可以进行模型训练。
以梯度下降优化为例,在这个训练方法里面,模型的向量会分片的保存在向量计算机上,样本计算机会对自己负责的样本数据计算梯度和损失值(即优化目标函数),并把计算结果推到向量计算机上,并从向量计算机上获取最新的梯度值,进行下一次迭代。
本申请实施例利用单一样本计算机所承载的样本数据的局部性,读取样本全集中的部分样本数据,组合成样本子集,将部分样本数据相关的模型参数,针对样本全集的第一特征分量,映射为针对样本子集的第二特征分量,并根据具有部分第二特征分量的样本数据训练模型:
首先,映射之后可以减少模型参数在样本计算机上的副本大小,大大减少了训练的数据量,尽可能减少了计算机内存占用,使用样本计算机内存放置向量和装载样本,从而在尽可能少的效率损失的前提下,以相对低的资源开销进行机器学习、训练大规模的模型;
其次,映射对模型训练过程的计算性能没有影响,对训练算法透明,原有的训练算法无需修改直接可以使用。
在本申请的一个实施例中,步骤103可以包括如下子步骤:
子步骤S21,在每一轮迭代中,读取所述部分样本数据;
在一种情况中,当前样本计算机可以读取当前样本计算机在先存储的样本子集中的样本数据。
或者,
在另一种情况中,由于样本计算机数量有限,而样本数据的数据量巨 大,因此,每台样本计算机可能读取到多个样本子集。
在本申请实施例中,可以对样本子集进行动态转移,当接收到其他样本计算机的样本转移消息时,通过获取读权限、转移样本子集的存储区域等方式,当前样本计算机可以读取其他样本计算机在先存储的样本子集中的样本数据。
需要说明的是,由于此时的样本数据已经经过转换,存储空间得到缩减,读取速度将加快。
模型参数的本地副本一般只包含该台样本计算机真正需要的那些分量,而不需要全部分量,这样既可以使本地空间大大节省,又能够使三元组可以在真正被使用的时候去装载样本数据,从而保证整个训练过程都在样本计算机或者向量计算机内存中进行,达到高效率大规模模型训练的目的。
如果每个样本计算机仅负责自己读取到的样本子集,若某台样本计算机剩余较多的样本子集,而其他样本计算机处于空闲状态,由于数据的“长尾”而导致训练的效率下降。
本申请实施例灵活地处理样本数据,可以有效的将样本数据的负载分布到不同的样本计算机上并行,规避“长尾”带来的效率下降,容易通过增加硬件设备来提高模型规模或增加样本数据的数量。
子步骤S22,采用所述部分样本数据进行训练,获得训练结果;
对于不同的模型,其训练方法一般不同,所获得的训练结果也一般不同。
例如,对于梯度下降优化得到的凸优化模型,,其训练结果为多项式的权重。
又例如,对于随机森林,其训练结果为决策树。
这时,这个样本计算机训练得到的训练结果是针对这个三元组的,也 就是说,它的下标并不等于向量计算机上的下标值。
由于每个三元组中只包含这个三元组里面涉及到的下标,数据量大大下降,在稀疏的模型训练过程中会大大节省内存占用,提高训练速度。
子步骤S23,将所述部分样本数据相关的模型参数,从针对所述样本子集的第二特征分量映射为针对所述样本全集的第一特征分量;
样本计算机在完成每一个三元组的计算之后,会将训练结果推送到向量计算机。
在推送之前,可以利用三元组里面的映射关系向量,将训练结果的下标转换成向量计算机上的下标。
这个过程是样本计算机自己在通信之前进行,训练算法不感知这个过程,即下标的映射转换对训练算法透明,与训练算法无关,训练算法本身负责计算。
在具体实现中,可以从文件存储系统等位置读取预设的映射关系向量,将部分样本数据相关的模型参数,按照映射关系向量从针对样本子集的第二特征分量映射为针对样本全集的第一特征分量。
例如,对上例中的样本1,在样本子集中的样本数据为(1:0.01,2:0.02,4:0.03),假设它训练得到的权重是[1:0.05,2:0.06,3:0,4:0.07](这里第3个值是0的原因是这个样本数据并不影响ID为3的模型参数)。
已知映射关系向量为[1:1,2:3,3:6:4:8],那么,可以将样本数据进行映射,并结合梯度值,获得向量[1:0.05,3:0.06,6:0,8:0.07],所有模型参数已经恢复到针对样本全集的第一特征分量,而不是针对样本子集的第二特征分量。
子步骤S24,将所述第一特征分量对应的训练结果发送至向量计算机,以更模型中所述第一特征分量对应的模型参数。
为减少通信的数据,可以在发送的数据包中,添加字符序列,将字符 序列和训练结果发送至向量计算机。
其中,字符序列的比特位包括针对第一特征分量的更新标识和针对其他特征分量的禁止更新标识,其他特征分量为样本全集中除第一特征分量的特征分量。
如果某次通信需要更新向量计算机上的第一特征分量从1000到5000的模型参数,应用本申请实施例,可以同时传递一个(5000-1000)/8,即大约500个字节的字符序列,其中每一个比特位表示当前通信是否在更新对应的模型参数(如0表示不更新,1表示更新),同时对应的新的模型参数可以顺序传递,避免了在网络上传递大量的0值。
例如,上例中需要推送的向量为[1:0.05,3:0.06,6:0,8:0.07],通信中添加的字符序列为[1010010100],表示第一特征分量为1、3、6、8的模型参数需要更新,而真正的数据表示为[0.05,0.06,0.07]三个值。
那么整个通信所传递的字节数:10bit+3×32bit=106bit(假设浮点数使用4个字节表示),远小于直接传递浮点数所需的10×32bit=320bit。
在梯度下降优化中,若向量计算机收到新的一批梯度,则更新模型的梯度向量,在接收到全部三元组的结果之后,确定本轮的梯度值,然后更新模型向量,返回至各样本计算机,继续进行下一轮迭代训练。
在反复迭代的模型训练结束之后,向量计算机将自己保存的模型向量以“键-值”对等形式写到外部存储的文件存储系统或者数据库的表中,输出训练好的模型。
本申请实施例通过字符序列进行通信,所使用的字节数小于直接传递浮点数据的字节数,降低了对集群通信资源的消耗。
需要说明的是,对于方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本申请实施例并不受所描述的动作顺序的限制,因为依据本申请实施例,某些步骤可以采用其他 顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于优选实施例,所涉及的动作并不一定是本申请实施例所必须的。
参照图2,示出了本申请的一种模型的训练装置实施例的结构框图,具体可以包括如下模块:
样本子集读取模块201,用于读取样本全集中的部分样本数据,组合成样本子集;
特征分量映射模块202,用于将所述部分样本数据相关的模型参数,从针对所述样本全集的第一特征分量映射为针对所述样本子集的第二特征分量;
模型训练模块203,用于根据具有所述部分第二特征分量的样本数据训练模型。
在本申请的一个实施例中,所述样本子集读取模块201可以包括如下子模块:
第一部分样本数据读取子模块,用于从文件存储系统中读取样本全集中的部分样本数据;
部分样本数据写入子模块,用于将所述部分样本数据写入所述文件存储系统中指定的区域,以组合成样本子集。
在本申请的一个实施例中,所述特征分量映射模块202可以包括如下子模块:
映射关系向量建立子模块,用于对所述部分样本数据相关的模型参数,建立针对所述样本全集的第一特征分量与针对所述样本子集的第二特征分量之间的映射关系向量;
样本子集映射子模块,用于将所述部分样本数据相关的模型参数,按照所述映射向量关系从针对所述样本全集的第一特征分量映射为针对所 述样本子集的第二特征分量。
在本申请的一个实施例中,所述模型训练模块203可以包括如下子模块:
第二部分样本数据读取子模块,用于在每一轮迭代中,读取所述部分样本数据;
训练子模块,用于采用所述部分样本数据进行训练,获得训练结果;
样本全集映射子模块,用于将所述部分样本数据相关的模型参数,从针对所述样本子集的第二特征分量映射为针对所述样本全集的第一特征分量;
通信子模块,用于将所述第一特征分量对应的训练结果发送至向量计算机,以更模型中所述第一特征分量对应的模型参数。
在本申请的一个实施例中,所述第二部分样本数据读取子模块可以包括如下单元:
第一读取单元,用于读取当前样本计算机在先存储的样本子集中的样本数据;
或者,
第二读取单元,用于在接收到其他样本计算机的样本转移消息时,读取其他样本计算机在先存储的样本子集中的样本数据。
在本申请的一个实施例中,所述样本全集映射子模块可以包括如下单元:
映射关系向量读取单元,用于读取预设的映射关系向量;
映射关系映射单元,用于将所述部分样本数据相关的模型参数,按照所述映射关系向量从针对所述样本子集的第二特征分量映射为针对所述样本全集的第一特征分量。
在本申请的一个实施例中,所述通信子模块可以包括如下单元:
字符序列添加单元,用于添加字符序列,所述字符序列包括针对所述第一特征分量的更新标识和针对其他特征分量的禁止更新标识,所述其他特征分量为所述样本全集中除所述第一特征分量的特征分量;
发送单元,用于将所述字符序列和所述训练结果发送至向量计算机。
对于装置实施例而言,由于其与方法实施例基本相似,所以描述的比较简单,相关之处参见方法实施例的部分说明即可。
本说明书中的各个实施例均采用递进的方式描述,每个实施例重点说明的都是与其他实施例的不同之处,各个实施例之间相同相似的部分互相参见即可。
本领域内的技术人员应明白,本申请实施例的实施例可提供为方法、装置、或计算机程序产品。因此,本申请实施例可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本申请实施例可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。
在一个典型的配置中,所述计算机设备包括一个或多个处理器(CPU)、输入/输出接口、网络接口和内存。内存可能包括计算机可读介质中的非永久性存储器,随机存取存储器(RAM)和/或非易失性内存等形式,如只读存储器(ROM)或闪存(flash RAM)。内存是计算机可读介质 的示例。计算机可读介质包括永久性和非永久性、可移动和非可移动媒体可以由任何方法或技术来实现信息存储。信息可以是计算机可读指令、数据结构、程序的模块或其他数据。计算机的存储介质的例子包括,但不限于相变内存(PRAM)、静态随机存取存储器(SRAM)、动态随机存取存储器(DRAM)、其他类型的随机存取存储器(RAM)、只读存储器(ROM)、电可擦除可编程只读存储器(EEPROM)、快闪记忆体或其他内存技术、只读光盘只读存储器(CD-ROM)、数字多功能光盘(DVD)或其他光学存储、磁盒式磁带,磁带磁磁盘存储或其他磁性存储设备或任何其他非传输介质,可用于存储可以被计算设备访问的信息。按照本文中的界定,计算机可读介质不包括非持续性的电脑可读媒体(transitory media),如调制的数据信号和载波。
本申请实施例是参照根据本申请实施例的方法、终端设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理终端设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理终端设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方 框图一个方框或多个方框中指定的功能的装置。
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理终端设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理终端设备上,使得在计算机或其他可编程终端设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程终端设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
尽管已描述了本申请实施例的优选实施例,但本领域内的技术人员一旦得知了基本创造性概念,则可对这些实施例做出另外的变更和修改。所以,所附权利要求意欲解释为包括优选实施例以及落入本申请实施例范围的所有变更和修改。
最后,还需要说明的是,在本文中,诸如第一和第二等之类的关系术语仅仅用来将一个实体或者操作与另一个实体或操作区分开来,而不一定要求或者暗示这些实体或操作之间存在任何这种实际的关系或者顺序。而且,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者终端设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、 方法、物品或者终端设备所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括所述要素的过程、方法、物品或者终端设备中还存在另外的相同要素。
以上对本申请所提供的一种模型的训练方法和一种模型的训练装置,进行了详细介绍,本文中应用了具体个例对本申请的原理及实施方式进行了阐述,以上实施例的说明只是用于帮助理解本申请的方法及其核心思想;同时,对于本领域的一般技术人员,依据本申请的思想,在具体实施方式及应用范围上均会有改变之处,综上所述,本说明书内容不应理解为对本申请的限制。

Claims (14)

  1. 一种模型的训练方法,其特征在于,包括:
    读取样本全集中的部分样本数据,组合成样本子集;
    将所述部分样本数据相关的模型参数,从针对所述样本全集的第一特征分量映射为针对所述样本子集的第二特征分量;
    根据具有所述部分第二特征分量的样本数据训练模型。
  2. 根据权利要求1所述的方法,其特征在于,所述读取样本全集中的部分样本数据,组合成样本子集的步骤包括:
    从文件存储系统中读取样本全集中的部分样本数据;
    将所述部分样本数据写入所述文件存储系统中指定的区域,以组合成样本子集。
  3. 根据权利要求1或2所述的方法,其特征在于,所述将所述部分样本数据相关的模型参数,从针对所述样本全集的第一特征分量映射为针对所述样本子集的第二特征分量的步骤包括:
    对所述部分样本数据相关的模型参数,建立针对所述样本全集的第一特征分量与针对所述样本子集的第二特征分量之间的映射关系向量;
    将所述部分样本数据相关的模型参数,按照所述映射向量关系从针对所述样本全集的第一特征分量映射为针对所述样本子集的第二特征分量。
  4. 根据权利要求1或2或3所述的方法,其特征在于,所述根据具有所述部分第二特征分量的样本数据训练模型的步骤包括:
    在每一轮迭代中,读取所述部分样本数据;
    采用所述部分样本数据进行训练,获得训练结果;
    将所述部分样本数据相关的模型参数,从针对所述样本子集的第二特征分量映射为针对所述样本全集的第一特征分量;
    将所述第一特征分量对应的训练结果发送至向量计算机,以更模型中 所述第一特征分量对应的模型参数。
  5. 根据权利要求4所述的方法,其特征在于,所述读取所述样本子集中的样本数据的步骤包括:
    读取当前样本计算机在先存储的样本子集中的样本数据;
    或者,
    当接收到其他样本计算机的样本转移消息时,读取其他样本计算机在先存储的样本子集中的样本数据。
  6. 根据权利要求4所述的方法,其特征在于,所述将所述部分样本数据相关的模型参数,从针对所述样本子集的第二特征分量映射为针对所述样本全集的第一特征分量的步骤包括:
    读取预设的映射关系向量;
    将所述部分样本数据相关的模型参数,按照所述映射关系向量从针对所述样本子集的第二特征分量映射为针对所述样本全集的第一特征分量。
  7. 根据权利要求4所述的方法,其特征在于,所述将所述第一特征分量对应的训练结果发送至向量计算机的步骤包括:
    添加字符序列,所述字符序列包括针对所述第一特征分量的更新标识和针对其他特征分量的禁止更新标识,所述其他特征分量为所述样本全集中除所述第一特征分量的特征分量;
    将所述字符序列和所述训练结果发送至向量计算机。
  8. 一种模型的训练装置,其特征在于,包括:
    样本子集读取模块,用于读取样本全集中的部分样本数据,组合成样本子集;
    特征分量映射模块,用于将所述部分样本数据相关的模型参数,从针对所述样本全集的第一特征分量映射为针对所述样本子集的第二特征分量;
    模型训练模块,用于根据具有所述部分第二特征分量的样本数据训练模型。
  9. 根据权利要求8所述的装置,其特征在于,所述样本子集读取模块包括:
    第一部分样本数据读取子模块,用于从文件存储系统中读取样本全集中的部分样本数据;
    部分样本数据写入子模块,用于将所述部分样本数据写入所述文件存储系统中指定的区域,以组合成样本子集。
  10. 根据权利要求8或9所述的装置,其特征在于,所述特征分量映射模块包括:
    映射关系向量建立子模块,用于对所述部分样本数据相关的模型参数,建立针对所述样本全集的第一特征分量与针对所述样本子集的第二特征分量之间的映射关系向量;
    样本子集映射子模块,用于将所述部分样本数据相关的模型参数,按照所述映射向量关系从针对所述样本全集的第一特征分量映射为针对所述样本子集的第二特征分量。
  11. 根据权利要求8或9或10所述的装置,其特征在于,所述模型训练模块包括:
    第二部分样本数据读取子模块,用于在每一轮迭代中,读取所述部分样本数据;
    训练子模块,用于采用所述部分样本数据进行训练,获得训练结果;
    样本全集映射子模块,用于将所述部分样本数据相关的模型参数,从针对所述样本子集的第二特征分量映射为针对所述样本全集的第一特征分量;
    通信子模块,用于将所述第一特征分量对应的训练结果发送至向量计 算机,以更模型中所述第一特征分量对应的模型参数。
  12. 根据权利要求11所述的装置,其特征在于,所述第二部分样本数据读取子模块包括:
    第一读取单元,用于读取当前样本计算机在先存储的样本子集中的样本数据;
    或者,
    第二读取单元,用于在接收到其他样本计算机的样本转移消息时,读取其他样本计算机在先存储的样本子集中的样本数据。
  13. 根据权利要求11所述的装置,其特征在于,所述样本全集映射子模块包括:
    映射关系向量读取单元,用于读取预设的映射关系向量;
    映射关系映射单元,用于将所述部分样本数据相关的模型参数,按照所述映射关系向量从针对所述样本子集的第二特征分量映射为针对所述样本全集的第一特征分量。
  14. 根据权利要求11所述的装置,其特征在于,所述通信子模块包括:
    字符序列添加单元,用于添加字符序列,所述字符序列包括针对所述第一特征分量的更新标识和针对其他特征分量的禁止更新标识,所述其他特征分量为所述样本全集中除所述第一特征分量的特征分量;
    发送单元,用于将所述字符序列和所述训练结果发送至向量计算机。
PCT/CN2017/077696 2016-03-31 2017-03-22 一种模型的训练方法和装置 WO2017167095A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/146,642 US11580441B2 (en) 2016-03-31 2018-09-28 Model training method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610201951.4 2016-03-31
CN201610201951.4A CN107292326A (zh) 2016-03-31 2016-03-31 一种模型的训练方法和装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/146,642 Continuation US11580441B2 (en) 2016-03-31 2018-09-28 Model training method and apparatus

Publications (1)

Publication Number Publication Date
WO2017167095A1 true WO2017167095A1 (zh) 2017-10-05

Family

ID=59963490

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/077696 WO2017167095A1 (zh) 2016-03-31 2017-03-22 一种模型的训练方法和装置

Country Status (4)

Country Link
US (1) US11580441B2 (zh)
CN (1) CN107292326A (zh)
TW (1) TWI735545B (zh)
WO (1) WO2017167095A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110263147A (zh) * 2019-06-05 2019-09-20 阿里巴巴集团控股有限公司 推送信息的生成方法及装置
EP3693912A4 (en) * 2017-11-07 2020-08-12 Huawei Technologies Co., Ltd. PREDICTION METHOD, TERMINAL DEVICE AND SERVER
CN114389959A (zh) * 2021-12-30 2022-04-22 深圳清华大学研究院 网络拥塞控制方法、装置、电子设备及存储介质

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE202017007517U1 (de) * 2016-08-11 2022-05-03 Twitter, Inc. Aggregatmerkmale für maschinelles Lernen
US10460235B1 (en) 2018-07-06 2019-10-29 Capital One Services, Llc Data model generation using generative adversarial networks
US11474978B2 (en) 2018-07-06 2022-10-18 Capital One Services, Llc Systems and methods for a data search engine based on data profiles
CN111460804B (zh) * 2019-01-02 2023-05-02 阿里巴巴集团控股有限公司 文本处理方法、装置和系统
CN110175680B (zh) * 2019-04-03 2024-01-23 西安电子科技大学 利用分布式异步更新在线机器学习的物联网数据分析方法
CN112819020A (zh) * 2019-11-15 2021-05-18 富士通株式会社 训练分类模型的方法和装置及分类方法
CN111047050A (zh) * 2019-12-17 2020-04-21 苏州浪潮智能科技有限公司 一种分布式并行训练方法、设备以及存储介质
CN111219257B (zh) * 2020-01-07 2022-07-22 大连理工大学 基于自适应增强算法的涡扇发动机直接数据驱动控制方法
CN113538079A (zh) * 2020-04-17 2021-10-22 北京金山数字娱乐科技有限公司 一种推荐模型的训练方法及装置、一种推荐方法及装置
US11954345B2 (en) * 2021-12-03 2024-04-09 Samsung Electronics Co., Ltd. Two-level indexing for key-value persistent storage device
US20240013223A1 (en) * 2022-07-10 2024-01-11 Actimize Ltd. Computerized-method for synthetic fraud generation based on tabular data of financial transactions

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102663417A (zh) * 2012-03-19 2012-09-12 河南工业大学 一种小样本数据模式识别的特征选择方法
CN104732241A (zh) * 2015-04-08 2015-06-24 苏州大学 一种多分类器构建方法和系统
CN104866524A (zh) * 2015-04-10 2015-08-26 大连交通大学 一种商品图像精细分类方法
CN105426857A (zh) * 2015-11-25 2016-03-23 小米科技有限责任公司 人脸识别模型训练方法和装置

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0744514A (ja) 1993-07-27 1995-02-14 Matsushita Electric Ind Co Ltd ニューラルネットの学習用データ縮約化方法
US8266078B2 (en) * 2009-02-06 2012-09-11 Microsoft Corporation Platform for learning based recognition research
US9569401B2 (en) 2011-12-06 2017-02-14 Akamai Technologies, Inc. Parallel training of a support vector machine (SVM) with distributed block minimization
US10318882B2 (en) * 2014-09-11 2019-06-11 Amazon Technologies, Inc. Optimized training of linear machine learning models
AU2015336942B2 (en) * 2014-10-24 2018-02-01 Commonwealth Scientific And Industrial Research Organisation Learning with transformed data
EP3338221A4 (en) * 2015-08-19 2019-05-01 D-Wave Systems Inc. DISCRETE VARIATION SELF-ENCODING SYSTEMS AND METHODS FOR MACHINE LEARNING USING ADIABATIC QUANTUM COMPUTERS
WO2017066695A1 (en) * 2015-10-16 2017-04-20 D-Wave Systems Inc. Systems and methods for creating and using quantum boltzmann machines
US11087234B2 (en) 2016-01-29 2021-08-10 Verizon Media Inc. Method and system for distributed deep machine learning
US10558933B2 (en) * 2016-03-30 2020-02-11 International Business Machines Corporation Merging feature subsets using graphical representation
US20180005136A1 (en) * 2016-07-01 2018-01-04 Yi Gai Machine learning in adversarial environments

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102663417A (zh) * 2012-03-19 2012-09-12 河南工业大学 一种小样本数据模式识别的特征选择方法
CN104732241A (zh) * 2015-04-08 2015-06-24 苏州大学 一种多分类器构建方法和系统
CN104866524A (zh) * 2015-04-10 2015-08-26 大连交通大学 一种商品图像精细分类方法
CN105426857A (zh) * 2015-11-25 2016-03-23 小米科技有限责任公司 人脸识别模型训练方法和装置

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3693912A4 (en) * 2017-11-07 2020-08-12 Huawei Technologies Co., Ltd. PREDICTION METHOD, TERMINAL DEVICE AND SERVER
US11580457B2 (en) 2017-11-07 2023-02-14 Huawei Technologies Co., Ltd. Prediction method, terminal, and server
CN110263147A (zh) * 2019-06-05 2019-09-20 阿里巴巴集团控股有限公司 推送信息的生成方法及装置
CN110263147B (zh) * 2019-06-05 2023-10-20 创新先进技术有限公司 推送信息的生成方法及装置
CN114389959A (zh) * 2021-12-30 2022-04-22 深圳清华大学研究院 网络拥塞控制方法、装置、电子设备及存储介质
CN114389959B (zh) * 2021-12-30 2023-10-27 深圳清华大学研究院 网络拥塞控制方法、装置、电子设备及存储介质

Also Published As

Publication number Publication date
US20190034833A1 (en) 2019-01-31
CN107292326A (zh) 2017-10-24
TWI735545B (zh) 2021-08-11
TW201740294A (zh) 2017-11-16
US11580441B2 (en) 2023-02-14

Similar Documents

Publication Publication Date Title
WO2017167095A1 (zh) 一种模型的训练方法和装置
CN110506260B (zh) 用于神经网络环境中的增强数据处理的方法、系统和介质
US11711420B2 (en) Automated management of resource attributes across network-based services
US20210049209A1 (en) Distributed graph embedding method and apparatus, device, and system
Allam Usage of Hadoop and Microsoft Cloud in Big Data Analytics: An Exploratory Study
CN111898698B (zh) 对象的处理方法及装置、存储介质和电子设备
Gupta et al. Faster as well as early measurements from big data predictive analytics model
Zeng et al. Optimal metadata replications and request balancing strategy on cloud data centers
CN111258978A (zh) 一种数据存储的方法
US10182104B1 (en) Automatic propagation of resource attributes in a provider network according to propagation criteria
CN112988741A (zh) 实时业务数据合并方法、装置及电子设备
Li et al. Wide-area spark streaming: Automated routing and batch sizing
Hashem et al. An Integrative Modeling of BigData Processing.
Adam et al. Bigdata: Issues, challenges, technologies and methods
CN111444148A (zh) 基于MapReduce的数据传输方法和装置
Alikhan et al. Dingo optimization based network bandwidth selection to reduce processing time during data upload and access from cloud by user
CN113360494B (zh) 一种宽表数据的生成方法、更新方法和相关装置
US10996945B1 (en) Splitting programs into distributed parts
Rao A multilingual reference based on cloud pattern
Dos Anjos et al. BIGhybrid--A Toolkit for Simulating MapReduce in Hybrid Infrastructures
CN111030856B (zh) 一种基于云的数据接入方法、电子设备及计算机可读介质
US20240020510A1 (en) System and method for execution of inference models across multiple data processing systems
US20240020550A1 (en) System and method for inference generation via optimization of inference model portions
WO2024021738A1 (zh) 数据网络图的嵌入方法、装置、计算机设备和存储介质
CN116401282A (zh) 数据处理方法、装置、存储介质及电子设备

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17773125

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 17773125

Country of ref document: EP

Kind code of ref document: A1