CN115186738A - Model training method, device and storage medium - Google Patents

Model training method, device and storage medium

Info

Publication number
CN115186738A
Authority
CN
China
Prior art keywords: data, processing, feature, subsets, model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210700375.3A
Other languages
Chinese (zh)
Other versions
CN115186738B (en)
Inventor
焦学武
骆新生
李竞雪
杨俊超
宋誉文
邢文强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210700375.3A priority Critical patent/CN115186738B/en
Publication of CN115186738A publication Critical patent/CN115186738A/en
Application granted granted Critical
Publication of CN115186738B publication Critical patent/CN115186738B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The disclosure provides a model training method, a model training apparatus and a storage medium, and relates to the technical field of artificial intelligence, in particular to deep learning. The implementation is as follows: in the process of training a model, the original data set required by the model is divided to obtain a plurality of small-batch data subsets; the data subsets are processed in parallel based on the feature processing flow corresponding to the model to obtain the corresponding sample data subsets; the sample data subsets are saved to a specified storage space; and training of the model is started according to the sample data subsets currently stored in the specified storage space. Because multiple small batches of original data are processed in parallel, the efficiency of obtaining the model's sample data is improved, and the efficiency of model training is improved in turn.

Description

Model training method, device and storage medium
Technical Field
The present disclosure relates to the field of computer technology, in particular to artificial intelligence and deep learning, and more specifically to a model training method, apparatus, and storage medium.
Background
With the development of science and technology, more and more fields use models for business processing. For example, in the field of natural language processing a model processes the text to be processed, and in the field of image processing an image classification model classifies the image to be processed.
In the related art, how to train a model efficiently is very important for its rapid application.
Disclosure of Invention
The present disclosure provides a method, apparatus, and storage medium for model training.
According to an aspect of the present disclosure, there is provided a model training method applied in an electronic device, the method including: acquiring an original data set required by a model; dividing the original data set to obtain a plurality of data subsets; processing the plurality of data subsets in parallel according to the feature processing flow corresponding to the model to obtain sample data subsets corresponding to the plurality of data subsets; saving the sample data subsets to a designated storage space; and starting to train the model according to the sample data subsets currently stored in the designated storage space.
According to another aspect of the present disclosure, there is provided a model training apparatus applied in an electronic device, the apparatus including: an acquisition module for acquiring an original data set required by a model; a dividing module for dividing the original data set to obtain a plurality of data subsets; a parallel processing module for processing the plurality of data subsets in parallel according to the feature processing flow corresponding to the model to obtain sample data subsets corresponding to the plurality of data subsets; a saving module for saving the sample data subsets to a specified storage space; and a training module for starting to train the model according to the sample data subsets currently stored in the designated storage space.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the model training method of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform a model training method disclosed in embodiments of the present disclosure.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the model training method of the present disclosure.
One embodiment in the above application has the following advantages or benefits:
in the process of training the model, the original data set required by the model is divided to obtain a plurality of small-batch data subsets; the data subsets are processed in parallel based on the feature processing flow corresponding to the model to obtain the corresponding sample data subsets; the sample data subsets are saved to a specified storage space; and training of the model is started according to the sample data subsets currently stored in the specified storage space. Because multiple small batches of original data are processed in parallel, the efficiency of obtaining the model's sample data is improved, and the efficiency of model training is improved in turn.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present disclosure;
FIG. 2 is a schematic diagram according to a second embodiment of the present disclosure;
FIG. 3 is a schematic illustration according to a third embodiment of the present disclosure;
FIG. 4 is a schematic diagram according to a fourth embodiment of the present disclosure;
FIG. 5 is a schematic diagram according to a fifth embodiment of the present disclosure;
FIG. 6 is a schematic illustration according to a sixth embodiment of the present disclosure;
FIG. 7 is a schematic diagram according to a seventh embodiment of the present disclosure;
FIG. 8 is an example diagram of a directed acyclic graph of an embodiment of the present disclosure;
FIG. 9 is an example diagram of a hierarchy diagram of an embodiment of the present disclosure;
FIG. 10 is a schematic diagram according to an eighth embodiment of the present disclosure;
FIG. 11 is a schematic diagram according to a ninth embodiment of the present disclosure;
FIG. 12 is a block diagram of an electronic device for implementing a model training method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the related art, the general process of constructing a model in a given field is as follows: after the original data required by the model is acquired, the original data is cleaned, converted, and otherwise processed to generate sample data, and the model is trained based on that sample data. The process from original data to sample data is generally completed by a plurality of tasks executed in sequence: the entire original data is read, the tasks are executed one after another to complete a series of operations such as data cleaning, correction, feature calculation and format conversion, and sample data that can be used for model training is finally produced. However, obtaining sample data from the original data in this way takes a long time; moreover, because the tasks are connected in series and each task processes the entire data set, the failure probabilities of the individual tasks accumulate, resulting in poor stability of the overall processing.
Therefore, in the model training method provided by the present application, in the process of training the model, the original data set required by the model is divided to obtain a plurality of small-batch data subsets; the data subsets are processed in parallel based on the feature processing flow corresponding to the model to obtain the corresponding sample data subsets; the sample data subsets are saved to a specified storage space; and training of the model is started according to the sample data subsets currently stored in the specified storage space. Because multiple small batches of original data are processed in parallel, the efficiency of obtaining the model's sample data is improved, and the efficiency of model training is improved in turn.
The model training method, apparatus, and storage medium of embodiments of the present disclosure are described below with reference to the accompanying drawings.
FIG. 1 is a schematic diagram according to a first embodiment of the present disclosure, which provides a model training method.
As shown in fig. 1, the model training method may include:
Step 101, acquiring an original data set required by a model.
The execution subject of the model training method of this embodiment is a model training apparatus. The apparatus may be implemented in software and/or hardware, and may itself be an electronic device or be configured in an electronic device.
The electronic device may include, but is not limited to, a terminal device, a server, and the like, and the embodiment does not specifically limit the electronic device.
It should be noted that the model may be a machine learning model used in any application scenario. For example, in an image recognition scenario, the model may be an image recognition model. For another example, in a translation scenario, the model may be a machine translation model. For another example, in a scenario where text is semantically represented, the model may be a semantic representation model.
It can be understood that, when the application scenarios corresponding to the models differ, the original data required to train the models also differs. For example, in the translation scenario the model is a machine translation model, in which case the original data required by the machine translation model is a parallel text corpus of the two translation languages. For another example, in an image recognition scenario the model may be an image recognition model, in which case the raw data required by the image recognition model consists of original images.
Step 102, dividing the original data set to obtain a plurality of data subsets.
In some exemplary embodiments, the raw data set may include a plurality of pieces of raw data.
Wherein, each piece of original data corresponds to one generation time.
As an exemplary embodiment, the plurality of pieces of original data may be divided according to the generation time of each piece of original data to obtain a plurality of data subsets, where time intervals corresponding to the plurality of data subsets are different from each other.
For example, the model is used for predicting user behavior, in which case, the model may be referred to as a behavior prediction model, and assuming that the original data set includes a plurality of pieces of user behavior data and generation time corresponding to each piece of user behavior data, in which case, the plurality of pieces of user behavior data may be divided according to the generation time to obtain a plurality of data subsets.
It should be noted that the processes of collecting, storing, using, processing, transmitting, providing, disclosing and the like of the user behavior data involved in the disclosure are all performed under the premise of obtaining the consent of the user, and all meet the regulations of the relevant laws and regulations without violating the public order and good customs.
As an exemplary embodiment, the original data located in each time window may be obtained according to a plurality of preset time windows and respective corresponding generation times of a plurality of pieces of original data, and the original data in each time window may be used as one data subset.
As another exemplary embodiment, in the case that the original data set includes a plurality of pieces of original data, the original data set may be divided into a plurality of data subsets by grouping every preset number of pieces of original data into one group.
The preset number is set in advance; for example, the preset number may be 100,000, 10,000, 20,000, or 300,000. In practical application, the preset number may be set according to actual service requirements, and its value is not specifically limited in this embodiment.
For example, the application scenario corresponding to the model is an image recognition scenario, in which case the model may be an image recognition model. Assuming that the original data set required by the image recognition model includes 1,000,000 original images and the preset number is 100,000, every 100,000 original images may be divided into one group, and each group of original data forms one data subset; dividing the original data set in this way yields 10 data subsets.
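For illustration only, a minimal sketch of the two division strategies described above (grouping by a preset count and grouping by generation-time window); the group size and window length are example values, not ones fixed by this disclosure:

```python
from datetime import timedelta

def split_by_count(records, group_size=100_000):
    """Divide raw records into fixed-size data subsets (small batches)."""
    return [records[i:i + group_size] for i in range(0, len(records), group_size)]

def split_by_time(timed_records, window=timedelta(hours=1)):
    """Divide (generation_time, record) pairs into subsets whose time
    intervals do not overlap."""
    timed_records = sorted(timed_records, key=lambda r: r[0])  # order by generation time
    subsets, current, window_start = [], [], None
    for ts, rec in timed_records:
        if window_start is None or ts - window_start >= window:
            if current:
                subsets.append(current)
            current, window_start = [], ts
        current.append((ts, rec))
    if current:
        subsets.append(current)
    return subsets
```

Either strategy yields non-overlapping subsets that can then be fed to the feature processing flow independently.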
In an embodiment of the present disclosure, in order to reduce the computing resources consumed by feature processing, in some embodiments, when it is determined that the original data read from the original data set is full (whole-record) data, the column field names required for feature processing may be determined according to the feature processing flow corresponding to the model, and all values other than those of the required column fields may be deleted from each piece of original data in the original data set, so as to obtain the processed original data set.
Step 103, processing the plurality of data subsets in parallel according to the feature processing flow corresponding to the model to obtain the sample data subsets corresponding to the plurality of data subsets.
The feature processing flow may include a plurality of processing steps required for feature processing and execution dependencies of the plurality of processing steps.
As an example, the feature processing flow may include steps of data cleaning, data correction, data splicing, and feature calculation.
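As a rough sketch of processing several subsets in parallel through such a flow, with placeholder step functions standing in for the model-specific cleaning, correction, splicing and feature-calculation logic (none of which is fixed by the disclosure):

```python
from concurrent.futures import ProcessPoolExecutor

# Placeholder step implementations; a real flow would substitute the
# model-specific cleaning, correction, splicing and feature-calculation logic.
def clean(subset):            return [r for r in subset if r is not None]
def correct(subset):          return subset
def splice(subset):           return subset
def compute_features(subset): return [{"features": r} for r in subset]

def feature_pipeline(subset):
    """Run one data subset through the feature processing flow."""
    return compute_features(splice(correct(clean(subset))))

def process_subsets_in_parallel(subsets, max_workers=4):
    """Process several small-batch data subsets in parallel, one worker each,
    yielding one sample data subset per input subset."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(feature_pipeline, subsets))
```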
Step 104, storing the sample data subsets into a specified storage space.
In some embodiments, in order to perform feature processing and model training on the same electronic device, so that sample data can be obtained directly on that device when training the model, the specified storage space may, as an example, be a shared storage space in the electronic device.
Step 105, starting to train the model according to the sample data subsets currently stored in the designated storage space.
According to the model training method, in the process of training the model, the original data set required by the model is divided to obtain a plurality of small-batch data subsets; the data subsets are processed in parallel based on the feature processing flow corresponding to the model to obtain the corresponding sample data subsets; the sample data subsets are saved to the designated storage space; and training of the model is started according to the sample data subsets currently stored in the designated storage space. Because multiple small batches of original data are processed in parallel, the efficiency of obtaining the model's sample data is improved, and the efficiency of model training is improved in turn.
It will be appreciated that, in order to train the model accurately, it is generally necessary to start training only after a certain amount of sample data is available. Therefore, in one embodiment of the present disclosure, before training of the model is started according to the sample data subsets currently stored in the designated storage space, it is first determined that the number of sample data subsets currently stored in the designated storage space is greater than or equal to a preset number threshold.
As an exemplary embodiment, the number of the sample data subsets currently stored in the designated storage space may be detected, and it is determined whether the number is greater than or equal to a preset number threshold, and if the number is greater than or equal to the preset number threshold, it is determined that the number of the sample data subsets currently stored in the designated storage space is greater than or equal to the preset number threshold.
It should be noted that the preset number threshold is preset, and for example, the preset number threshold may be 6, 5, or 8. It can be understood that, in practical application, the value of the preset number threshold may be set according to an actual training requirement, and this embodiment is not particularly limited to this.
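A small sketch of this trigger, under the assumption (made only for illustration) that the designated storage space is a local directory and each sample data subset is one file in it:

```python
import os
import time

def wait_until_enough_samples(storage_dir, threshold=6, poll_seconds=5):
    """Block until the designated storage space holds at least `threshold`
    sample data subsets, then return their paths so training can start."""
    while True:
        files = sorted(os.listdir(storage_dir))
        if len(files) >= threshold:
            return [os.path.join(storage_dir, f) for f in files]
        time.sleep(poll_seconds)
```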
When acquiring massive raw data, the related art generally reads the full raw data row by row and then extracts from it the data required for feature processing, so a large amount of useless data is read and machine bandwidth is wasted. To solve this technical problem, in this embodiment each piece of original data is stored in an original data table in a columnar storage manner, and when the original data set required by the model is acquired from the original data table, only the column data required by the model is read, so as to improve data acquisition performance. This process is described below in conjunction with fig. 2.
As shown in fig. 2, one possible implementation of the step 101 for obtaining the raw data set required by the model may include:
step 201, determining the column field name required by the feature processing flow corresponding to the model for performing the feature processing.
In an embodiment of the present disclosure, a feature processing flow corresponding to a model is obtained, column field names required when each processing step in the feature processing flow processes original data are determined, and the column field names required when the feature processing flow performs feature processing are determined according to the column field names required by each processing step.
Step 202, determining an original data table corresponding to the model, wherein a plurality of pieces of original data are stored in the original data table in a columnar storage mode.
In an embodiment of the present disclosure, the original data table may be stored in a storage cluster, or in other storage devices, and a storage location of the original data table may be set according to an actual service requirement, which is not specifically limited in this embodiment.
Step 203, reading the column data corresponding to the column field name from the original data table, and combining the column data belonging to the same original data in the read column data to obtain an original data set.
In this example embodiment, the full original data does not need to be read from the original data table; instead, only the column data required by the model's feature processing flow for feature processing is read from the table, which reduces the amount of data read, greatly improves data reading performance, and improves the efficiency of acquiring the original data set.
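As an illustration of such column pruning, assuming the original data table is stored in a columnar format such as Parquet (the disclosure does not name a concrete format), only the needed fields are read:

```python
import pyarrow.parquet as pq

def read_required_columns(table_path, column_names):
    """Read only the column fields the feature processing flow needs,
    instead of scanning every field of each raw record."""
    table = pq.read_table(table_path, columns=column_names)  # column pruning
    # Recombine the column values of each row into one raw-data record (dict).
    return table.to_pylist()
```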
In an embodiment of the present disclosure, in order to further improve the efficiency of obtaining the raw data set, pipeline processing may be performed on the raw data read in batches by combining the network card, the Graphics Processing Unit (GPU) and the Central Processing Unit (CPU) in the electronic device. A possible implementation of step 203 is described below in conjunction with fig. 3.
As shown in fig. 3, may include:
step 301, reading column data corresponding to column field names in batches from an original data table through a network card.
The number of the column data read in each batch may be preset, and in practical application, the number of the column data read in each batch may be set according to actual business requirements, which is not specifically limited in this embodiment.
Step 302, sending the column data of each batch to the GPU, so as to perform decoding processing on the column data of each batch by the GPU.
The decoding process may include one or more operations of decryption, decompression, or format conversion on each column data, and the embodiment is not limited in this respect.
Step 303, sending the decoding processing result of each batch of column data to the CPU, so as to combine the column data belonging to the same original data in the decoding processing result by the CPU, thereby obtaining the original data subset of each batch of column data.
It can be understood that the network card and the GPU can communicate with each other, and the GPU and the CPU can communicate with each other.
As an example, in order to enable efficient data transmission between the network card and the GPU, the network card and the GPU may communicate with each other through a Peripheral Component Interconnect Express (PCIE) bus.
As an example, to enable efficient data transfer between the CPU and the GPU, the CPU and the GPU may also communicate via a PCIE bus.
Step 304, merging the original data subsets of all the batch column data to obtain an original data set.
For example, the model may be a user behavior prediction model, and a plurality of pieces of raw user behavior data are stored in the original data table. Assume that the column data of the designated column field names is read in three batches. After the network card reads the first batch of column data, it transmits that batch to the GPU over the PCIE bus; the GPU decodes the first batch and, when decoding is complete, sends the decoding result to the CPU over the PCIE bus; the CPU then processes that decoding result to obtain the original data subset corresponding to the first batch. In parallel with the decoding of the first batch, the network card continues to read the second batch of column data, which then follows the same path: the network card transmits it to the GPU over the PCIE bus, the GPU decodes it and sends the decoding result to the CPU, and the CPU produces the original data subset corresponding to the second batch. Likewise, while the second batch is being decoded, the network card reads the third batch of column data, which is transmitted to the GPU, decoded, and processed by the CPU in the same way to obtain the original data subset corresponding to the third batch. After the original data subsets corresponding to the three batches of column data are obtained, they can be merged to obtain the original data set.
In this embodiment of the present disclosure, the network card, the GPU and the CPU in the electronic device are used to perform pipeline processing on the column data read in batches, so that data reading performance can be greatly improved and the efficiency of obtaining the original data set required by the model can be further improved.
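The batch-level overlap described above can be approximated with a generic three-stage queue pipeline; the reader, decoder and combiner callables here are software stand-ins for the network card, the GPU decoding and the CPU recombination, which in the actual device are separate components connected over a PCIE bus:

```python
import queue
import threading

SENTINEL = object()  # marks the end of the batch stream

def pipeline(read_batch_iter, decode, combine):
    """Three-stage pipeline: stage 1 reads column-data batches, stage 2
    decodes them (decompress/decrypt/convert), stage 3 recombines rows.
    While one batch is being decoded, the next one is already being read."""
    q_read, q_decoded, results = queue.Queue(2), queue.Queue(2), []

    def reader():
        for batch in read_batch_iter:
            q_read.put(batch)
        q_read.put(SENTINEL)

    def decoder():
        while (batch := q_read.get()) is not SENTINEL:
            q_decoded.put(decode(batch))
        q_decoded.put(SENTINEL)

    def combiner():
        while (decoded := q_decoded.get()) is not SENTINEL:
            results.append(combine(decoded))

    threads = [threading.Thread(target=t) for t in (reader, decoder, combiner)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results  # per-batch raw-data subsets; merge them for the full set
```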
Fig. 4 is a schematic diagram according to a fourth embodiment of the present disclosure.
The model training method of this embodiment is further detailed below with reference to fig. 4.
As shown in fig. 4, the model training method may include:
step 401, obtaining an original data set required by a model, wherein the original data set includes a plurality of pieces of original data.
It should be noted that, for a specific implementation manner of step 401, reference may be made to the relevant description of the foregoing embodiments, and details are not described here again.
Step 402, dividing a plurality of pieces of original data according to the generation time of each piece of original data to obtain a plurality of data subsets, wherein time intervals corresponding to the plurality of data subsets are different from each other.
Step 403, determining the sequence in which the plurality of data subsets are processed according to the chronological order of the time intervals.
As an exemplary implementation manner, the plurality of data subsets may be sorted according to a chronological order of the time intervals to obtain a sorting result, where an order indicated by the sorting result is an order in which the plurality of data subsets are processed.
Step 404, in the process of processing the plurality of data subsets in sequence, for the data subset currently being processed, once the first data processing step in the feature processing flow has finished processing it, begin processing the next adjacent data subset according to the feature processing flow, until the last data subset has been processed.
Step 405, saving the sample data subsets of each data subset to a designated storage space.
The sample data subset corresponding to the data subset is obtained by performing feature processing on the data subset through a feature processing flow.
Step 406, starting to train the model according to the sample data subsets currently stored in the designated storage space.
It should be noted that, regarding the specific implementation manner of step 406, reference may be made to the relevant description in the foregoing embodiments, and details of this are not repeated here.
In an example embodiment, the pieces of original data in the original data set are divided into a plurality of data subsets according to the generation time of each piece, the data subsets are processed in a pipelined manner according to the feature processing flow to obtain the corresponding sample data subsets, the sample data subsets are saved to the designated storage space, and training of the model is started according to the sample data subsets currently stored there. Feature processing of the original data and model training are thus completed on the same electronic device. Pipelining the feature processing improves the efficiency of obtaining sample data and the stability of data processing, reduces the number of manual interventions, and improves the stability and efficiency of model training.
In order that the present disclosure may be clearly understood, the method of this embodiment is exemplarily described below with reference to fig. 5. It should be noted that, in this embodiment, the feature processing procedure is described as being set in a feature processing framework of the electronic device.
As shown in fig. 5, may include:
Step 501, raw data may be read in.
Wherein the raw data can be read in from the storage cluster.
The process of reading in the original data is similar to the above process of acquiring the original data set, and details of this embodiment are not repeated.
Step 502, performing data cleaning, data correction, data splicing, feature calculation and the like on the original data according to the pipeline in the feature processing framework.
For example, the original data is divided to obtain three data subsets, and the three data subsets are subjected to steps of data cleaning, data correction, data splicing, feature calculation and the like based on the pipeline, as shown in fig. 5.
Step 503, the sample data output by the feature processing framework may be saved to the specified storage space.
The designated storage space may be a local disk or a local shared storage space, etc.
It can be understood that, in an actual application, the specified storage space may be preset according to an actual requirement, which is not specifically limited in this embodiment.
Step 504, acquiring sample data from the designated storage space and training the model.
Based on any of the above embodiments, in an example embodiment the electronic device may include a processor CPU and a graphics processor GPU, and the feature processing flow may include a feature calculation step consisting of a plurality of feature calculation sub-steps and the execution dependencies between them. To improve the processing efficiency of the feature calculation step, in an embodiment of the present disclosure a first feature calculation sub-step among the plurality of feature calculation sub-steps is configured to run on the GPU, and the second feature calculation sub-steps, i.e. the feature calculation sub-steps other than the first feature calculation sub-step, are configured to run on the CPU.
In some exemplary embodiments, for each feature calculation sub-step, it may be determined whether the sub-step can run on the GPU; if so, the sub-step may be taken as a first feature calculation sub-step and configured on the GPU, so that it runs on the GPU. In other exemplary embodiments, if it is determined that the feature calculation sub-step can run on the CPU, it may be taken as a second feature calculation sub-step and configured on the CPU, so that it runs on the CPU.
It is to be understood that, for each data subset, after the processing step immediately preceding the feature calculation step has finished processing the data subset, its processing result can be obtained. When the feature calculation step processes that result, the feature calculation sub-steps are applied in turn according to their execution order. For the feature calculation sub-step currently being executed, if it is configured to run on the GPU, the preceding adjacent feature calculation sub-step is identified, the video memory space pre-allocated for that preceding sub-step is obtained, the processing result of the preceding sub-step is read from that video memory space, and the current sub-step processes it on the GPU to obtain its own processing result. This is repeated until the processing result of the last feature calculation sub-step is obtained. It should be noted that the processing result of the current feature calculation sub-step may likewise be stored in the video memory space pre-allocated for it.
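A schematic sketch of the sub-step placement and of handing results between GPU sub-steps; FeatureSubStep, gpu_capable and the buffers dict are hypothetical names, and actual GPU execution and video-memory allocation are abstracted away:

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

@dataclass
class FeatureSubStep:
    name: str
    fn: Callable[[Any], Any]
    gpu_capable: bool = False  # whether this sub-step can run on the GPU

def assign_devices(sub_steps: List[FeatureSubStep]) -> Dict[str, str]:
    """Place each feature calculation sub-step on the GPU if it supports it,
    otherwise on the CPU."""
    return {s.name: ("gpu" if s.gpu_capable else "cpu") for s in sub_steps}

def run_chain(sub_steps: List[FeatureSubStep], data: Any) -> Tuple[Any, Dict[str, Any]]:
    """Execute the sub-steps in order; results of GPU sub-steps are kept in a
    per-step buffer, standing in for the pre-allocated video memory space that
    the next GPU sub-step reads from."""
    placement = assign_devices(sub_steps)
    buffers: Dict[str, Any] = {}
    result = data
    for step in sub_steps:
        result = step.fn(result)          # would run on placement[step.name]
        if placement[step.name] == "gpu":
            buffers[step.name] = result   # cached for the next GPU sub-step
    return result, buffers
```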
To improve the efficiency of obtaining the sample data subsets corresponding to the plurality of data subsets, as shown in fig. 6, one possible implementation of processing the plurality of data subsets in parallel according to the feature processing flow corresponding to the model to obtain their corresponding sample data subsets may include:
step 601, obtaining a directed acyclic graph corresponding to the feature calculation step, wherein the directed acyclic graph is established by taking each feature calculation sub-step in the feature calculation step as a node, and an execution dependency relationship between each feature calculation sub-step is established as a directed edge.
Step 602, determining the previous data processing step in the feature processing flow, i.e. the data processing step whose execution order is adjacent to and precedes the feature calculation step.
Step 603, for each data subset, obtaining at least one processing result obtained after the data subset is processed in the previous data processing step.
And step 604, determining a feature calculation result of the feature calculation step according to at least one processing result and the directed acyclic graph.
Step 605, determining a sample data subset of the data subset according to the feature calculation result of the feature calculation step.
In this example embodiment, the feature calculation result of the feature calculation step is quickly determined in combination with the directed acyclic graph corresponding to the feature calculation step and the processing result of the previous processing step adjacent to the feature calculation step, and then the sample data subset corresponding to the data subset may be quickly determined.
In some embodiments of the present disclosure, in order to improve the efficiency of obtaining the sample data subsets corresponding to the plurality of data subsets, when several feature calculation sub-steps depend on the same feature calculation sub-step, once that shared sub-step has finished processing, the sub-steps that depend on it may be controlled to process its result in parallel.
In an embodiment of the present disclosure, in order to obtain the feature calculation result of the feature calculation step quickly, as shown in fig. 7, one possible implementation of determining the feature calculation result of the feature calculation step according to the at least one processing result and the directed acyclic graph may include:
step 701, determining a hierarchical graph corresponding to the directed acyclic graph according to the execution dependency relationship among the nodes in the directed acyclic graph.
The hierarchical graph comprises a plurality of hierarchies, each hierarchy comprises at least one node, and an execution dependency relationship exists between a first target node in the i-th hierarchy and a second target node in the (i-1)-th hierarchy, where the first target node is one of the at least one node in the i-th hierarchy, the second target node is one of the at least one node in the (i-1)-th hierarchy, i is an integer greater than 1 and less than N, and N is the total number of hierarchies in the hierarchical graph.
Step 702, for each processing result in the at least one processing result, inputting the processing result into a third target node in a j-th hierarchy, where the third target node is a node in the j-th hierarchy that takes the processing result as input, and an initial value of j is 1.
Step 703, inputting the output result of the third target node in the j-th level to a fourth target node in the (j+1)-th level, where the fourth target node is a node in the (j+1)-th level that has an execution dependency relationship with the third target node.
Step 704, adding 1 to j and, if j+1 is less than N, jumping back to the step of inputting the output result of the third target node in the j-th level to the fourth target node in the (j+1)-th level, until j+1 equals N.
Step 705, determining the feature calculation result of the feature calculation step according to the output result of the leaf node in the hierarchical graph.
In this exemplary embodiment, the hierarchical graph of the directed acyclic graph is determined and the nodes are executed level by level; multiple nodes in the same level can be processed in parallel, so the output results of the leaf nodes in the hierarchical graph can be determined quickly, which improves the efficiency of obtaining the feature calculation result of the feature calculation step.
For example, a directed acyclic graph corresponding to the feature calculation step is shown in fig. 8. The directed acyclic graph in fig. 8 shows not only the calculation nodes but also the relationships between the calculation nodes and the input and output nodes, where each calculation node corresponds to one feature calculation sub-step of the feature calculation step. The input nodes in fig. 8 are node 1 and node 2; the calculation nodes are node 3, node 4, node 5, node 6, node 7, node 8, node 9, node 10 and node 11; the output nodes are node 12, node 13, node 14 and node 15. The two nodes associated with any directed edge in fig. 8 are a start point and an end point, and the direction of the edge represents an execution dependency: the end point of a directed edge depends on its start point for execution. Correspondingly, a hierarchical graph corresponding to the directed acyclic graph can be determined according to the execution dependencies between the nodes in the directed acyclic graph and whether each node can run on the GPU. An example of the hierarchical graph is shown in fig. 9, where nodes 1, 2, 3, 7, 10 and 14 above dotted line a are all configured to execute on the CPU, and nodes 4, 5, 6, 8, 9, 11, 12, 13 and 15 below dotted line a are all configured to execute on the GPU. When executing the nodes in the hierarchical graph, the nodes are traversed level by level from the first level to the last level; within the same level, the nodes may be traversed according to a traversal order preset for that level. After all nodes in the hierarchical graph have been traversed, the feature processing result of the feature calculation step can be determined according to the output results of the leaf nodes in the hierarchical graph. The leaf nodes in fig. 9 are node 12, node 13, node 14 and node 15; correspondingly, the feature processing result of the feature calculation step may be determined from the output results of node 12, node 13, node 14 and node 15.
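The level-by-level traversal can be sketched as follows: each node is placed one level below the deepest node it depends on, and nodes within one level are executed in parallel, each seeing the outputs of earlier levels. The edge list, node functions and thread-based parallelism are illustrative stand-ins for the sub-steps of fig. 8 and fig. 9:

```python
from collections import defaultdict, deque
from concurrent.futures import ThreadPoolExecutor

def levelize(edges, nodes):
    """Group DAG nodes into levels: a node's level is one more than the
    highest level among the nodes it depends on (Kahn-style layering).
    `edges` is a list of (src, dst) pairs meaning dst depends on src."""
    indeg = {n: 0 for n in nodes}
    succ = defaultdict(list)
    for src, dst in edges:
        succ[src].append(dst)
        indeg[dst] += 1
    level = {n: 0 for n in nodes if indeg[n] == 0}
    q = deque(level)
    while q:
        n = q.popleft()
        for m in succ[n]:
            level[m] = max(level.get(m, 0), level[n] + 1)
            indeg[m] -= 1
            if indeg[m] == 0:
                q.append(m)
    layers = defaultdict(list)
    for n, l in level.items():
        layers[l].append(n)
    return [layers[l] for l in sorted(layers)]

def run_by_level(levels, node_fns, inputs):
    """Execute the hierarchy level by level; nodes in the same level run in
    parallel, each reading the outputs of the previous levels."""
    outputs = dict(inputs)  # pre-filled outputs of the input nodes
    for level_nodes in levels:
        with ThreadPoolExecutor() as pool:
            futs = {n: pool.submit(node_fns[n], outputs)
                    for n in level_nodes if n not in outputs}
            for n, f in futs.items():
                outputs[n] = f.result()
    return outputs  # the leaf-node entries form the feature calculation result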
In order to realize the above embodiment, the present disclosure further provides a model training device.
FIG. 10 is a schematic diagram of an eighth embodiment of the present disclosure that provides a model training apparatus. It should be noted that the model training apparatus is applied to electronic devices.
As shown in fig. 10, the model training apparatus 10 may include an obtaining module 101, a dividing module 102, a parallel processing module 103, a saving module 104, and a training module 105, wherein:
an obtaining module 101, configured to obtain a raw data set required by a model.
The dividing module 102 is configured to divide an original data set to obtain a plurality of data subsets.
The parallel processing module 103 is configured to process the multiple data subsets in parallel according to the feature processing flow corresponding to the model, so as to obtain sample data subsets of the multiple data subsets.
And the saving module 104 is configured to save the sample data subset to the specified storage space.
And the training module 105 is configured to start training the model according to the sample data subset currently stored in the designated storage space.
In the model training apparatus according to the embodiment of the disclosure, in the process of training the model, the original data set required by the model is divided to obtain a plurality of small-batch data subsets; the data subsets are processed in parallel based on the feature processing flow corresponding to the model to obtain the corresponding sample data subsets; the sample data subsets are saved to a specified storage space; and training of the model is started according to the sample data subsets currently stored in the specified storage space. Because multiple small batches of original data are processed in parallel, the efficiency of obtaining the model's sample data is improved, and the efficiency of model training is improved in turn.
In one embodiment of the present disclosure, as shown in fig. 11, the model training apparatus 11 may include: an acquisition module 111, a division module 112, a parallel processing module 113, a saving module 114, a training module 115, and a determination module 116, wherein the acquisition module 111 may include a first determination unit 1111, a second determination unit 1112, and a processing unit 1113; the parallel processing module 113 may include a first acquiring unit 1131, a third determining unit 1132, a second acquiring unit 1133, a fourth determining unit 1134, and a fifth determining unit 1135.
It should be noted that, for the detailed description of the dividing module 112, the saving module 114 and the training module 115, reference may be made to the description of the dividing module 102, the saving module 104 and the training module 105 in fig. 10, which is not repeated here.
In one embodiment of the present disclosure, the obtaining module 111 includes:
a first determining unit 1111, configured to determine the column field names required by the feature processing flow corresponding to the model for performing feature processing;
a second determining unit 1112, configured to determine an original data table corresponding to the model, where a plurality of pieces of original data are stored in a columnar storage manner in the original data table;
the processing unit 1113 is configured to read column data corresponding to the column field name from the original data table, and combine column data belonging to the same piece of original data in the read column data to obtain an original data set.
In an embodiment of the present disclosure, the electronic device includes a network card, a graphics processing unit GPU and a central processing unit CPU, and the processing unit 1113 is specifically configured to:
reading column data corresponding to column field names in batches from an original data table through a network card;
sending the column data of each batch to the GPU, so as to decode the column data of each batch through the GPU;
sending the decoding processing result of each batch of column data to a CPU (central processing unit) so as to combine the column data belonging to the same original data in the decoding processing result through the CPU to obtain an original data subset of each batch of column data;
and merging the original data subsets of all the batch column data to obtain an original data set.
In an embodiment of the present disclosure, the determining module 116 is configured to determine that the number of the sample data subsets currently stored in the designated storage space is greater than or equal to a preset number threshold.
In an embodiment of the disclosure, the raw data set includes a plurality of pieces of raw data, and the dividing module 112 is specifically configured to: the method comprises the steps of dividing a plurality of pieces of original data according to the generation time of each piece of original data to obtain a plurality of data subsets, wherein time intervals corresponding to the plurality of data subsets are different.
In an embodiment of the present disclosure, the parallel processing module 113 is specifically configured to: determine the sequence in which the plurality of data subsets are processed according to the chronological order of the time intervals; and, in the process of processing the plurality of data subsets in sequence, for the data subset currently being processed, once the first data processing step in the feature processing flow has finished processing it, begin processing the next adjacent data subset according to the feature processing flow, until the last data subset has been processed.
In one embodiment of the present disclosure, the electronic device includes a processor CPU and a graphics processor GPU, the feature processing flow includes a feature calculation step consisting of a plurality of feature calculation sub-steps and the execution dependencies between them, a first feature calculation sub-step among the plurality of feature calculation sub-steps is configured to run on the GPU, and the second feature calculation sub-steps, i.e. the feature calculation sub-steps other than the first feature calculation sub-step, are configured to run on the CPU.
In one embodiment of the present disclosure, the parallel processing module 113 may include:
a first obtaining unit 1131, configured to obtain a directed acyclic graph corresponding to the feature calculation step, where the directed acyclic graph takes each feature calculation sub-step in the feature calculation step as a node and each execution dependency between feature calculation sub-steps as a directed edge;
a third determining unit 1132, configured to determine a previous data processing step in the feature processing flow, where an execution order of the previous data processing step is adjacent to the feature calculating step;
a second obtaining unit 1133, configured to obtain, for each data subset, at least one processing result obtained after the data subset is processed in the previous data processing step;
a fourth determining unit 1134, configured to determine a feature calculation result of the feature calculation step according to the at least one processing result and the directed acyclic graph;
a fifth determining unit 1135, configured to determine a sample data subset of the data subset according to the feature calculation result of the feature calculating step.
In an embodiment of the disclosure, the fourth determining unit 1134 is specifically configured to: determine a hierarchical graph corresponding to the directed acyclic graph according to the execution dependency relationships among the nodes in the directed acyclic graph, where the hierarchical graph comprises a plurality of hierarchies, each hierarchy comprises at least one node, an execution dependency relationship exists between a first target node in the i-th hierarchy and a second target node in the (i-1)-th hierarchy, the first target node is one of the at least one node in the i-th hierarchy, the second target node is one of the at least one node in the (i-1)-th hierarchy, i is an integer greater than 1 and less than N, and N is the total number of hierarchies in the hierarchical graph; for each processing result in the at least one processing result, input the processing result into a third target node in the j-th hierarchy, where the third target node is a node in the j-th hierarchy that takes the processing result as input, and the initial value of j is 1; input the output result of the third target node in the j-th level to a fourth target node in the (j+1)-th level, where the fourth target node is a node in the (j+1)-th level that has an execution dependency relationship with the third target node; add 1 to j and, if j+1 is less than N, jump back to the step of inputting the output result of the third target node in the j-th level to the fourth target node in the (j+1)-th level, until j+1 equals N; and determine the feature calculation result of the feature calculation step according to the output results of the leaf nodes in the hierarchical graph.
It should be noted that the above explanation of the model training method is also applicable to the model training apparatus in this embodiment, and this embodiment is not described again.
The present disclosure also provides an electronic device and a readable storage medium and a computer program product according to embodiments of the present disclosure.
FIG. 12 shows a schematic block diagram of an example electronic device 1200, which can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 12, the electronic device 1200 may include a computing unit 1201, which may perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1202 or a computer program loaded from a storage unit 1208 into a Random Access Memory (RAM) 1203. In the RAM 1203, various programs and data necessary for the operation of the device 1200 can also be stored. The computing unit 1201, the ROM 1202, and the RAM 1203 are connected to each other by a bus 1204. An input/output (I/O) interface 1205 is also connected to bus 1204.
Various components in the device 1200 are connected to the I/O interface 1205 including: an input unit 1206 such as a keyboard, a mouse, or the like; an output unit 1207 such as various types of displays, speakers, and the like; a storage unit 1208, such as a magnetic disk, optical disk, or the like; and a communication unit 1209 such as a network card, modem, wireless communication transceiver, etc. The communication unit 1209 allows the device 1200 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 1201 may be a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 1201 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various application specific Artificial Intelligence (AI) computing chips, computing units running model algorithms, digital Signal Processors (DSPs), and any suitable processors, controllers, microcontrollers, etc. The computing unit 1201 performs the various methods and processes described above, such as a model training method. For example, in some embodiments, the model training method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 1208. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 1200 via the ROM 1202 and/or the communication unit 1209. When the computer program is loaded into RAM 1203 and executed by computing unit 1201, one or more steps of the model training method described above may be performed. Alternatively, in other embodiments, the computing unit 1201 may be configured to perform the model training method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber, a Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described herein may be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), the Internet, and blockchain networks.
A computer system may include a client and a server. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, which is a host product in the cloud computing service system and addresses the drawbacks of difficult management and weak service scalability in traditional physical hosts and VPS ("Virtual Private Server") services. The server may also be a server of a distributed system, or a server combined with a blockchain.
It should be noted that artificial intelligence is a discipline that studies how to make computers simulate certain human thinking processes and intelligent behaviors (such as learning, reasoning, thinking, and planning), and it involves both hardware-level and software-level technologies. Artificial intelligence hardware technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, and big data processing; artificial intelligence software technologies mainly include computer vision technology, speech recognition technology, natural language processing technology, machine learning/deep learning, big data processing technology, knowledge graph technology, and the like.
It should be understood that steps may be reordered, added, or deleted in the various forms of flows shown above. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved; no limitation is imposed herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (21)

1. A method of model training, the method comprising:
acquiring an original data set required by a model;
dividing the original data set to obtain a plurality of data subsets;
processing the plurality of data subsets in parallel according to a feature processing flow corresponding to the model to obtain sample data subsets of the plurality of data subsets;
storing the sample data subset into a designated storage space;
and starting to train the model according to the sample data subset currently stored in the designated storage space.
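As an illustration of the overall flow of claim 1, the following Python sketch divides a raw data set into small-batch subsets, processes them in parallel with a placeholder feature-processing function, stores each resulting sample subset in an in-memory list that stands in for the designated storage space, and then invokes a placeholder training step; the names process_subset and train_model are illustrative assumptions, not part of the claimed method.

    from concurrent.futures import ThreadPoolExecutor


    def process_subset(subset):
        # Placeholder feature processing flow: here it simply squares each value.
        return [x * x for x in subset]


    def train_model(sample_subsets):
        # Placeholder training step; a real trainer would consume these as mini-batches.
        print(f"training on {len(sample_subsets)} sample data subsets")


    def train_from_raw(raw_data, num_subsets=4):
        # Divide the original data set into several small-batch data subsets.
        size = max(1, len(raw_data) // num_subsets)
        subsets = [raw_data[i:i + size] for i in range(0, len(raw_data), size)]

        storage = []  # stands in for the designated storage space
        with ThreadPoolExecutor(max_workers=num_subsets) as pool:
            # Process the subsets in parallel and store each sample subset as it completes.
            for sample_subset in pool.map(process_subset, subsets):
                storage.append(sample_subset)

        # Start training according to the sample subsets currently stored.
        train_model(storage)


    if __name__ == "__main__":
        train_from_raw(list(range(100)))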
2. The method of claim 1, wherein the obtaining of the raw set of data required by the model comprises:
determining column field names required by the feature processing flow corresponding to the model when performing feature processing;
determining a raw data table corresponding to the model, wherein a plurality of pieces of raw data are stored in the raw data table in a columnar storage mode;
and reading the column data corresponding to the column field names from the original data table, and combining the column data belonging to the same original data in the read column data to obtain an original data set.
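The columnar read of claim 2 can be pictured with the sketch below, in which an in-memory dict of lists stands in for the columnar original data table and the column field names are hypothetical; only the required columns are read and their values are recombined into per-record rows.

    REQUIRED_COLUMNS = ["user_id", "item_id", "click"]  # hypothetical field names

    raw_table = {  # columnar storage: one list per column field
        "user_id": [1, 2, 3],
        "item_id": [10, 20, 30],
        "click": [0, 1, 1],
        "extra": ["a", "b", "c"],  # not needed by the feature flow, so never read
    }

    # Read only the columns named by the feature processing flow.
    columns = {name: raw_table[name] for name in REQUIRED_COLUMNS}

    # Combine column values belonging to the same original record into one row each.
    raw_data_set = [
        dict(zip(REQUIRED_COLUMNS, values))
        for values in zip(*(columns[name] for name in REQUIRED_COLUMNS))
    ]

    print(raw_data_set[0])  # {'user_id': 1, 'item_id': 10, 'click': 0}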
3. The method according to claim 2, wherein, when the method is applied to an electronic device, the electronic device includes a network card, a Graphics Processing Unit (GPU) and a Central Processing Unit (CPU), and the reading of column data corresponding to the column field names from the original data table and combining of column data belonging to the same piece of original data in the read column data to obtain the original data set comprises:
reading column data corresponding to the column field names in batches from the original data table through the network card;
sending the column data of each batch to the GPU so as to decode the column data of each batch through the GPU;
sending the decoding processing result of each batch of column data to the CPU, so that the CPU combines the column data belonging to the same piece of original data in the decoding processing result to obtain an original data subset of each batch of column data;
and merging the original data subsets of all the batch column data to obtain the original data set.
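The batched read/decode/merge split of claim 3 is sketched below; read_batches, gpu_decode, and cpu_combine are hypothetical stand-ins for the network-card read, the GPU-side decoding, and the CPU-side combination, so no real device APIs are involved.

    def read_batches(table, column_names, batch_size=2):
        # Stand-in for reading column data in batches via the network card.
        n = len(next(iter(table.values())))
        for start in range(0, n, batch_size):
            yield {name: table[name][start:start + batch_size] for name in column_names}


    def gpu_decode(column_batch):
        # Stand-in for GPU-side decoding (e.g. decompression or type decoding).
        return column_batch


    def cpu_combine(decoded_batch):
        # CPU-side merge of column values belonging to the same original record.
        names = list(decoded_batch)
        return [dict(zip(names, row)) for row in zip(*decoded_batch.values())]


    table = {"user_id": [1, 2, 3, 4], "click": [0, 1, 1, 0]}
    raw_data_set = []
    for batch in read_batches(table, ["user_id", "click"]):
        raw_data_set.extend(cpu_combine(gpu_decode(batch)))  # merge all batch subsets
    print(len(raw_data_set))  # 4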
4. The method of claim 1, wherein before the starting to train the model according to the sample data subset currently stored in the designated storage space, the method further comprises:
determining that the number of sample data subsets currently stored in the designated storage space is greater than or equal to a preset number threshold.
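A minimal sketch of the precondition in claim 4, assuming a thread-safe queue as the designated storage space and a hard-coded preset number threshold; both choices are illustrative.

    import queue

    storage = queue.Queue()  # stands in for the designated storage space
    THRESHOLD = 3            # preset number threshold (illustrative value)

    for i in range(5):
        storage.put([i])     # each item stands in for one stored sample data subset
        if storage.qsize() >= THRESHOLD:
            print(f"{storage.qsize()} sample data subsets stored; training can start")
            break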
5. The method of claim 1, wherein the original data set comprises a plurality of pieces of original data, and the dividing the original data set to obtain a plurality of data subsets comprises:
dividing the original data according to the generation time of the original data to obtain a plurality of data subsets, wherein the time intervals corresponding to the data subsets are different.
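The time-based division of claim 5 can be sketched as follows, assuming each piece of original data carries a ts generation timestamp (an illustrative field name) and using fixed one-hour buckets as the non-overlapping time intervals.

    from collections import defaultdict
    from datetime import datetime


    def split_by_hour(records):
        # Group each record into the one-hour interval containing its generation time.
        buckets = defaultdict(list)
        for record in records:
            interval_start = record["ts"].replace(minute=0, second=0, microsecond=0)
            buckets[interval_start].append(record)
        # Return the subsets ordered by their (non-overlapping) time intervals.
        return [buckets[k] for k in sorted(buckets)]


    records = [
        {"ts": datetime(2022, 6, 20, 8, 5), "x": 1},
        {"ts": datetime(2022, 6, 20, 8, 50), "x": 2},
        {"ts": datetime(2022, 6, 20, 9, 10), "x": 3},
    ]
    print([len(s) for s in split_by_hour(records)])  # [2, 1]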
6. The method according to claim 5, wherein the processing the plurality of data subsets in parallel according to the feature processing flow corresponding to the model to obtain sample data subsets of each of the plurality of data subsets comprises:
determining the sequence of processing the plurality of data subsets according to the time sequence of the time intervals;
in the process of processing the plurality of data subsets according to the sequence, when the current data subset has been processed by the first data processing step in the feature processing flow, processing the next data subset adjacent to the current data subset according to the feature processing flow, until the last data subset has been processed.
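The pipelined processing of claim 6 can be pictured with the sketch below, in which two placeholder processing steps are connected by a queue: as soon as a subset leaves the first step, the next subset in time order enters it, so consecutive subsets overlap inside the feature processing flow.

    import queue
    import threading


    def step_one(subset):
        # Placeholder first data processing step.
        return [x + 1 for x in subset]


    def step_two(subset):
        # Placeholder subsequent data processing step.
        return [x * 2 for x in subset]


    def run_pipeline(subsets):
        handoff, results = queue.Queue(), []

        def stage_one():
            for s in subsets:             # subsets arrive in time order
                handoff.put(step_one(s))  # next subset can start once this one moves on
            handoff.put(None)             # sentinel: last subset has passed step one

        def stage_two():
            while (s := handoff.get()) is not None:
                results.append(step_two(s))

        t1 = threading.Thread(target=stage_one)
        t2 = threading.Thread(target=stage_two)
        t1.start(); t2.start(); t1.join(); t2.join()
        return results


    print(run_pipeline([[1, 2], [3, 4], [5, 6]]))  # [[4, 6], [8, 10], [12, 14]]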
7. The method of claim 1, wherein the electronic device comprises a central processing unit CPU and a graphics processor GPU, the feature processing flow comprises a feature computation step, the feature computation step comprises a plurality of feature computation sub-steps and execution dependencies between the plurality of feature computation sub-steps, a first feature computation sub-step of the plurality of feature computation sub-steps is configured to run on the GPU, and a second feature computation sub-step, of the plurality of feature computation sub-steps, other than the first feature computation sub-step is configured to run on the CPU.
8. The method according to claim 7, wherein the processing the plurality of data subsets in parallel according to the feature processing flow corresponding to the model to obtain sample data subsets of each of the plurality of data subsets comprises:
acquiring a directed acyclic graph corresponding to the feature calculation step, wherein the directed acyclic graph is established by taking each feature calculation sub-step in the feature calculation step as a node and taking the execution dependency relationships between the feature calculation sub-steps as directed edges;
determining a previous data processing step in the feature processing flow, wherein the execution sequence of the previous data processing step is adjacent to the feature calculation step;
for each data subset, acquiring at least one processing result obtained after the data subset is processed in the previous data processing step;
determining a feature calculation result of the feature calculation step according to the at least one processing result and the directed acyclic graph;
and determining a sample data subset of the data subset according to the feature calculation result of the feature calculation step.
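The DAG-driven feature calculation of claim 8 can be pictured with the sketch below; the sub-step names and functions are illustrative, and the standard-library graphlib module (Python 3.9+) supplies a topological order over the execution dependencies.

    from graphlib import TopologicalSorter

    # Each key maps a feature calculation sub-step to the sub-steps it depends on.
    dag = {"sum": {"a", "b"}, "scaled": {"sum"}}
    funcs = {
        "a": lambda inputs: 2,  # leaf sub-steps would consume the processing results
        "b": lambda inputs: 3,  # of the previous data processing step
        "sum": lambda inputs: inputs["a"] + inputs["b"],
        "scaled": lambda inputs: inputs["sum"] * 10,
    }

    outputs = {}
    for node in TopologicalSorter(dag).static_order():
        parents = dag.get(node, set())
        outputs[node] = funcs[node]({p: outputs[p] for p in parents})

    print(outputs["scaled"])  # 50, i.e. the feature calculation result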
9. The method of claim 8, wherein the determining a feature calculation result of the feature calculation step according to the at least one processing result and the directed acyclic graph comprises:
determining a hierarchical graph corresponding to the directed acyclic graph according to the execution dependency relationships among the nodes in the directed acyclic graph, wherein the hierarchical graph comprises a plurality of levels, each level comprises at least one node, an execution dependency relationship exists between a first target node in an i-th level and a second target node in an (i-1)-th level, the first target node is one of the at least one node in the i-th level, the second target node is one of the at least one node in the (i-1)-th level, i is an integer greater than 1 and less than N, and N is the total number of levels in the hierarchical graph;
for each processing result in the at least one processing result, inputting the processing result into a third target node in a j-th level, wherein the third target node is a node in the j-th level that takes the processing result as input, and an initial value of j is 1;
inputting an output result of the third target node in the j-th level to a fourth target node in a (j+1)-th level, wherein the fourth target node is a node in the (j+1)-th level having an execution dependency relationship with the third target node;
incrementing j by 1, and when j+1 is smaller than N, skipping back to the step of inputting the output result of the third target node in the j-th level to the fourth target node in the (j+1)-th level, until j+1 is equal to N;
and determining the feature calculation result of the feature calculation step according to the output result of the leaf node in the hierarchical graph.
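The hierarchical graph of claim 9 can be derived from the same kind of dependency structure by assigning every node a level one deeper than the deepest node it depends on and then processing the levels in order; the sketch below shows only the level assignment and grouping, with illustrative node names.

    def assign_levels(predecessors):
        # predecessors: node -> set of nodes whose outputs it consumes (its parents)
        nodes = set(predecessors) | {p for ps in predecessors.values() for p in ps}
        levels = {}

        def level_of(node):
            if node not in levels:
                parents = predecessors.get(node, set())
                levels[node] = 1 if not parents else 1 + max(level_of(p) for p in parents)
            return levels[node]

        for node in nodes:
            level_of(node)
        return levels


    predecessors = {"sum": {"a", "b"}, "scaled": {"sum"}}  # illustrative dependencies
    levels = assign_levels(predecessors)

    # Group nodes by level; level 1 is processed first, then level 2, and so on.
    by_level = {}
    for node, lvl in levels.items():
        by_level.setdefault(lvl, []).append(node)
    for lvl in sorted(by_level):
        print(lvl, sorted(by_level[lvl]))  # 1 ['a', 'b'] / 2 ['sum'] / 3 ['scaled']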
10. A model training apparatus, the apparatus comprising:
the acquisition module is used for acquiring an original data set required by the model;
the dividing module is used for dividing the original data set to obtain a plurality of data subsets;
the parallel processing module is used for processing the plurality of data subsets in parallel according to a feature processing flow corresponding to the model to obtain sample data subsets of the plurality of data subsets;
the storage module is used for storing the sample data subset into a specified storage space;
and the training module is used for starting to train the model according to the sample data subset currently stored in the designated storage space.
11. The apparatus of claim 10, wherein the means for obtaining comprises:
the first determining unit is used for determining column field names required by the feature processing flow corresponding to the model when performing feature processing;
a second determining unit, configured to determine an original data table corresponding to the model, where a plurality of pieces of original data are stored in the original data table in a columnar storage manner;
and the processing unit is used for reading the column data corresponding to the column field names from the original data table, and combining the column data belonging to the same original data in the read column data to obtain an original data set.
12. The apparatus according to claim 11, wherein, in a case where the apparatus is configured in an electronic device, the electronic device includes a network card, a graphics processor GPU, and a central processing unit CPU, and the processing unit is specifically configured to:
reading column data corresponding to the column field names in batches from the original data table through the network card;
sending the column data of each batch to the GPU so as to decode the column data of each batch through the GPU;
sending the decoding processing result of each batch of column data to the CPU, and combining the column data belonging to the same original data in the decoding processing result through the CPU to obtain an original data subset of each batch of column data;
and merging the original data subsets of all the batch column data to obtain the original data set.
13. The apparatus of claim 10, wherein the apparatus further comprises:
and the determining module is used for determining that the number of the sample data subsets currently stored in the designated storage space is greater than or equal to a preset number threshold.
14. The apparatus according to claim 10, wherein the original data set includes a plurality of pieces of original data, and the dividing module is specifically configured to:
dividing the original data according to the generation time of the original data to obtain a plurality of data subsets, wherein the time intervals corresponding to the data subsets are different.
15. The apparatus of claim 14, wherein the parallel processing module is specifically configured to:
determining the sequence of processing the plurality of data subsets according to the time sequence of the time intervals;
in the process of processing the plurality of data subsets according to the sequence, when the current data subset has been processed by the first data processing step in the feature processing flow, processing the next data subset adjacent to the current data subset according to the feature processing flow, until the last data subset has been processed.
16. The apparatus of claim 10, wherein the electronic device comprises a central processing unit CPU and a graphics processor GPU, the feature processing flow comprises a feature computation step, the feature computation step comprises a plurality of feature computation sub-steps and execution dependencies between the plurality of feature computation sub-steps, a first feature computation sub-step of the plurality of feature computation sub-steps is configured to run on the GPU, and a second feature computation sub-step, of the plurality of feature computation sub-steps, other than the first feature computation sub-step is configured to run on the CPU.
17. The apparatus of claim 16, wherein the parallel processing module comprises:
a first obtaining unit, configured to obtain a directed acyclic graph corresponding to the feature calculation step, wherein the directed acyclic graph is established by taking each feature calculation sub-step in the feature calculation step as a node and taking the execution dependency relationships between the feature calculation sub-steps as directed edges;
a third determining unit, configured to determine a previous data processing step in the feature processing flow, where an execution order of the previous data processing step is adjacent to the feature calculating step;
a second obtaining unit, configured to obtain, for each data subset, at least one processing result obtained after the data subset is processed in the previous data processing step;
a fourth determining unit, configured to determine a feature calculation result of the feature calculation step according to the at least one processing result and the directed acyclic graph;
and a fifth determining unit, configured to determine a sample data subset of the data subset according to a feature calculation result of the feature calculating step.
18. The apparatus according to claim 17, wherein the fourth determining unit is specifically configured to:
determining a hierarchical graph corresponding to the directed acyclic graph according to the execution dependency relationships among the nodes in the directed acyclic graph, wherein the hierarchical graph comprises a plurality of levels, each level comprises at least one node, an execution dependency relationship exists between a first target node in an i-th level and a second target node in an (i-1)-th level, the first target node is one of the at least one node in the i-th level, the second target node is one of the at least one node in the (i-1)-th level, i is an integer greater than 1 and less than N, and N is the total number of levels in the hierarchical graph;
for each processing result in the at least one processing result, inputting the processing result into a third target node in a j-th level, wherein the third target node is a node in the j-th level that takes the processing result as input, and an initial value of j is 1;
inputting an output result of the third target node in the j-th level to a fourth target node in a (j+1)-th level, wherein the fourth target node is a node in the (j+1)-th level having an execution dependency relationship with the third target node;
incrementing j by 1, and when j+1 is smaller than N, skipping back to the step of inputting the output result of the third target node in the j-th level to the fourth target node in the (j+1)-th level, until j+1 is equal to N;
and determining the feature calculation result of the feature calculation step according to the output result of the leaf node in the hierarchical graph.
19. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.
20. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-9.
21. A computer program product comprising a computer program which, when executed by a processor, carries out the steps of the method of any one of claims 1-9.
CN202210700375.3A 2022-06-20 2022-06-20 Model training method, device and storage medium Active CN115186738B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210700375.3A CN115186738B (en) 2022-06-20 2022-06-20 Model training method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210700375.3A CN115186738B (en) 2022-06-20 2022-06-20 Model training method, device and storage medium

Publications (2)

Publication Number Publication Date
CN115186738A true CN115186738A (en) 2022-10-14
CN115186738B CN115186738B (en) 2023-04-07

Family

ID=83514830

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210700375.3A Active CN115186738B (en) 2022-06-20 2022-06-20 Model training method, device and storage medium

Country Status (1)

Country Link
CN (1) CN115186738B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116090006A (en) * 2023-02-01 2023-05-09 北京三维天地科技股份有限公司 Sensitive identification method and system based on deep learning

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107179839A (en) * 2017-05-23 2017-09-19 三星电子(中国)研发中心 Information output method, device and equipment for terminal
US20200184368A1 (en) * 2018-12-10 2020-06-11 International Business Machines Corporation Machine learning in heterogeneous processing systems
CN112214775A (en) * 2020-10-09 2021-01-12 平安国际智慧城市科技股份有限公司 Injection type attack method and device for graph data, medium and electronic equipment
CN112561078A (en) * 2020-12-18 2021-03-26 北京百度网讯科技有限公司 Distributed model training method, related device and computer program product
CN112884086A (en) * 2021-04-06 2021-06-01 北京百度网讯科技有限公司 Model training method, device, equipment, storage medium and program product
CN113361574A (en) * 2021-05-27 2021-09-07 北京百度网讯科技有限公司 Training method and device of data processing model, electronic equipment and storage medium
CN113808044A (en) * 2021-09-17 2021-12-17 北京百度网讯科技有限公司 Encryption mask determining method, device, equipment and storage medium
CN114004383A (en) * 2020-07-14 2022-02-01 华为技术有限公司 Training method of time series prediction model, time series prediction method and device
CN114357242A (en) * 2021-12-20 2022-04-15 腾讯科技(深圳)有限公司 Training evaluation method and device based on recall model, equipment and storage medium
CN114357105A (en) * 2022-03-10 2022-04-15 北京百度网讯科技有限公司 Pre-training method and model fine-tuning method of geographic pre-training model

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107179839A (en) * 2017-05-23 2017-09-19 三星电子(中国)研发中心 Information output method, device and equipment for terminal
US20200184368A1 (en) * 2018-12-10 2020-06-11 International Business Machines Corporation Machine learning in heterogeneous processing systems
CN114004383A (en) * 2020-07-14 2022-02-01 华为技术有限公司 Training method of time series prediction model, time series prediction method and device
CN112214775A (en) * 2020-10-09 2021-01-12 平安国际智慧城市科技股份有限公司 Injection type attack method and device for graph data, medium and electronic equipment
CN112561078A (en) * 2020-12-18 2021-03-26 北京百度网讯科技有限公司 Distributed model training method, related device and computer program product
CN112884086A (en) * 2021-04-06 2021-06-01 北京百度网讯科技有限公司 Model training method, device, equipment, storage medium and program product
CN113361574A (en) * 2021-05-27 2021-09-07 北京百度网讯科技有限公司 Training method and device of data processing model, electronic equipment and storage medium
CN113808044A (en) * 2021-09-17 2021-12-17 北京百度网讯科技有限公司 Encryption mask determining method, device, equipment and storage medium
CN114357242A (en) * 2021-12-20 2022-04-15 腾讯科技(深圳)有限公司 Training evaluation method and device based on recall model, equipment and storage medium
CN114357105A (en) * 2022-03-10 2022-04-15 北京百度网讯科技有限公司 Pre-training method and model fine-tuning method of geographic pre-training model

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
YONGNAN ZHANG et al.: "Traffic Network Flow Prediction Using Parallel Training for Deep Convolutional Neural Networks on Spark Cloud", IEEE Transactions on Industrial Informatics
侯岩 (HOU, Yan): "Research on Distributed SVM Algorithms for Data Streams", China Master's Theses Full-text Database, Information Science and Technology Series
刘滔 (LIU, Tao) et al.: "Research on Parallelized Training of CRF Models for Chinese Part-of-Speech Tagging Based on MapReduce", Journal of Peking University (Natural Science Edition)
张洪胜 (ZHANG, Hongsheng): "Research on Parallel Cascade Support Vector Machines Based on Mixed-Sample Training", Journal of Jinling Institute of Technology
杨俊超 (YANG, Junchao): "Research on a Railway Settlement Disaster Early-Warning Model Based on Big Data Analysis and Mining", China Master's Theses Full-text Database, Engineering Science and Technology II Series
黄浴 (HUANG, Yu): "Distributed and Parallel Processing Systems for Deep Learning", HTTPS://ZHUANLAN.ZHIHU.COM/P/58806183

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116090006A (en) * 2023-02-01 2023-05-09 北京三维天地科技股份有限公司 Sensitive identification method and system based on deep learning
CN116090006B (en) * 2023-02-01 2023-09-08 北京三维天地科技股份有限公司 Sensitive identification method and system based on deep learning

Also Published As

Publication number Publication date
CN115186738B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
US11573992B2 (en) Method, electronic device, and storage medium for generating relationship of events
CN112560496A (en) Training method and device of semantic analysis model, electronic equipment and storage medium
CN114399769B (en) Training method of text recognition model, and text recognition method and device
CN113590645B (en) Searching method, searching device, electronic equipment and storage medium
CN112528641A (en) Method and device for establishing information extraction model, electronic equipment and readable storage medium
JP7309811B2 (en) Data annotation method, apparatus, electronics and storage medium
CN115186738B (en) Model training method, device and storage medium
CN113641829A (en) Method and device for training neural network of graph and complementing knowledge graph
CN114330718B (en) Method and device for extracting causal relationship and electronic equipment
CN113344214B (en) Training method and device of data processing model, electronic equipment and storage medium
CN114817476A (en) Language model training method and device, electronic equipment and storage medium
CN113361574A (en) Training method and device of data processing model, electronic equipment and storage medium
CN115759233B (en) Model training method, graph data processing device and electronic equipment
CN113836291B (en) Data processing method, device, equipment and storage medium
CN114255427B (en) Video understanding method, device, equipment and storage medium
CN116383454B (en) Data query method of graph database, electronic equipment and storage medium
CN113962382A (en) Training sample construction method and device, electronic equipment and readable storage medium
CN115223177A (en) Text recognition method, device, equipment and storage medium
CN115525295A (en) Automatic code editing method and device, electronic equipment and storage medium
CN116167978A (en) Model updating method and device, electronic equipment and storage medium
CN113360624A (en) Training method, response device, electronic device and storage medium
CN117764560A (en) Vehicle maintenance suggestion acquisition method and vehicle maintenance suggestion generation method
CN115934101A (en) Interface document generation method, device, medium and electronic equipment
CN115827893A (en) Skill culture knowledge map generation method, device, equipment and medium
CN117576422A (en) Data processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant