JP7408042B2

JP7408042B2 - Distributed heterogeneous data processing method, device, and equipment based on deep learning

Info

Publication number: JP7408042B2
Application number: JP2023043894A
Authority: JP
Inventors: 劉東升; 劉礼芳; 陳亜輝; 劉彦▲に▼
Original assignee: Zhejiang Gongshang University
Current assignee: Zhejiang Gongshang University
Priority date: 2022-05-18
Filing date: 2023-03-20
Publication date: 2024-01-05
Anticipated expiration: 2043-03-20
Also published as: CN115063647A; JP2023171248A

Description

TECHNICAL FIELD
Embodiments of the present disclosure relate to the field of computer technology, and specifically to a method, apparatus, and device for processing distributed heterogeneous data based on deep learning.

Model training refers to training an initial model so that the resulting model acquires predictive or classification capability. Currently, the commonly adopted approach is to train the model using a fixed distributed training scheme.
However, this approach often runs into the following technical problems.
First, the nodes in a distributed system often have different configurations, and models trained in different runs often have different structures. Training a model with a fixed distributed training scheme therefore often lowers the utilization efficiency of the nodes in the distributed system.
Second, when the model contains multiple submodels and has a complex structure, training the whole model directly leads to a long training cycle and low training efficiency.

This summary introduces, in simplified form, concepts that are described in detail in the detailed description below.
Some embodiments of the present disclosure propose a deep learning-based method, apparatus, and device for processing distributed heterogeneous data, to solve one or more of the technical problems mentioned in the background section above.

In a first aspect, some embodiments of the present disclosure provide a deep learning-based method for processing distributed heterogeneous data, the method including: determining a node information group sequence, where the node group corresponding to each node information group in the sequence is configured with one of the submodels included in an initial model; determining, based on the node configuration information and model structure information corresponding to each node information group in the sequence, a training sample set corresponding to that node information group, where the model structure information characterizes the model structure of the submodel corresponding to the node information group; for each node information group in the sequence, performing sample reconstruction on the training samples in the corresponding training sample set, based on the corresponding model structure information, to generate a target training sample set; for each of the submodels, training the submodel with its corresponding target training sample set to generate a sub-training result; and, in response to determining that all of the resulting sub-training results have converged, generating a trained initial model based on the current model parameter information of each of the submodels.

In a second aspect, some embodiments of the present disclosure provide a deep learning-based apparatus for processing distributed heterogeneous data, the apparatus including: an information determination unit configured to determine a node information group sequence; a sample determination unit configured to determine, based on the node configuration information and model structure information corresponding to each node information group in the sequence, a training sample set corresponding to that node information group, where the model structure information characterizes the model structure of the submodel corresponding to the node information group; a sample reconstruction unit configured to, for each node information group in the sequence, perform sample reconstruction on the training samples in the corresponding training sample set, based on the corresponding model structure information, to generate a target training sample set; a model training unit configured to, for each of the submodels, train the submodel with its corresponding target training sample set to generate a sub-training result; and a model generation unit configured to, in response to determining that all of the resulting sub-training results have converged, generate a trained initial model based on the current model parameter information of each of the submodels.

In a third aspect, some embodiments of the present disclosure provide an electronic device including: one or more processors; and a storage device storing one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the method described in any implementation of the first aspect.

In a fourth aspect, some embodiments of the present disclosure provide a computer-readable medium storing a computer program that, when executed by a processor, implements the method described in any implementation of the first aspect.

The above embodiments of the present disclosure have the following beneficial effect: the deep learning-based distributed heterogeneous data processing method of some embodiments improves the utilization efficiency of the nodes in the distributed system and improves model training efficiency.

Specifically, node utilization efficiency and model training efficiency are low for two reasons. First, the nodes in a distributed system often have different configurations, and models trained in different runs often have different structures, so training a model with a fixed distributed training scheme often lowers the utilization efficiency of the nodes. Second, when the model contains multiple submodels and has a complex structure, training the whole model directly leads to a long training cycle and low training efficiency. To address this, the deep learning-based distributed heterogeneous data processing method of some embodiments of the present disclosure first determines a node information group sequence, where the node group corresponding to each node information group is configured with one of the submodels included in the initial model. In practice, nodes often have different configurations and submodels often have different model structures. If a fixed distributed training scheme is used and submodels are deployed on preset, fixed nodes, the model structure and the node configuration may not match (for example, a structurally simple model is deployed on a high-configuration node), which lowers node utilization. Determining the node information group sequence therefore deploys each submodel on a matching node group (for example, a structurally simple model on a low-configuration node), which raises node utilization efficiency.
Next, a training sample set corresponding to each node information group in the sequence is determined based on the node configuration information and model structure information corresponding to that node information group, where the model structure information characterizes the model structure of the submodel corresponding to the node information group. In practice, the node groups corresponding to different node information groups often have different configurations, and different submodels have different structures, so different node groups train at different speeds. To make the submodel training times of the node groups approximately equal, and thereby shorten the overall training time, the number of training samples needed by each node information group must be determined from its node configuration information and the structure information of its submodel. Then, for each node information group in the sequence, sample reconstruction is performed on the training samples in the corresponding training sample set, based on the corresponding model structure information, to generate a target training sample set. In practice, different submodels may take different model inputs, so the training samples must be reconstructed according to the model input of the submodel corresponding to the model structure information to obtain the target training sample set. Next, for each of the submodels, the submodel is trained with its corresponding target training sample set to generate a sub-training result. Training each submodel separately in this way shortens the model training cycle and improves training efficiency.
Finally, in response to determining that all of the resulting sub-training results have converged, a trained initial model is generated based on the current model parameter information of each of the submodels. The trained initial model is thus generated from the model parameter information obtained when each submodel in the distributed system finishes training, which improves both the utilization efficiency of the nodes in the distributed system and model training efficiency.

FIG. 1 is a flowchart of some embodiments of a deep learning-based distributed heterogeneous data processing method according to the present disclosure.
FIG. 2 is a structural diagram of some embodiments of a deep learning-based distributed heterogeneous data processing apparatus according to the present disclosure.
FIG. 3 is a schematic structural diagram of an electronic device suitable for implementing some embodiments of the present disclosure.

The present disclosure is described in detail below with reference to the drawings and in connection with embodiments.
FIG. 1 shows a flow 100 of some embodiments of a deep learning-based distributed heterogeneous data processing method according to the present disclosure. The method includes the following steps:

In step 101, a node information group sequence is determined.
In some embodiments, the execution entity of the deep learning-based distributed heterogeneous data processing method (for example, a computing device) may determine the node information group sequence in various ways. The node group corresponding to each node information group in the sequence is configured with one of the submodels included in the initial model. Node information characterizes a node, and a node is a compute node in the distributed system. Node information may include a node number, node configuration information, a submodel number, and the model structure information of the model deployed on the node. Node configuration information characterizes the configuration of a node; for example, it may be expressed as a grade, in which case a higher grade indicates greater computing power of the corresponding node. Model structure information characterizes the model structure of the submodel corresponding to the node information, and may characterize the model complexity of the submodel; for example, it may be quantified by the number of layers of the corresponding submodel. The nodes corresponding to the pieces of node information within one node information group have the same configuration.
In some optional implementations of some embodiments, the execution entity may determine the node information group sequence through the following steps.

In the first step, an initial node information sequence is obtained.
Here, initial node information may include a node number and node configuration information. Node configuration information characterizes the configuration of a node, and may be expressed as the maximum amount of data the node's hardware configuration can process per unit time.
In the second step, computing power demand information is determined for each of the submodels.
Computing power demand information characterizes the computing power demand of a submodel. The demand may be obtained by quantifying the maximum data processing amount of each layer included in the submodel. For example, the computing power demand may be the submodel's floating-point operation count. Here, the floating-point operation count refers to the number of floating-point operations performed by a floating-point arithmetic unit per unit time.
In the third step, for each of the submodels, a subset of initial node information corresponding to the submodel is selected from the initial node information sequence, based on the computing power demand information of the submodel, as a node information group.
Here, the floating-point operation count characterized by the computing power demand information of the submodel matches the maximum data processing amount of the nodes corresponding to the node information in the node information group.
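The three steps above can be sketched in Python as follows. This is only an illustrative sketch, not the patented implementation: the description does not say how a submodel's demand is "matched" to node capacity, so the rule used here (pick the smallest-capacity tier that still covers the demand, otherwise the largest remaining tier) is an assumption, as are the names `select_node_groups`, `initial_nodes`, and `submodel_demands`.

```python
from collections import defaultdict


def select_node_groups(initial_nodes, submodel_demands):
    """Assign one group of identically configured nodes to each submodel.

    initial_nodes: list of (node_id, max_throughput) pairs.
    submodel_demands: dict mapping submodel name -> computing power demand.
    The matching rule below is an assumption made for illustration.
    """
    # Nodes inside one group must share the same configuration,
    # so bucket them by their maximum throughput.
    free_tiers = defaultdict(list)
    for node_id, capacity in initial_nodes:
        free_tiers[capacity].append(node_id)

    groups = {}
    # Serve the most demanding submodels first.
    for name, demand in sorted(submodel_demands.items(), key=lambda kv: -kv[1]):
        if not free_tiers:
            raise ValueError("not enough node tiers for all submodels")
        adequate = [cap for cap in free_tiers if cap >= demand]
        chosen = min(adequate) if adequate else max(free_tiers)
        groups[name] = free_tiers.pop(chosen)
    return groups


# Hypothetical inputs: throughput and demand share the same unit.
nodes = [("n1", 100), ("n2", 100), ("n3", 40), ("n4", 40), ("n5", 10)]
demands = {"face": 90, "pose": 35, "race": 8}
node_groups = select_node_groups(nodes, demands)
```

With these hypothetical inputs, the most demanding submodel receives the two highest-capacity nodes and the least demanding one receives the single low-capacity node, which mirrors the goal of not placing a simple model on a high-configuration node.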

In step 102, a training sample set corresponding to each node information group in the node information group sequence is determined based on the node configuration information and model structure information corresponding to that node information group.
In some embodiments, the execution entity may determine the training sample set corresponding to each node information group in the sequence based on the node configuration information and model structure information corresponding to that node information group. Here, the training samples in the training sample set may be the samples needed to train the submodel corresponding to the node information group. The model structure information characterizes the model structure of the submodel corresponding to the node information group.
Optionally, the node configuration information may include task scheduler configuration information, memory configuration information, and data processor configuration information. The memory configuration information may include memory size information and memory read/write speed information. The task scheduler configuration information may include the clock frequency of the node's central processor. The data processor configuration information may include the core frequency of the node's graphics card. The memory size information may include the size of the node's memory. The memory read/write speed information may include the read/write frequency of the node's memory.
Optionally, the execution entity may determine the training sample set corresponding to each node information group, based on the corresponding node configuration information and model structure information, through the following steps:

First, task scheduling capability information of the node group corresponding to the node information group is determined based on the task scheduler configuration information included in the node configuration information corresponding to the node information group.
Here, the task scheduling capability information characterizes the scheduling capability of the central processors of the nodes in the corresponding node group. The scheduling capability may be quantified as a grade, and the task scheduling capability information may include a task scheduling capability grade. For example, the grade may be an integer in the range [1, 100]. As an example, if the clock frequency of the node's central processor given in the task scheduler configuration information is between 3.8 GHz and 4 GHz, the task scheduling capability grade in the corresponding task scheduling capability information is 10.
Second, memory caching capability information of the node group corresponding to the node information group is determined based on the memory size information and memory read/write speed information in the memory configuration information included in the node configuration information corresponding to the node information group.
Here, the memory caching capability information characterizes the caching capability of the memory of the nodes in the corresponding node group. The caching capability may be quantified as a grade, and the memory caching capability information may include a memory caching capability grade. For example, the grade may be an integer in the range [1, 100].
As an example, if the node's memory size given in the memory configuration information is 16 GB or more and the memory read/write frequency is 4000 MHz or higher, the memory caching capability grade in the corresponding memory caching capability information is 5.
Third, data processing capability information of the node group corresponding to the node information group is determined based on the data processor configuration information included in the node configuration information corresponding to the node information group.
Here, the data processing capability information characterizes the data processing capability of the graphics cards of the nodes in the corresponding node group. The data processing capability may be quantified as a grade, and the data processing capability information may include a data processing capability grade. For example, the grade may be an integer in the range [1, 100].
As an example, if the core frequency of the node's graphics card given in the data processor configuration information is between 350 MHz and 400 MHz, the data processing capability grade in the corresponding data processing capability information is 5.
Fourth, training sample demand information of the node group corresponding to the node information group is determined based on the task scheduling capability information, the memory caching capability information, the data processing capability information, and the model structure information.
Here, the training sample demand information characterizes the demand ratio of training samples used to train the submodel corresponding to the node information group. The demand ratio is the share of all training samples that the submodel requires.
As an example, the execution entity may first sum the task scheduling capability grade, the memory caching capability grade, the data processing capability grade, and the number of model layers given in the model structure information to obtain the total grade of the node information group. The execution entity may then normalize the total grades of the node information groups in the node information group sequence to obtain a weight for each node information group. Finally, the execution entity may take the weight of each node information group as the demand ratio of training samples for training the corresponding submodel, thereby obtaining the training sample demand information for each node information group.
Fifth, the training sample set corresponding to the node information group is determined based on the training sample demand information.
For example, suppose there are 100 training samples in total and the demand ratio of a submodel is 10%; the number of training samples used to train that submodel is then 10. The execution entity may randomly draw 10 training samples from all training samples as the training sample set of the node information group corresponding to that submodel.
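The fourth and fifth steps can be illustrated with a short Python sketch. It is a sketch under stated assumptions rather than the patented implementation: the text does not specify the normalization, so dividing each total grade by the sum of all total grades is assumed here, and each group is assumed to draw its samples independently (so sets may overlap). The function names, the grade values, and the fixed random seed are hypothetical.

```python
import random


def demand_ratios(group_grades):
    """Turn per-group grades into training sample demand ratios.

    group_grades maps a group name to a tuple of
    (scheduling grade, caching grade, processing grade, layer count).
    Assumed normalization: total grade divided by the sum of all totals.
    """
    totals = {name: sum(grades) for name, grades in group_grades.items()}
    grand_total = sum(totals.values())
    return {name: total / grand_total for name, total in totals.items()}


def draw_training_sets(all_samples, ratios, seed=0):
    """Randomly draw each group's share of the full sample pool.

    Each group samples independently from the full pool (an assumption).
    """
    rng = random.Random(seed)
    return {
        name: rng.sample(all_samples, round(ratio * len(all_samples)))
        for name, ratio in ratios.items()
    }


# Hypothetical grades for three node groups.
grades = {
    "face": (10, 5, 5, 20),   # total grade 40
    "pose": (20, 10, 10, 10), # total grade 50
    "race": (2, 2, 2, 4),     # total grade 10
}
ratios = demand_ratios(grades)
training_sets = draw_training_sets(list(range(100)), ratios)
```

In this hypothetical setup the "race" group has a demand ratio of 10%, so it receives 10 of the 100 samples, matching the worked example in the text.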
In step 103, for each node information group in the node information group sequence, sample reconstruction is performed on the training samples in the training sample set corresponding to the node information group, based on the model structure information corresponding to the node information group, to generate a target training sample set.
In some embodiments, for each node information group in the node information group sequence, the execution entity may perform sample reconstruction on the training samples in the corresponding training sample set, based on the corresponding model structure information, to generate a target training sample set. Here, the target training samples in the target training sample set may be samples for training the corresponding submodel.
As an example, the execution entity may reconstruct the training samples in the training sample set corresponding to the node information group according to the model input of the submodel corresponding to the model structure information, to generate the target training sample set. For example, if the model input size of the submodel is 300 × 300, the execution entity may crop the images in the training samples so that the images in the generated target training samples are 300 × 300. As another example, if the model input size of the submodel is 20 × 20, the execution entity may crop the images in the training samples so that the images in the generated target training samples are 20 × 20.
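The cropping example can be sketched in plain Python. The description only says that images are cropped to the submodel's input size; choosing a center crop is an assumption made here, and the names `center_crop` and `rebuild_samples` are hypothetical. Images are represented as lists of pixel rows to keep the sketch dependency-free.

```python
def center_crop(image, target_h, target_w):
    """Crop a 2-D image (list of rows) to target_h x target_w around its center.

    Center cropping is an assumed choice; the source does not name a crop strategy.
    """
    height, width = len(image), len(image[0])
    if target_h > height or target_w > width:
        raise ValueError("image is smaller than the model input size")
    top = (height - target_h) // 2
    left = (width - target_w) // 2
    return [row[left:left + target_w] for row in image[top:top + target_h]]


def rebuild_samples(samples, input_size):
    """Build the target training sample set for one submodel."""
    target_h, target_w = input_size
    return [center_crop(image, target_h, target_w) for image in samples]


# A tiny 4 x 6 image whose pixel value encodes (row, column) as row * 10 + column.
tiny_image = [[r * 10 + c for c in range(6)] for r in range(4)]
target_samples = rebuild_samples([tiny_image], (2, 2))
```

For a submodel whose input is 300 × 300, the same call would be `rebuild_samples(samples, (300, 300))`.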
In step 104, for each of the submodels, the submodel is trained with its corresponding target training sample set to generate a sub-training result.

いくつかの実施形態では、上記各サブモデルの各サブモデルについて、上記実行主体は、
上記サブモデルに対応するターゲットトレーニングサンプルの集合を通じて、上記サブモ
デルをモデルトレーニングして、サブトレーニング結果を生成することができる。ここで
、サブトレーニング結果は、対応するサブモデルの損失値を特徴付けることができる。
あるいは、上記の各サブモデルは、顔認識モデル、人種認識モデル、姿勢認識モデルを含
むことができる。なお、上記顔認識モデルは、画像中のユーザの顔を認識するためのモデ
ルであってもよい。上記人種識別モデルは、画像に対応するユーザの人種を識別するため
のモデルであってもよい。上記姿勢認識モデルは、画像中のユーザ姿勢を認識するための
モデルであってもよい。
一例として、上述の顔認識モデルは、ＭＴＣＮ（Ｍｕｌｔｉｔａｓｋｃｏｎｖｏｌｕ
ｔｉｏｎａｌｎｅｕｒａｌｎｅｔｗｏｒｋ）モデルであってもよい。上記姿勢認識モ
デルは、ＯｐｅｎＰｏｓｅ（開放姿勢）モデルであってもよい。上記人種識別モデルは、
２つの第１人種識別ネットワークと、２つの第２人種識別ネットワークと、１つの人種分
類ネットワークとを含むことができる。第１の人種識別ネットワークは、２つの３＊３、
ステップサイズ１の畳み込み層と、１つの２＊２、ステップサイズ１の最大プール化層と
を含む。第２の人種識別ネットワークは、３つの３×３、ステップサイズ１の畳み込み層
と、１つの２×２、ステップサイズ１の最大プール化層とを含む。人種分類ネットワーク
は、１つの２０４８個のニューロンの全接続層、１つの１０２４個のニューロンの全接続
層、および人種分類器を含むことができる。人種分類器は、ｓｏｆｔｍａｘ分類器であっ
てもよい。 In some embodiments, for each of the sub-models, the execution entity may perform model training on the sub-model using the target training sample set corresponding to that sub-model to generate a sub-training result. Here, the sub-training result may characterize the loss value of the corresponding sub-model.
Optionally, the sub-models may include a face recognition model, a race recognition model, and a posture recognition model. The face recognition model may be a model for recognizing a user's face in an image. The race recognition model may be a model for identifying the race of the user corresponding to the image. The posture recognition model may be a model for recognizing the user's posture in an image.
As an example, the face recognition model may be an MTCN (Multi-task convolutional neural network) model. The posture recognition model may be an OpenPose model. The race recognition model may include two first race recognition networks, two second race recognition networks, and one race classification network. The first race recognition network includes two 3*3 convolutional layers with a stride of 1 and one 2*2 max pooling layer with a stride of 1. The second race recognition network includes three 3*3 convolutional layers with a stride of 1 and one 2*2 max pooling layer with a stride of 1. The race classification network may include one fully connected layer of 2048 neurons, one fully connected layer of 1024 neurons, and a race classifier. The race classifier may be a softmax classifier.
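To make the layer stack described above concrete, the following sketch traces the spatial size of a feature map through it. Several points are assumptions not stated in the text: the four networks are taken to be chained in the order first, first, second, second; no padding is used; and channel counts are ignored because none are given. The names `out_size` and `backbone_output_size` are hypothetical.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


# (layer type, kernel size); every layer uses a stride of 1 as in the text.
FIRST_NETWORK = [("conv", 3), ("conv", 3), ("maxpool", 2)]
SECOND_NETWORK = [("conv", 3), ("conv", 3), ("conv", 3), ("maxpool", 2)]

# Assumed ordering of the four networks ahead of the classification network.
BACKBONE = [FIRST_NETWORK, FIRST_NETWORK, SECOND_NETWORK, SECOND_NETWORK]


def backbone_output_size(input_size):
    """Trace the feature-map side length through the assumed backbone."""
    size = input_size
    for network in BACKBONE:
        for _layer_type, kernel in network:
            size = out_size(size, kernel, stride=1, padding=0)
    return size


feature_map_size = backbone_output_size(300)
```

Under these assumptions each first network shrinks the side length by 5 and each second network by 7, so a 300*300 input leaves the backbone at 276*276. With padding that preserves size in the convolutions, only the four pooling layers would reduce it.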

ある実施形態において、各サブモデルにおけるサブモデルに対して、サブモデルに対応す
る目的トレーニングサンプルの集合により、サブモデルに対して実行主体がモデルトレー
ニングを行うことにより、サブトレーニング結果を生成する。以下のステップを含むこと
ができる。
第１の手順：確定されたサブモデルは顔認識モデルであることに応答して記顔認識モデル
に対応する目的トレーニングサンプル集合により、顔認識モデルに対してモデルトレーニ
ングを行うことにより、顔認識モデルに対応するサブトレーニング結果を生成する。
例えば、まず、実行主体は、顔認識モデルに対応する目的トレーニングサンプル集合にお
ける目的トレーニングサンプルを、顔認識モデルに順次入力することにより、顔認識モデ
ルに対してモデルトレーニングを行う。そして、実行主体は、予定された損失関数によっ
て、顔認識モデルに対応する損失値を確定する。ここで、予定された損失関数は、平均二
乗誤差損失関数としてもよい。
第２の手順：確定されたサブモデルは人種認識モデルであることに応答して、人種認識モ
デルに対応する目的トレーニングサンプル集合により、前記人種認識モデルに対してモデ
ルトレーニングを行うことにより、前記人種認識モデルに対応するサブトレーニング結果
を生成する。
例えば、まず、実行主体は、人種認識モデルに対応する目的トレーニングサンプル集合に
おける目的トレーニングサンプルを、人種認識モデルに順次入力することにより、人種認
識モデルに対してモデルトレーニングを行う。そして、実行主体は、予定された損失関数
によって、人種認識モデルに対応する損失値を確定する。ここで、予定された損失関数は
、平均二乗誤差損失関数としてもよい。
第３の手順：確定されたサブモデルは姿勢認識モデルであることに応答して、姿勢認識モ
デルに対応する目的トレーニングサンプル集合により、姿勢認識モデルに対してモデルト
レーニングを行うことにより、姿勢認識モデルに対応するサブトレーニング結果を生成す
る。
例えば、まず、実行主体は、姿勢認識モデルに対応する目的トレーニングサンプル集合に
おける目的トレーニングサンプルを、姿勢認識モデルに順次入力することにより、姿勢認
識モデルに対してモデルトレーニングを行う。そして、実行主体は、予定された損失関数
によって、姿勢認識モデルに対応する損失値を確定する。ここで、予定された損失関数は
、平均二乗誤差損失関数としてもよい。
ステップ１０５：確定された複数のサブトレーニング結果がすべて収束することに応答し
て、各サブモデルにおけるサブモデルに対応する現在モデルパラメータ情報に基づいて、
トレーニングが完了した初期モデルを生成する。
ある実施形態において、実行主体は、確定された複数のサブトレーニング結果がすべて収
束することに応答して、各サブモデルにおけるサブモデルに対応する現在モデルパラメー
タ情報に基づいて、トレーニングが完了した初期モデルを生成する。ここで、現在モデル
パラメータ情報は、サブトレーニング結果が収束した時における対応するサブモデルの重
みパラメータの情報としてもよい。
ある実施形態において、実行主体は、確定された複数のサブトレーニング結果がすべて収
束することに応答して、各サブモデルにおけるサブモデルに対応する現在モデルパラメー
タ情報に基づいて、トレーニングが完了した初期モデルを生成することは、以下のステッ
プを含むことができる。 In some embodiments, for each of the sub-models, the execution entity performing model training on the sub-model using the target training sample set corresponding to the sub-model to generate a sub-training result may include the following steps.
First step: in response to determining that the sub-model is the face recognition model, generate a sub-training result corresponding to the face recognition model by performing model training on the face recognition model using the target training sample set corresponding to the face recognition model.
For example, first, the execution entity performs model training on the face recognition model by sequentially inputting target training samples in the target training sample set corresponding to the face recognition model to the face recognition model. Then, the execution entity determines a loss value corresponding to the face recognition model using a predetermined loss function. Here, the predetermined loss function may be a mean squared error loss function.
Second step: in response to determining that the sub-model is the race recognition model, generate a sub-training result corresponding to the race recognition model by performing model training on the race recognition model using the target training sample set corresponding to the race recognition model.
For example, first, the execution entity performs model training on the race recognition model by sequentially inputting target training samples in the target training sample set corresponding to the race recognition model to the race recognition model. Then, the execution entity determines a loss value corresponding to the race recognition model using a predetermined loss function. Here, the predetermined loss function may be a mean squared error loss function.
Third step: in response to determining that the sub-model is the posture recognition model, generate a sub-training result corresponding to the posture recognition model by performing model training on the posture recognition model using the target training sample set corresponding to the posture recognition model.
For example, first, the execution entity performs model training on the posture recognition model by sequentially inputting target training samples in the target training sample set corresponding to the posture recognition model to the posture recognition model. Then, the execution entity determines a loss value corresponding to the posture recognition model using a predetermined loss function. Here, the predetermined loss function may be a mean squared error loss function.
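The three steps share one pattern: feed the target training samples to the sub-model one at a time, then score it with a mean squared error loss. The sketch below illustrates that pattern only. The one-weight linear model standing in for each sub-model, the plain gradient-descent update, the learning rate, and the synthetic samples are all assumptions chosen so the example stays self-contained.

```python
def mse(predictions, targets):
    """Mean squared error between two equally long sequences."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def train_one_epoch(params, samples, learning_rate=0.1):
    """Feed samples sequentially, update the toy model, and return the epoch MSE."""
    predictions, targets = [], []
    for x, y in samples:
        prediction = params["w"] * x + params["b"]
        error = prediction - y
        # Gradient of (prediction - y)^2 with respect to w and b.
        params["w"] -= learning_rate * 2 * error * x
        params["b"] -= learning_rate * 2 * error
        predictions.append(prediction)
        targets.append(y)
    return mse(predictions, targets)


# Toy stand-ins for the face, race, and posture recognition sub-models.
sub_models = {name: {"w": 0.0, "b": 0.0} for name in ("face", "race", "posture")}
# Synthetic target training sample sets: y = 2x + 1 on x in [0, 0.9].
target_sample_sets = {
    name: [(x / 10, 2 * x / 10 + 1) for x in range(10)] for name in sub_models
}

sub_training_results = {}
for name, params in sub_models.items():
    losses = [train_one_epoch(params, target_sample_sets[name]) for _ in range(200)]
    sub_training_results[name] = losses[-1]  # loss value of this sub-model
```

Each entry of `sub_training_results` plays the role of a sub-training result, that is, the loss value of the corresponding sub-model.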
Step 105: in response to all of the determined sub-training results converging, generate a trained initial model based on the current model parameter information corresponding to each of the sub-models.
In some embodiments, in response to all of the determined sub-training results converging, the execution entity may generate the trained initial model based on the current model parameter information corresponding to each of the sub-models. Here, the current model parameter information may be information on the weight parameters of the corresponding sub-model at the time its sub-training result converges.
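The disclosure does not fix a convergence criterion. As one possible reading, the sketch below treats a sub-training result as converged when the last few loss values differ by less than a tolerance, and triggers model generation only when every sub-model satisfies that test. The window length, the tolerance, and the function names are assumptions.

```python
def has_converged(loss_history, window=5, tolerance=1e-4):
    """True when the last `window` loss values vary by less than `tolerance`."""
    if len(loss_history) < window:
        return False
    recent = loss_history[-window:]
    return max(recent) - min(recent) < tolerance


def all_sub_models_converged(loss_histories):
    """True only if every sub-model's loss history has converged."""
    return all(has_converged(history) for history in loss_histories.values())


loss_histories = {
    "face": [0.9, 0.4, 0.1, 0.05, 0.05, 0.05, 0.05, 0.05],
    "race": [0.8, 0.3, 0.2, 0.2, 0.2, 0.2, 0.2],
    "posture": [1.0, 0.7, 0.5, 0.4, 0.3, 0.25, 0.2],  # still decreasing
}
ready_to_merge = all_sub_models_converged(loss_histories)
```

Here `ready_to_merge` stays false because the posture loss is still falling, so generation of the trained initial model would be deferred.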
In some embodiments, the execution entity generating the trained initial model based on the current model parameter information corresponding to each of the sub-models, in response to all of the determined sub-training results converging, may include the following steps.

第１の手順：確定された顔認識モデルに対応するサブトレーニング結果が収束することに
応答して、顔認識モデルの現在モデルパラメータを確定することにより、顔認識モデルに
対応する現在モデルパラメータ情報を生成する。
例えば、実行主体は、確定された顔認識モデルに対応する損失値が収束することに応答し
て、逆伝搬により顔認識モデルの重みパラメータを得て、現在モデルパラメータ情報を得
る。
第２の手順：確定された人種認識モデルに対応するサブトレーニング結果が収束すること
に応答して、人種顔認識モデルの現在モデルパラメータを確定することにり、人種認識モ
デルに対応する現在モデルパラメータ情報を生成する。
例えば、実行主体は、確定された人種認識モデルに対応する損失値が収束することに応答
して、逆伝搬により人種認識モデルの重みパラメータを得て、現在モデルパラメータ情報
を得る。
第３の手順：確定された姿勢認識モデルに対応するサブトレーニング結果が収束すること
に応答して、姿勢顔認識モデルの現在モデルパラメータを確定することにり、姿勢認識モ
デルに対応する現在モデルパラメータ情報を生成する。
例えば、実行主体は、確定された姿勢認識モデルに対応する損失値が収束することに応答
して、逆伝搬により姿勢認識モデルの重みパラメータを得て、現在モデルパラメータ情報
を得る。
第４の手順：得られた複数の現在モデルパラメータ情報に基づいて、初期モデルに対して
モデルパラメータの更新を行うことにより、候補初期モデルを生成する。
ここで、候補初期モデルは、初期モデルにおけるパラメータを置換したモデルであっても
よい。実行主体は、得られた複数の現在モデルパラメータ情報におけるモデルパラメータ
情報に含まれる重みパラメータに基づいて、初期モデルのパラメータを置換することによ
り、候補初期モデルを生成する。
第５の手順：各サブモデルに対応する複数のトレーニングサンプル集合からトレーニング
サンプルを抽出することにより、目的サンプルを生成し、目的サンプル集合を得る。
ここで、目的サンプルは、複数のトレーニングサンプル集合からランダムに抽出されたも
のであってもよく、候補初期モデルをトレーニングするために用いられる。
第６の手順：目的サンプル集合により候補初期モデルに対してモデルトレーニングする。
例えば、実行主体は、目標サンプルの集合における目標サンプルを、順次候補初期モデル
に入力することにより、候補初期モデルをトレーニングすることができる。
第７の手順：確定された候補初期モデルに対応する目的トレーニング結果が収束すること
に応答して、トレーニングが完了した初期モデルを生成する。
例えば、実行主体は、確定された候補初期モデルに対応する目的トレーニング結果が収束
することに応答して、現在トレーニングによるパラメータを初期モデルのパラメータとし
て確定し、トレーニングが完了した初期モデルを生成することができる。 First step: in response to determining that the sub-training result corresponding to the face recognition model has converged, generate current model parameter information corresponding to the face recognition model by determining the current model parameters of the face recognition model.
For example, the execution entity obtains the weight parameters of the face recognition model by back propagation in response to the convergence of the loss value corresponding to the determined face recognition model, and obtains the current model parameter information.
Second step: in response to determining that the sub-training result corresponding to the race recognition model has converged, generate current model parameter information corresponding to the race recognition model by determining the current model parameters of the race recognition model.
For example, in response to convergence of loss values corresponding to the determined race recognition model, the execution entity obtains weight parameters of the race recognition model through back propagation to obtain current model parameter information.
Third step: in response to determining that the sub-training result corresponding to the posture recognition model has converged, generate current model parameter information corresponding to the posture recognition model by determining the current model parameters of the posture recognition model.
For example, in response to the convergence of the loss value corresponding to the determined posture recognition model, the execution entity obtains the weight parameters of the posture recognition model by back propagation, and obtains current model parameter information.
Fourth step: A candidate initial model is generated by updating the model parameters of the initial model based on the obtained plurality of current model parameter information.
Here, the candidate initial model may be a model in which parameters in the initial model are replaced. The execution entity generates a candidate initial model by replacing the parameters of the initial model based on the weight parameters included in the model parameter information in the obtained plurality of current model parameter information.
Fifth step: Generate target samples by extracting training samples from a plurality of training sample sets corresponding to each submodel, and obtain the target sample set.
Here, the target samples may be randomly extracted from a plurality of training sample sets and are used to train the candidate initial model.
Sixth step: Model training is performed on the candidate initial model using the target sample set.
For example, an execution entity can train a candidate initial model by sequentially inputting target samples in a set of target samples into the candidate initial model.
Seventh step: Generate a trained initial model in response to convergence of the target training results corresponding to the determined candidate initial model.
For example, in response to determining that the target training result corresponding to the candidate initial model has converged, the execution entity may take the parameters obtained from the current training as the parameters of the initial model and thereby generate the trained initial model.
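The fourth and fifth steps above can be sketched as plain data manipulation. In this illustration, parameters are held in dictionaries keyed by a hypothetical `"<sub-model>.<parameter>"` name, and a fixed number of samples is drawn at random from each training sample set. The key scheme, the per-set sample count, the seed, and the function names are assumptions, and the fine-tuning of the sixth and seventh steps is left out.

```python
import random


def build_candidate_model(initial_params, current_params_by_sub_model):
    """Fourth step: replace initial-model parameters with the sub-models' weights."""
    candidate = dict(initial_params)
    for sub_model, params in current_params_by_sub_model.items():
        for name, value in params.items():
            candidate[f"{sub_model}.{name}"] = value
    return candidate


def draw_target_samples(sample_sets, per_set, seed=0):
    """Fifth step: randomly extract samples from each sub-model's training set."""
    rng = random.Random(seed)
    target = []
    for samples in sample_sets.values():
        target.extend(rng.sample(samples, min(per_set, len(samples))))
    rng.shuffle(target)
    return target


initial_params = {"face.w": 0.0, "race.w": 0.0, "posture.w": 0.0}
current_params = {"face": {"w": 1.5}, "race": {"w": -0.3}, "posture": {"w": 0.8}}
candidate_params = build_candidate_model(initial_params, current_params)

sample_sets = {
    "face": [f"face_{i}" for i in range(10)],
    "race": [f"race_{i}" for i in range(10)],
    "posture": [f"posture_{i}" for i in range(3)],
}
target_sample_set = draw_target_samples(sample_sets, per_set=4)
```

The candidate model would then be trained on `target_sample_set` until its target training result converges, as in the sixth and seventh steps.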

選択的には、実行主体は、さらに以下の処理手順を実行することができる。
第１の手順：目的画像を取得する。
ここで、目的画像は目的対象を含む。目的対象は、ユーザの全身画像であってもよい。
例えば、目的画像は、銀行の監視装置に撮影された画像であってもよい。監視装置はカメ
ラであってもよい。
第２の手順：目的画像をトレーニングが完了した初期モデルに入力することにより、目的
画像に対応するユーザアイデンティティ情報を生成する。
ここで、ユーザアイデンティティ情報は、目的画像に含まれる目的対象に対応するユーザ
のユーザアイデンティティ情報であってもよい。ユーザアイデンティティ情報は、顔位置
情報、人種情報、およびユーザ姿勢情報を含む。顔位置情報は、目的画像に含まれる目的
対象における顔の位置する位置を発現することができる。人種情報は、目的画像に含まれ
る目的対象に対応するユーザの人種カテゴリを発現することができる。ユーザ姿勢情報は
、目的画像に含まれる目的対象における四肢の位置する位置を発現することができる。
第３の手順：ユーザアイデンティティ情報に含まれる顔位置情報、人種情報、およびユー
ザ姿勢情報に基づいて、目的画像に画像マーキングを行うことにより、マーキング済みの
目的画像を生成する。
例えば、実行主体は、顔位置情報における顔位置、人種情報における人種カテゴリ、およ
びユーザ姿勢情報における肢体位置に基づいて、目的画像における対応位置をマークする
ことができる。
第４の手順：マーキング済みの目的画像に対して画像記憶を行う。
たとえば、実行主体は、マーキング済み画像を磁気ディスクに格納することができる。
本開示のいくつかの実施形態のディープラーニングに基づく分散式異種データの処理方法
により、分散中のノードのノード使用効率を向上させ、モデルトレーニング効率を向上さ
せることができる。具体的には、ノードの使用効率とモデルのトレーニング効率が低い原
因は、第一に、分散式に含まれる各ノードのノード構成が異なることが多いことと、複数
回トレーニングするモデルのモデル構造が異なることが多いため、固定した分散式トレー
ニング方式を用いてモデルをトレーニングすると、分散式におけるノードのノード使用効
率が低下することがよくある。第二に、モデルが複数のサブモデルを含み、モデル構造が
複雑な場合、全体モデルを直接トレーニングし、モデルのトレーニング周期が長く、モデ
ルのトレーニング効率が低下する。これに基づいて、本開示のいくつかの実施形態のディ
ープラーニングに基づく分散式異種データの処理方法は、まず、初期モデルに含まれる各
サブモデルのうちの１つのサブモデルが構成されているノード情報グループシーケンスを
決定することを含む。実際には、ノードのノード構成が異なることが多いため、サブモデ
ルのモデル構造が異なることが多い。固定分散トレーニング方式を採用し、サブモデルを
予め設定された固定されたノードに構成させると、モデル構造とノードの構成が一致せず
（例えば、モデル構造が簡単なモデルが構成の高いノードに構成される）、ノードのノー
ド使用率が低下することがある。したがって、ノード情報グループのシーケンスを決定す
ることによって、サブモデルを対応するノードグループ（例えば、モデル構造が簡単なモ
デルを構成の低いノードに構成する）に構成し、ノードのノード使用効率を高める。次に
、前記ノード情報グループのシーケンスにおける各ノード情報グループに対応するノード
構成情報とモデル構造情報に基づいて、前記ノード情報グループに対応するトレーニング
サンプルの集合を決定し、前記モデル構造情報はノード情報グループに対応するサブモデ
ルのモデル構造を特徴づける。実際には、異なるノード情報グループに対応するノードグ
ループの構成が異なることが多く、異なるサブモデルの構造も異なるため、異なるノード
グループのモデルトレーニング速度も異なる。そのため、各ノードグループのサブモデル
のトレーニング時間を近似させ、トレーニング時間を短縮するために、ノード情報グルー
プのノード構成情報とサブモデルの構造情報に基づいて、ノード情報グループに必要なト
レーニングサンプルの数を決定する必要がある。そして、前記ノード情報グループシーケ
ンス中の各ノード情報グループに対して、前記ノード情報グループに対応するモデル構造
情報に基づいて、前記ノード情報グループに対応するトレーニングサンプル集合中のトレ
ーニングサンプルをサンプル再構成して、ターゲットトレーニングサンプル集合を生成す
る。実際には、サブモデルによってモデル入力が異なることがあります。そのため、モデ
ル構造情報に対応するサブモデルのモデル入力に基づいて、トレーニングサンプルをサン
プル再構築し、ターゲットトレーニングサンプル集合を得る必要がある。次に、前記各サ
ブモデルにおける各サブモデルについて、前記サブモデルに対応するターゲットトレーニ
ングサンプル集合により、前記サブモデルに対してモデルトレーニングを行い、サブトレ
ーニング結果を生成する。これにより、各サブモデルをそれぞれトレーニングすることに
より、モデルトレーニング周期を短縮し、モデルトレーニング効率を向上させる。最後に
、決定された複数のサブトレーニング結果がすべて収束することに応答して、上記各サブ
モデルにおけるサブモデルに対応する現在のモデルパラメータ情報に基づいて、トレーニ
ングが完了した初期モデルを生成する。これにより、分散式における各サブモデルのトレ
ーニング完了で得られたモデルパラメータ情報により、トレーニング完了の初期モデルを
生成し、分散式におけるノードのノード使用効率を向上させるとともに、モデルトレーニ
ング効率を向上させる。 Optionally, the execution entity may further perform the following processing steps.
First step: Obtain the target image.
Here, the target image includes the target object. The target object may be a full body image of the user.
For example, the target image may be an image captured by a bank's monitoring device. The monitoring device may be a camera.
Second step: Generate user identity information corresponding to the target image by inputting the target image into the trained initial model.
Here, the user identity information may be the user identity information of the user corresponding to the target object included in the target image. The user identity information includes face position information, race information, and user posture information. The face position information may indicate where the face of the target object is located in the target image. The race information may indicate the race category of the user corresponding to the target object included in the target image. The user posture information may indicate where the limbs of the target object are located in the target image.
Third step: A marked target image is generated by performing image marking on the target image based on face position information, race information, and user posture information included in the user identity information.
For example, the execution entity can mark the corresponding position in the target image based on the face position in the face position information, the racial category in the racial information, and the limb position in the user posture information.
Fourth step: store the marked target image.
For example, an execution entity may store marked images on a magnetic disk.
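The marking step can be illustrated with a small data structure. This is a sketch under assumptions: the field layout of `UserIdentityInfo`, the use of a bounding box for the face position and of points for the limb positions, and the choice to attach the race category as a label on the face mark are not taken from the disclosure, and no real image library is involved.

```python
from dataclasses import dataclass


@dataclass
class UserIdentityInfo:
    """Hypothetical container for the model output on one target image."""
    face_box: tuple      # (left, top, right, bottom) in pixels
    race_category: str   # race category label
    limb_points: list    # [(x, y), ...] limb positions in pixels


def mark_image(image_size, info):
    """Return the marks to draw on the target image, skipping off-image points."""
    width, height = image_size

    def inside(x, y):
        return 0 <= x < width and 0 <= y < height

    marks = [{"kind": "face", "box": info.face_box, "label": info.race_category}]
    marks += [{"kind": "limb", "point": p} for p in info.limb_points if inside(*p)]
    return marks


info = UserIdentityInfo(
    face_box=(200, 50, 300, 170),
    race_category="category_a",
    limb_points=[(180, 300), (320, 300), (700, 400)],  # last point is off-image
)
marks = mark_image((640, 480), info)
```

The resulting `marks` list could then be rendered onto the image and the marked image written to storage.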
The deep learning-based distributed heterogeneous data processing method of some embodiments of the present disclosure can improve both the usage efficiency of the nodes in a distributed system and model training efficiency. Specifically, node usage efficiency and model training efficiency are low for two reasons. First, the nodes in a distributed system often differ in configuration, and the models trained across multiple training runs often differ in structure, so training with a fixed distributed training scheme often lowers node usage efficiency. Second, when a model includes multiple sub-models and has a complex structure, training the whole model directly lengthens the training cycle and lowers training efficiency. On this basis, the method first determines a node information group sequence, where one of the sub-models included in the initial model is deployed on the node group corresponding to each node information group. In practice, node configurations often differ, and so do the model structures of the sub-models. If a fixed distributed training scheme deploys each sub-model on a preset, fixed node, the model structure and the node configuration may not match (for example, a structurally simple model is deployed on a high-specification node), which can lower node usage efficiency. Determining the node information group sequence therefore deploys each sub-model on a suitable node group (for example, a structurally simple model on lower-specification nodes) and raises node usage efficiency.
Next, a training sample set corresponding to each node information group in the sequence is determined based on the node configuration information and model structure information corresponding to that node information group, where the model structure information characterizes the model structure of the sub-model corresponding to the node information group. In practice, the node groups corresponding to different node information groups often differ in configuration, and different sub-models differ in structure, so model training speed differs between node groups. To bring the sub-model training times of the node groups close to one another and shorten overall training time, the number of training samples required by each node information group is determined from the node configuration information of the group and the structure information of its sub-model. Then, for each node information group in the sequence, the training samples in the corresponding training sample set are reconstructed based on the corresponding model structure information to generate a target training sample set. In practice, different sub-models may take different model inputs, so the training samples must be reconstructed according to the model input of the sub-model corresponding to the model structure information. Next, for each of the sub-models, model training is performed on the sub-model using the corresponding target training sample set to generate a sub-training result. Training each sub-model separately in this way shortens the model training cycle and improves model training efficiency.
Finally, in response to all of the determined sub-training results converging, a trained initial model is generated based on the current model parameter information corresponding to each of the sub-models. The trained initial model is thus generated from the model parameter information obtained when each sub-model finishes training in the distributed system, which improves both node usage efficiency and model training efficiency.
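The idea of sizing each node group's training sample set so that the sub-models finish at about the same time can be sketched as a proportional split. The disclosure does not give a formula, so everything quantitative here is an assumption: each node group is summarized by a single `capability` score, each sub-model by a single `model_cost` per sample, training time is modeled as samples * cost / capability, and the function name is hypothetical.

```python
def allocate_samples(total_samples, groups):
    """Split a sample budget so the estimated training times are roughly equal.

    Each group's share is proportional to capability / model_cost, that is, to
    how many samples it can process per unit time under the assumed cost model.
    """
    throughput = {
        name: group["capability"] / group["model_cost"]
        for name, group in groups.items()
    }
    total_throughput = sum(throughput.values())
    return {
        name: int(total_samples * rate / total_throughput)
        for name, rate in throughput.items()
    }


groups = {
    "face": {"capability": 80, "model_cost": 4},
    "race": {"capability": 40, "model_cost": 2},
    "posture": {"capability": 60, "model_cost": 6},
}
allocation = allocate_samples(1000, groups)

# Estimated training time per group under the assumed cost model.
estimated_time = {
    name: allocation[name] * groups[name]["model_cost"] / groups[name]["capability"]
    for name in groups
}
```

With these illustrative numbers the three groups receive 400, 400, and 200 samples and each has the same estimated training time, which is the balancing effect the paragraph above describes.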

さらに図２を参照すると、上述した各図に示された方法の実現として、本開示は、図１に
示された方法の実施形態に対応し、具体的には様々な電子機器に適用可能な、ディープラ
ーニングに基づく分散式異種データの処理装置のいくつかの実施形態を提供する。
図２に示すように、いくつかの実施形態のディープラーニングに基づく分散式異種データ
の処理装置２００は、情報判定部２０１、サンプル判定部２０２、サンプル再構成部２０
３、モデルトレーニング部２０４、モデル生成部２０５を含む。ここで、情報判定部２０
１は、ノード情報グループのシーケンスを判定するように構成され、サンプル決定部２０
２は、上記ノード情報グループのシーケンスにおける各ノード情報グループに対応するノ
ード構成情報とモデル構造情報に基づいて、上記ノード情報グループに対応するサブモデ
ルのモデル構造を特徴づけるモデル構造情報グループに対応するトレーニングサンプルの
集合を決定するように構成され、サンプル再構成部２０３は、前記ノード情報グループシ
ーケンス中の各ノード情報グループに対して、前記ノード情報グループに対応するモデル
構造情報に基づいて、前記ノード情報グループに対応するトレーニングサンプル集合中の
トレーニングサンプルをサンプル再構成して、ターゲットトレーニングサンプル集合を生
成するように構成され、モデルトレーニングユニット２０４は、上記各サブモデルの各サ
ブモデルに対して、上記サブモデルに対応するターゲットトレーニングサンプルの集合を
通じて、上記サブモデルをモデルトレーニングして、サブトレーニング結果を生成するよ
うに構成され、モデル生成部２０５は、決定された複数のサブモデルのトレーニング結果
がすべて収束することに応答して、上記各サブモデルにおけるサブモデルに対応する現在
のモデルパラメータ情報に基づいて、トレーニングが完了した初期モデルを生成するよう
に構成される。 Referring further to FIG. 2, as an implementation of the method shown in the figures above, the present disclosure provides some embodiments of a deep learning-based distributed heterogeneous data processing apparatus. These apparatus embodiments correspond to the method embodiments shown in FIG. 1, and the apparatus is applicable to various electronic devices.
As shown in FIG. 2, the deep learning-based distributed heterogeneous data processing apparatus 200 of some embodiments includes an information determination unit 201, a sample determination unit 202, a sample reconstruction unit 203, a model training unit 204, and a model generation unit 205. The information determination unit 201 is configured to determine a node information group sequence. The sample determination unit 202 is configured to determine a training sample set corresponding to each node information group in the node information group sequence based on the node configuration information and model structure information corresponding to that node information group, where the model structure information characterizes the model structure of the sub-model corresponding to the node information group. The sample reconstruction unit 203 is configured to, for each node information group in the node information group sequence, reconstruct the training samples in the training sample set corresponding to the node information group based on the model structure information corresponding to the node information group to generate a target training sample set. The model training unit 204 is configured to, for each of the sub-models, perform model training on the sub-model using the target training sample set corresponding to the sub-model to generate a sub-training result. The model generation unit 205 is configured to, in response to all of the determined sub-training results converging, generate a trained initial model based on the current model parameter information corresponding to each of the sub-models.

装置２００に記載されたユニットは、図１を参照して説明された方法の各ステップに対応
していることが理解される。したがって、上述の方法について説明した操作、特徴、およ
び生成された有益な効果は、装置２００および装置２００に含まれるユニットにも同様に
適用され、ここではこれ以上説明しない。
以下、図３を参照すると、本開示のいくつかの実施形態を実現するのに適した電子機器（
例えば、計算装置）３００の構造概略図が示されている。図３に示す電子機器は単なる一
例であり、本開示の実施形態の機能及び使用範囲に何ら制限を与えるものではない。
図３に示すように、電子機器３００は、読取り専用メモリ（ＲＯＭ）３０２に格納された
プログラム、または記憶装置３０８からランダムアクセスメモリ（ＲＡＭ）３０３にロー
ドされたプログラムに従って様々な適切な動作および処理を実行することができる処理装
置（例えば中央プロセッサ、グラフィックプロセッサなど）３０１を含むことができる。
ＲＡＭ３０３には、電子機器３００の動作に必要な各種プログラムやデータも記憶されて
いる。処理装置３０１、ＲＯＭ３０２及びＲＡＭ３０３は、バス３０４を介して互いに接
続されている。バス３０４には、入出力（Ｉ／Ｏ）インタフェース３０５も接続されてい
る。
一般に、Ｉ／Ｏインタフェース３０５には、タッチスクリーン、タッチパッド、キーボー
ド、マウス、カメラ、マイク、加速度計、ジャイロスコープなどを含む入力デバイス３０
６が、液晶ディスプレイ（ＬＣＤ）、スピーカ、バイブレータなどの出力デバイス３０７
、例えば、磁気テープ、ハードディスク等を含む記憶装置３０８、及び通信装置３０９を
含む。通信デバイス３０９は、データを交換するために電子デバイス３００が他のデバイ
スと無線または有線通信することを可能にすることができる。図３は、様々な装置を備え
た電子機器３００を示しているが、図示された装置のすべてを実施または備える必要はな
いことが理解されるべきである。代替的に、より多くまたはより少ない装置を実装または
備えることができる。図３に示す各ブロックは、１つのデバイスを表してもよいし、必要
に応じて複数のデバイスを表してもよい。 It will be understood that the units described in the apparatus 200 correspond to the steps of the method described with reference to FIG. 1. Therefore, the operations, features, and beneficial effects described above for the method apply equally to the apparatus 200 and the units it includes, and are not repeated here.
Referring now to FIG. 3, a schematic structural diagram of an electronic device (for example, a computing device) 300 suitable for implementing some embodiments of the present disclosure is shown. The electronic device shown in FIG. 3 is merely an example and does not limit the functions or scope of use of the embodiments of the present disclosure.
As shown in FIG. 3, the electronic device 300 may include a processing device (for example, a central processing unit or a graphics processor) 301 that can perform various appropriate operations and processes according to a program stored in read-only memory (ROM) 302 or a program loaded from a storage device 308 into random access memory (RAM) 303.
The RAM 303 also stores various programs and data necessary for the operation of the electronic device 300. The processing device 301, the ROM 302, and the RAM 303 are connected to each other via a bus 304. An input/output (I/O) interface 305 is also connected to the bus 304.
In general, the following may be connected to the I/O interface 305: input devices 306 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, and a gyroscope; output devices 307 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; a storage device 308 including, for example, a magnetic tape and a hard disk; and a communication device 309. The communication device 309 may enable the electronic device 300 to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 3 shows the electronic device 300 with various devices, it should be understood that not all of the illustrated devices need to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in FIG. 3 may represent one device or, as necessary, multiple devices.

Claims

A distributed heterogeneous data processing method based on deep learning, including:
determining a node information group sequence, wherein one sub-model of the sub-models included in an initial model is deployed on the node group corresponding to each node information group in the node information group sequence;
determining a training sample set corresponding to the node information group based on node configuration information and model structure information corresponding to each node information group in the node information group sequence, wherein the model structure information characterizes the model structure of the sub-model corresponding to the node information group;
for each node information group in the node information group sequence, generating a target training sample set by reconstructing the training samples in the training sample set corresponding to the node information group based on the model structure information corresponding to the node information group;
for each of the sub-models, generating a sub-training result by performing model training on the sub-model using the target training sample set corresponding to the sub-model; and
in response to all of the determined sub-training results converging, generating a trained initial model based on the current model parameter information corresponding to each of the sub-models,
wherein the determining of the node information group sequence includes:
obtaining an initial node information sequence, wherein the initial node information can include a node number and node configuration information, the node configuration information can characterize the configuration of a node, and the node configuration information can be characterized by the maximum amount of data processed per unit time under the hardware and software configuration corresponding to the node;
determining computing power requirement information corresponding to each of the sub-models, wherein the computing power requirement information characterizes the computing power demand of the sub-model, and the computing power demand can be obtained by quantizing the maximum data processing amount of each layer included in the sub-model; and
for each of the sub-models, selecting, based on the computing power requirement information corresponding to the sub-model, an initial node information subset corresponding to the sub-model from the initial node information sequence as a node information group, wherein the amount of floating-point computation characterized by the computing power requirement information corresponding to the sub-model matches the maximum data processing amount of the nodes corresponding to the node information in the node information group,
wherein the node configuration information includes task scheduler configuration information, internal memory configuration information, and data processor configuration information, and the internal memory configuration information includes memory size information and memory read/write speed information, and
wherein the determining of the training sample set corresponding to the node information group based on the node configuration information and model structure information corresponding to each node information group in the node information group sequence includes:
determining task scheduling capability information of the node group corresponding to the node information group based on the task scheduler configuration information included in the node configuration information corresponding to the node information group, wherein the task scheduling capability information can characterize the scheduling capability of the central processors of the nodes in the corresponding node group, the scheduling capability can be obtained by grade quantization, the task scheduling capability information may include a task scheduling capability grade, and the value of the task scheduling capability grade is an integer in [1,100];
determining memory caching capability information of the node group corresponding to the node information group based on the memory size information and memory read/write speed information included in the internal memory configuration information included in the node configuration information corresponding to the node information group, wherein the memory caching capability information can characterize the caching capability of the memory of the nodes in the corresponding node group, the memory caching capability can be obtained by grade quantization, the memory caching capability information may include a memory caching capability grade, and the value of the memory caching capability grade is an integer in [1,100];
determining data processing capability information of the node group corresponding to the node information group based on the data processor configuration information included in the node configuration information corresponding to the node information group, wherein the data processing capability information can characterize the data processing capability of the graphics cards of the nodes in the corresponding node group, the data processing capability can be obtained by grade quantization, the data processing capability information may include a data processing capability grade, and the value of the data processing capability grade is an integer in [1,100];
determining training sample request information for the node group corresponding to the node information group based on the task scheduling capability information, the memory caching capability information, the data processing capability information, and the model structure information; and
determining the training sample set corresponding to the node information group based on the training sample request information.

The processing method according to claim 1, wherein
the sub-models include a face recognition model, a race recognition model, and a posture recognition model,
the step of "generating a sub-training result by performing model training on the sub-model in each of the sub-models using a target training sample set corresponding to the sub-model" comprises:
in response to determining that the sub-model is the face recognition model, generating a sub-training result corresponding to the face recognition model by performing model training on the face recognition model using the target training sample set corresponding to the face recognition model;
in response to determining that the sub-model is the race recognition model, generating a sub-training result corresponding to the race recognition model by performing model training on the race recognition model using the target training sample set corresponding to the race recognition model; and
in response to determining that the sub-model is the posture recognition model, generating a sub-training result corresponding to the posture recognition model by performing model training on the posture recognition model using the target training sample set corresponding to the posture recognition model;
the step of "generating an initial model for which training has been completed based on the current model parameter information corresponding to the sub-model in each of the sub-models" comprises:
in response to determining that the sub-training result corresponding to the face recognition model has converged, generating current model parameter information corresponding to the face recognition model by determining the current model parameters of the face recognition model;
in response to determining that the sub-training result corresponding to the race recognition model has converged, generating current model parameter information corresponding to the race recognition model by determining the current model parameters of the race recognition model;
in response to determining that the sub-training result corresponding to the posture recognition model has converged, generating current model parameter information corresponding to the posture recognition model by determining the current model parameters of the posture recognition model; and
generating a candidate initial model by updating model parameters based on the obtained plurality of pieces of current model parameter information;
the step of "generating an initial model for which training has been completed based on the current model parameter information corresponding to the sub-model in each of the sub-models" further comprises:
extracting training samples from the plurality of training sample sets corresponding to the respective sub-models to generate target samples, thereby obtaining a target sample set; and
training the candidate initial model using the target sample set, and generating the trained initial model in response to determining that the target training result corresponding to the candidate initial model has converged,
and the processing method further comprises:
obtaining a target image including the target object;
generating user identity information corresponding to the target image, including face position information, race information, and user posture information, by inputting the target image to the trained initial model;
generating a marked target image by performing image marking on the target image based on the face position information, race information, and user posture information included in the user identity information; and
performing image storage for the target image,
the processing method being characterized by comprising the above steps.
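
By way of non-limiting illustration, the training flow recited in this claim (training each sub-model until its sub-training result converges, collecting the current model parameters, forming a candidate initial model, extracting a target sample set from the per-sub-model training sample sets, and training the candidate until convergence) might be sketched as follows. The one-parameter `ScalarModel` is a toy stand-in for the face, race, and posture recognition models; the convergence test on the change in loss, the representation of the candidate initial model as a dictionary of parameters, and all names are assumptions introduced for this sketch.

```python
import random
from typing import Callable, Dict, List


class ScalarModel:
    """Toy stand-in for a sub-model: one parameter w fitted to minimize mean (w - x)^2."""

    def __init__(self, samples: List[float], w: float = 0.0, lr: float = 0.1):
        self.samples, self.w, self.lr = list(samples), w, lr

    def step(self) -> float:
        """Run one gradient-descent update and return the loss after it."""
        n = len(self.samples)
        grad = sum(2.0 * (self.w - x) for x in self.samples) / n
        self.w -= self.lr * grad
        return sum((self.w - x) ** 2 for x in self.samples) / n


def train_until_converged(step: Callable[[], float], tol: float = 1e-9,
                          max_steps: int = 10_000) -> None:
    """Repeat training steps until the loss changes by less than tol (assumed criterion)."""
    prev = step()
    for _ in range(max_steps):
        loss = step()
        if abs(prev - loss) < tol:
            return
        prev = loss
    raise RuntimeError("training did not converge")


def build_trained_model(sample_sets: Dict[str, List[float]], per_set: int,
                        seed: int = 0) -> Dict[str, float]:
    # 1. Train each sub-model on its own target training sample set and
    #    record its current parameter once the sub-training result converges.
    params: Dict[str, float] = {}
    for name, samples in sample_sets.items():
        sub = ScalarModel(samples)
        train_until_converged(sub.step)
        params[name] = sub.w

    # 2. Candidate initial model: here simply the collected parameters (assumption).
    candidate = dict(params)

    # 3. Target sample set: draw up to per_set samples from each sub-model's set.
    rng = random.Random(seed)
    target = {name: rng.sample(s, min(per_set, len(s)))
              for name, s in sample_sets.items()}

    # 4. Train the candidate on the target sample set, warm-started from step 2.
    for name, samples in target.items():
        tuned = ScalarModel(samples, w=candidate[name])
        train_until_converged(tuned.step)
        candidate[name] = tuned.w
    return candidate
```

A call such as `build_trained_model({"face": [1.0, 2.0, 3.0], "race": [4.0, 6.0], "posture": [10.0]}, per_set=2)` returns one fitted parameter per sub-model. The sketch does not handle empty training sample sets, and a real embodiment would replace `ScalarModel` with the actual recognition networks.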