JP2020501238A

JP2020501238A - Face detection training method, apparatus and electronic equipment

Info

Publication number: JP2020501238A
Application number: JP2019525952A
Authority: JP
Inventors: 浩王; 志鋒李; ▲興▼ 季; 凡 ▲賈▼; 一同王
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2017-06-02
Filing date: 2018-03-16
Publication date: 2020-01-16
Anticipated expiration: 2038-03-16
Also published as: US10929644B2; US20210089752A1; WO2018219016A1; CN108985135A; TWI665613B; MA48806A; CN110490177A; EP3633549A4; US11594070B2; JP6855098B2; TW201832134A; KR102236046B1; US20190251333A1; KR20190116397A; EP3633549A1

Abstract

Embodiments of the present invention provide a face detection training method, apparatus, and electronic device. The method includes: obtaining the batch data training samples of the current iteration; determining a center loss value corresponding to each training sample; determining a center loss value corresponding to the batch data training samples based on the center loss value corresponding to each training sample; determining a target loss value for face detection based on at least the center loss value corresponding to the batch data training samples; if the target loss value for face detection has not reached the convergence condition, updating the network parameters of the face detection model based on the target loss value and proceeding to the next iteration; and, when the target loss value for face detection reaches the convergence condition, outputting the face detection. According to the embodiments of the present invention, face detection can be made invariant to intra-class differences among faces while guaranteeing high inter-class detection performance between faces and non-faces, which improves the robustness of face detection.

Description

This application claims priority to Chinese Patent Application No. 201710406726.9, entitled "Face Detector Training Method, Apparatus and Electronic Equipment" and filed with the China Patent Office on June 2, 2017, the entire contents of which are incorporated herein by reference.

The present invention relates to the field of image processing technology and, more specifically, to a face detection training method, apparatus, and electronic device.

Face detection is a technique for detecting faces in an image by means of a face detector. Because the quality of face detection training directly affects the face detection result, how to optimize the training process has long been a focus of research by those skilled in the art.

With the development of deep learning, face detection training based on a CNN (Convolutional Neural Network), for example training face detection with a Faster RCNN (Faster Region-based Convolutional Neural Network), has become the mainstream training method for face detection. The CNN-based training process mainly consists of constructing a face detection model, training it iteratively on training samples, and updating the network parameters of the face detection model in each iteration so as to optimize the face detection. The process of updating the network parameters in each iteration can be regarded as the optimization process of the face detection.

At present, the optimization goal of face detection is mainly to maximize the difference between faces and non-faces (that is, to maximize the inter-class difference), and little attention is paid to the differences among faces. As a result, face detection has weak discrimination ability and poor robustness when dealing with face variations across different scenes.

In view of this, embodiments of the present invention provide a face detection training method, apparatus, and electronic device in order to improve the discrimination ability and the robustness of face detection.

To achieve the above object, embodiments of the present invention provide the following technical solutions.

A face detection training method, comprising:
obtaining the batch data training samples of an iteration, wherein the batch data training samples include a plurality of training samples of different sample classes;
determining a center loss value corresponding to each training sample based on the feature vector of the training sample and the center feature vector of the sample class to which the training sample belongs;
determining a center loss value corresponding to the batch data training samples based on the center loss values corresponding to the respective training samples;
determining a target loss value for face detection based on the center loss value corresponding to the batch data training samples; and
outputting a training result of the face detection when the target loss value for face detection reaches a set training convergence condition.

Embodiments of the present invention further provide a face detection training apparatus, comprising:
a sample acquisition module, configured to obtain the batch data training samples of the current iteration, wherein the batch data training samples include a plurality of training samples of different sample classes;
a sample center loss value determination module, configured to determine a center loss value corresponding to each training sample based on the feature vector of the training sample and the center feature vector of the sample class to which the training sample belongs;
a batch sample center loss value determination module, configured to determine a center loss value corresponding to the batch data training samples based on the center loss values corresponding to the respective training samples;
a detection target loss value determination module, configured to determine a target loss value for face detection based on at least the center loss value corresponding to the batch data training samples;
a parameter update module, configured to, if the target loss value for face detection has not reached a set training convergence condition, update the network parameters of the face detection model based on the target loss value and proceed to the next iteration; and
a detection output module, configured to output a training result of the face detection when the target loss value for face detection reaches the set training convergence condition.

Embodiments of the present invention further provide an electronic device including a memory and a processor,
wherein the memory stores a program, the processor calls the program, and the program is used to:
obtain the batch data training samples of the current iteration, wherein the batch data training samples include a plurality of training samples of different sample classes;
determine a center loss value corresponding to each training sample based on the feature vector of the training sample and the center feature vector of the sample class to which the training sample belongs;
determine a center loss value corresponding to the batch data training samples based on the center loss values corresponding to the respective training samples;
determine a target loss value for face detection based on at least the center loss value corresponding to the batch data training samples;
if the target loss value for face detection has not reached a set training convergence condition, update the network parameters of the face detection model based on the target loss value and proceed to the next iteration; and
when the target loss value for face detection reaches the set training convergence condition, output the detection result of the face detection.

Embodiments of the present invention further provide a computer-readable storage medium including instructions that, when executed on a computer, cause the computer to perform the method described in the first aspect.

Embodiments of the present invention further provide a computer program product including instructions that, when executed on a computer, cause the computer to perform the method described in the first aspect.

According to the above technical solutions, the face detection training procedure provided by the embodiments of the present invention may include the following. The batch data training samples of the current iteration are obtained, where the batch data training samples include a plurality of training samples of different sample classes. A center loss value corresponding to each training sample is determined based on the feature vector of the training sample and the center feature vector of the sample class to which it belongs. A center loss value corresponding to the batch data training samples is determined based on the center loss values of the individual training samples. A target loss value for face detection is determined based on at least the center loss value corresponding to the batch data training samples. If the target loss value has not reached the set training convergence condition, the network parameters of the face detection model are updated based on the target loss value and the procedure proceeds to the next iteration; this is repeated until the target loss value reaches the set training convergence condition. When the target loss value reaches the set training convergence condition, the face detection is output and the training of the face detection is complete.
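The iteration described above can be sketched as a small control loop. This is only an illustrative skeleton, not the patented implementation: the function name, the callables passed in, and the choice to average the per-sample center loss values into the batch value are all assumptions made for the example, since the text here does not fix how the batch value is derived or how convergence is tested.

```python
from typing import Callable, Sequence, Tuple


def train_until_converged(
    next_batch: Callable[[], Sequence],
    sample_center_loss: Callable[[object], float],
    other_losses: Callable[[Sequence], float],
    update_parameters: Callable[[float], None],
    has_converged: Callable[[float], bool],
    max_iterations: int,
) -> Tuple[bool, int, float]:
    """Return (converged, updates_done, last_target_loss).

    Every callable is a hypothetical stand-in supplied by the caller.
    """
    target_loss = float("inf")
    for iteration in range(max_iterations):
        # Obtain the batch data training samples of the current iteration.
        batch = next_batch()
        # Center loss value of each training sample.
        per_sample = [sample_center_loss(sample) for sample in batch]
        # Batch center loss value (assumption: mean of the per-sample values).
        batch_center_loss = sum(per_sample) / len(per_sample)
        # Target loss is based on "at least" the batch center loss value, so
        # any further loss terms are added here.
        target_loss = batch_center_loss + other_losses(batch)
        if has_converged(target_loss):
            # Convergence reached: stop and output the result.
            return True, iteration, target_loss
        # Otherwise update the network parameters and go to the next iteration.
        update_parameters(target_loss)
    return False, max_iterations, target_loss
```

The loop checks convergence before updating, which mirrors the order in the text: parameters are updated only when the target loss value has not yet reached the convergence condition.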

In the embodiments of the present invention, the center loss value corresponding to the batch data training samples is incorporated into the training optimization goal of face detection, so that face detection can be invariant to intra-class differences among faces. Face detection that is optimized with this center loss value is therefore invariant to intra-class differences among faces while still guaranteeing high inter-class detection performance between faces and non-faces, which improves the robustness of face detection.

To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. Evidently, the drawings described below are merely embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.

A structure of the face detection model.
Another structure of the face detection model.
A block diagram of the hardware configuration of the electronic device.
A flowchart of the face detection training method provided by an embodiment of the present invention.
A schematic diagram of face detection training based on the face detection model.
A flowchart of the method for determining the face frame coordinate regression loss value.
A flowchart of the method for obtaining the batch data training samples.
A structural block diagram of the face detection training apparatus provided by an embodiment of the present invention.
Another structural block diagram of the face detection training apparatus provided by an embodiment of the present invention.
A further structural block diagram of the face detection training apparatus provided by an embodiment of the present invention.

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

An embodiment of the present invention provides an optional face detection model constructed based on a CNN. As shown in FIG. 1, the model includes a basic network layer, a candidate frame prediction layer, and a face detection layer.

The basic network layer may be a sub-network formed by sequentially connecting a series of convolution layers (Convolution) and pooling layers (Pooling). Through the series of convolution layers, the basic network layer performs layer-by-layer convolution on each training sample (a training sample may be a sample in image form), where each convolution layer convolves the result output by the preceding convolution layer. Among the image features processed by the multiple convolution layers, shallow-layer features can represent features such as rich edges, corners, and texture structures, while deep-layer features are further abstract mappings based on the shallow-layer features. The layer-by-layer convolution thus extracts image features at different levels. For each training sample, the basic network layer outputs the feature map (Feature map) produced by the last convolution layer, and this feature map is one representation of the image features.

The candidate frame prediction layer may be a sub-network with a fully convolutional structure built on the image features output by the basic network layer. It can map the features of each training sample through a convolution layer and build a candidate frame classifier and a candidate frame regressor from the mapped nodes, which together form the candidate frame detection. The candidate frame classifier is used to predict the probability of candidate frames (Proposals), and the candidate frame regressor is used to predict the coordinates of candidate frames, so that candidate frames (Proposals) are output. The candidate frames output by the candidate frame prediction layer can be input to the face detection layer.

The face detection layer may be a sub-network that includes a region-of-interest pooling layer (RoI Pooling) and is built on the image features output by the basic network layer and the candidate frames output by the candidate frame prediction layer. For each training sample, the face detection layer can, based on the candidate frames (Proposals), downsample the image features of the training sample output by the basic network layer to obtain a feature map of fixed size, and map all nodes of that feature map to a feature vector of fixed length, which yields the feature vector of the training sample. A face classifier and a face regressor are built on the feature vector of each training sample, and together they constitute the face detection. The face classifier can predict the probabilities of face and non-face, and the face regressor can perform a more accurate coordinate regression of the face frame based on the candidate frame.
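The downsampling step above, which turns a candidate frame of arbitrary size into a fixed-size feature map, can be illustrated with a minimal single-channel sketch. It assumes max pooling over a grid of bins, as is common for RoI pooling; the text itself only says "downsample". The function names and the plain flattening used to obtain a fixed-length vector are illustrative stand-ins, not the mapping used by the described model.

```python
from typing import List, Sequence, Tuple


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def roi_max_pool(
    feature_map: Sequence[Sequence[float]],
    roi: Tuple[int, int, int, int],
    out_h: int,
    out_w: int,
) -> List[List[float]]:
    """Max-pool the region roi = (x0, y0, x1, y1) to an out_h x out_w grid.

    Coordinates are feature-map cells; x1 and y1 are exclusive.
    """
    x0, y0, x1, y1 = roi
    roi_h, roi_w = y1 - y0, x1 - x0
    if roi_h <= 0 or roi_w <= 0:
        raise ValueError("region of interest must be non-empty")
    pooled = []
    for i in range(out_h):
        # Floor for the start and ceiling for the end keep every bin non-empty.
        r0 = y0 + (i * roi_h) // out_h
        r1 = y0 + ceil_div((i + 1) * roi_h, out_h)
        row = []
        for j in range(out_w):
            c0 = x0 + (j * roi_w) // out_w
            c1 = x0 + ceil_div((j + 1) * roi_w, out_w)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


def flatten(pooled: Sequence[Sequence[float]]) -> List[float]:
    """Stand-in for mapping all nodes to a fixed-length feature vector."""
    return [value for row in pooled for value in row]
```

Whatever the size of the candidate frame, the output always has `out_h * out_w` entries, which is what lets a classifier and a regressor of fixed input size be built on top of it.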

Further, an optional refinement of the face detection model shown in FIG. 1 can be implemented as a face detection model based on Faster RCNN. Faster RCNN is a typical algorithm for face detection and is divided into an RPN (Region Proposal Networks) layer and a Fast RCNN layer: the RPN layer generates candidate frames, and the Fast RCNN layer obtains the final detection result based on the candidate frames.

As shown in FIG. 2, the face detection model based on Faster RCNN may include a basic network layer, an RPN layer, and a Fast RCNN layer, where the RPN layer can be regarded as an optional implementation of the candidate frame prediction layer and the Fast RCNN layer can be regarded as an optional implementation of the face detection layer.

In the embodiments of the present invention, the goal of the RPN layer is to generate candidate frames based on the image features output by the basic network layer. In this process, a plurality of anchor frames covering different scales and aspect ratios can be predefined. The predefined anchor frames are used to determine sub-frames in the training samples, and the sub-frames are used to predict candidate frames (for example, the sub-frames are used to train the candidate frame detection, which then predicts the candidate frames).
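As an illustration of predefining anchor frames that cover several scales and aspect ratios, the sketch below enumerates every combination around one center point. The parameterization (a scale taken as the square root of the frame area, an aspect ratio taken as width divided by height) and the function name are assumptions for the example; the text does not specify how the anchor frames are defined.

```python
import math
from typing import List, Sequence, Tuple

Frame = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def make_anchor_frames(
    cx: float,
    cy: float,
    scales: Sequence[float],
    aspect_ratios: Sequence[float],
) -> List[Frame]:
    """Return one anchor frame per (scale, aspect ratio) pair, centered at (cx, cy)."""
    frames = []
    for scale in scales:
        for ratio in aspect_ratios:
            # Keep the area at scale**2 while setting width / height = ratio.
            width = scale * math.sqrt(ratio)
            height = scale / math.sqrt(ratio)
            frames.append((cx - width / 2, cy - height / 2,
                           cx + width / 2, cy + height / 2))
    return frames
```

For instance, three scales and three aspect ratios give nine anchor frames at each position where the function is applied.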

The anchor frames are internal to the RPN layer and may be used to define and construct the classifier and regressor for candidate frames (Proposal). The RPN is a candidate frame detection. Specifically, each anchor frame is associated with its own detection (classification and regression), and classification and regression require predicted values and target values for training and learning. In the RPN, the classification target value (that is, how an output is defined as positive class or negative class) is determined by the overlap rate between the anchor frame and the real frame. Similarly, in Fast RCNN, the classification target value is determined by the overlap rate between the candidate frame and the real frame. The anchor frames used by the RPN and the candidate frames used by Fast RCNN therefore play similar roles when the classifiers are constructed, and an anchor frame can be regarded as a candidate for a candidate frame. The RPN can construct a plurality of candidate frame detections for each node of the convolved image features, with each candidate frame detection associated with one anchor frame.
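To make the overlap-based classification target concrete, the following sketch computes an overlap rate and derives a label from it. It assumes that the overlap rate is the intersection-over-union of two axis-aligned frames and that two thresholds split frames into positive, negative, and ignored; the threshold values 0.7 and 0.3 and both function names are illustrative assumptions, not values taken from the text.

```python
from typing import Optional, Sequence, Tuple

Frame = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def overlap_rate(a: Frame, b: Frame) -> float:
    """Intersection-over-union of two axis-aligned frames."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def classification_target(
    frame: Frame,
    real_frames: Sequence[Frame],
    positive_threshold: float = 0.7,  # assumed value
    negative_threshold: float = 0.3,  # assumed value
) -> Optional[int]:
    """Return 1 (positive), 0 (negative), or None (not used for training)."""
    best = max((overlap_rate(frame, real) for real in real_frames), default=0.0)
    if best >= positive_threshold:
        return 1
    if best < negative_threshold:
        return 0
    return None
```

The same function applies to an anchor frame in the RPN and to a candidate frame in Fast RCNN, which reflects the similar roles the two kinds of frame play when the classifiers are built.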

The goal of the Fast RCNN layer is to generate the feature vectors of the training samples based on the candidate frames and the image features output by the basic network layer. A face classifier and a face regressor are then built on the feature vectors of the training samples, and together they constitute the face detection.

To give face detection a better detection result, iterative training can be carried out with a model optimization algorithm such as stochastic gradient descent (Stochastic Gradient Descent, SGD). In each iteration, batch data training samples are selected from the training sample set for training, and the network parameters of the face detection model are updated according to whether the optimization goal of face detection has been achieved.

At present, the optimization goal of face detection is mainly to maximize the difference between faces and non-faces, and the differences among faces caused by face variations across scenes are ignored, for example variations in shooting angle, resolution, lighting conditions, facial expression, and occlusion. The trained face detection therefore has weak discrimination ability and poor robustness. For example, if the difference between two faces within the class (for example, with and without lighting) is too large, they may be judged as belonging to different classes even though they actually belong to the same class. The embodiments of the present invention therefore need to make the intra-class differences as small as possible, so as to ensure that face detection is invariant to intra-class differences.

On this basis, the embodiments of the present invention improve the iterative training optimization process of face detection and propose a new face detection training method, which ensures that face detection retains high detection performance for faces and non-faces while reducing the intra-class differences among faces, thereby improving the discrimination ability of face detection.

The face detection training method provided by the embodiments of the present invention can be loaded, in the form of a program, onto an electronic device used to carry out face detection training. The electronic device may be a server on the network side or a terminal device on the user side such as a personal computer (Personal Computer, PC), and its form can be decided according to the actual training needs of face detection.

As shown in FIG. 3, the hardware configuration of the electronic device used to carry out face detection training may include at least one processor 1, at least one communication interface 2, at least one memory 3, and at least one communication bus 4.

In the embodiments of the present invention, there is at least one each of the processor 1, the communication interface 2, the memory 3, and the communication bus 4, and the processor 1, the communication interface 2, and the memory 3 communicate with one another via the communication bus 4. Evidently, the communication connections among the processor 1, the communication interface 2, the memory 3, and the communication bus 4 shown in FIG. 3 are merely optional.

The communication interface 2 may be an interface of a communication module, such as an interface of a GSM (registered trademark) module.
The processor 1 may be a central processing unit (CPU), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention.

The memory 3 may include a high-speed RAM memory and may also include a non-volatile memory (NVM), for example at least one magnetic disk memory.

The memory 3 stores a program, and the processor 1 calls the program stored in the memory 3; the program executes the face detection training method provided by the embodiments of the present invention.

The embodiments of the present invention can carry out the iterative training of face detection with a model optimization algorithm such as stochastic gradient descent (Stochastic Gradient Descent, SGD). SGD is a commonly used optimization algorithm for convolutional neural networks and is effective for large-scale machine learning problems. In each iteration, SGD performs gradient descent optimization using batch data training samples (Minibatch) drawn at random from the training sample set.

Taking the face detection training of a single iteration as an example, the flow of the face detection training method provided by the embodiments of the present invention is shown in FIG. 4, and the flow of every iteration can refer to FIG. 4. Referring to FIG. 4, the method may include the following steps.

Step S100: obtain the batch data training samples of the current iteration, where the batch data training samples include a plurality of training samples of different sample classes.

The batch data training samples (Minibatch) may be drawn from a training sample set that contains all training samples.

Face detection can be regarded as a two-class task (face versus non-face). In each iteration, a plurality of face images can be taken from the training sample set as positive-class training samples and a plurality of non-face images as negative-class training samples, and the positive-class and negative-class training samples obtained together make up the batch data training samples of that iteration.

Correspondingly, the batch data training sample used in the current iteration may include a plurality of training samples whose sample classes are the positive class (training samples that are face images) and the negative class (training samples that are non-face images).
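The composition of a minibatch described in step S100 can be sketched as follows. This is only an illustration under assumptions: the function name `sample_minibatch`, the per-class counts `n_pos` and `n_neg`, and the label convention (1 for face, 0 for non-face) are hypothetical and are not specified by the text.

```python
import random

def sample_minibatch(faces, non_faces, n_pos, n_neg, rng=None):
    """Draw n_pos face samples (label 1) and n_neg non-face samples (label 0).

    `faces` and `non_faces` are lists of training samples; the label
    convention is an assumption made for this sketch.
    """
    rng = rng or random.Random()
    positives = [(x, 1) for x in rng.sample(faces, n_pos)]
    negatives = [(x, 0) for x in rng.sample(non_faces, n_neg)]
    batch = positives + negatives
    rng.shuffle(batch)  # mix the two classes within the batch
    return batch
```

A fixed `random.Random(seed)` can be passed as `rng` to make the draw reproducible.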

Step S110: determine the center loss value corresponding to each training sample based on the feature vector of the training sample and the center feature vector of the sample class to which the training sample belongs.

For one training sample in the batch data training sample, an embodiment of the present invention may determine the feature vector of the training sample and the center feature vector, in the batch data training sample, of the sample class to which the training sample belongs, and from these determine the center loss value corresponding to the training sample. This processing is performed for every training sample in the batch data training sample, yielding the center loss value corresponding to each training sample.

The center feature vector of a sample class in the batch data training sample may be updated in the current iteration according to the mean of the feature vectors of the training samples in the batch data training sample that belong to that sample class.

For one sample class, an embodiment of the present invention may determine the training samples in the batch data training sample that belong to the sample class; determine, from the feature vectors of those training samples, the mean of the feature vectors, thereby obtaining an update variable for the center feature vector of the sample class; and obtain the center feature vector of the sample class in the batch data training sample from the update variable and a set learning rate. In this way, the center feature vector of a sample class is updated based on the mean of the feature vectors of the training samples of that class in the batch data training sample.

An embodiment of the present invention may determine the center feature vector of a sample class in the batch data training sample by the following formula:

$$c_j^{t+1} = c_j^{t} - \alpha\,\Delta c_j^{t}$$

where $\alpha$ is the set learning rate, $\Delta c_j^{t}$ is the update variable, $c_j^{t}$ is the center feature vector of the $j$-th sample class for the batch data training sample used in the $t$-th iteration, and $c_j^{t+1}$ is the center feature vector of the $j$-th sample class for the batch data training sample used in the $(t+1)$-th iteration.

That is, for one sample class, an embodiment of the present invention may obtain the center feature vector of the sample class for the current iteration by subtracting the product of the update variable and the set learning rate from the center feature vector of the sample class in the previous iteration.
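The center update can be sketched in a few lines. The text states that the update variable is derived from the mean of the class's feature vectors but does not give its exact form here; this sketch assumes the update variable is the previous center minus that mean, so that subtracting its product with the learning rate moves the center toward the mean. The function name `update_center` is hypothetical.

```python
def update_center(center, class_features, lr):
    """Return the new center feature vector of one sample class.

    Assumption for this sketch: update variable = center - mean(features),
    so new_center = center - lr * (center - mean).
    """
    if not class_features:
        return list(center)  # no sample of this class in the batch
    n = len(class_features)
    mean = [sum(f[d] for f in class_features) / n for d in range(len(center))]
    delta = [c - m for c, m in zip(center, mean)]  # assumed update variable
    return [c - lr * d for c, d in zip(center, delta)]
```

With `lr = 1` the center jumps straight to the batch mean; smaller values smooth the center across iterations.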

For the positive sample class in the batch data training sample, an embodiment of the present invention may determine the feature vectors of the training samples in the batch data training sample that belong to the positive class and determine the mean of those feature vectors, thereby updating the center feature vector of the positive sample class. Correspondingly, for the negative sample class, an embodiment may determine the feature vectors of the training samples in the batch data training sample that belong to the negative class and determine the mean of those feature vectors, thereby updating the center feature vector of the negative sample class.

Further, for each positive-class training sample in the batch data training sample, an embodiment of the present invention may determine the corresponding center loss value based on the feature vector of that training sample and the center feature vector of the positive sample class. For each negative-class training sample in the batch data training sample, an embodiment may determine the corresponding center loss value based on the feature vector of that training sample and the center feature vector of the negative sample class.

The center loss value of a training sample may be expressed by the distance between the feature vector of the training sample and the center feature vector of the sample class to which the training sample belongs. Let $x_i$ denote the $i$-th training sample in the batch data training sample, $y_i$ the sample class to which $x_i$ belongs (for example, $y_i = 1$ may denote the positive class and $y_i = 0$ the negative class, or the opposite assignment may be used; it suffices that the positive and negative classes have different $y_i$ values), and $c_{y_i}$ the center feature vector of sample class $y_i$. The center loss value of sample $x_i$ can then be defined as

$$\lVert x_i - c_{y_i} \rVert_2^2$$

The feature vector of a training sample may be determined as follows: after the base network layer outputs the image features of the training sample, the face detection layer determines the region of interest in the training sample based on the candidate frames (proposals), downsamples the image features of the region of interest to obtain a feature map of fixed size, and connects all nodes of the feature map, mapping them to a feature vector of fixed length, which is the feature vector of the training sample.

Step S120: determine the center loss value corresponding to the batch data training sample based on the center loss value corresponding to each training sample.

An embodiment of the present invention may determine the mean of the center loss values corresponding to the training samples, and determine the center loss value corresponding to the batch data training sample based on that mean.

An embodiment of the present invention may use the mean of the center loss values corresponding to the training samples directly as the center loss value corresponding to the batch data training sample, or may multiply that mean by a set value (for example, 1/2) to obtain the center loss value corresponding to the batch data training sample.

Assuming that the batch data training sample contains $m$ training samples, the center loss value corresponding to the batch data training sample can be expressed as

$$L_c = \frac{1}{2m}\sum_{i=1}^{m} \lVert x_i - c_{y_i} \rVert_2^2$$
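Steps S110 and S120 can be illustrated together. The sketch below assumes that the per-sample distance is the squared Euclidean distance and that the batch value is the mean of the per-sample values multiplied by a set value (1/2 by default, as in the example given in the text); the function names and the `centers` mapping from class label to center feature vector are hypothetical.

```python
def sample_center_loss(feature, center):
    """Squared Euclidean distance between a feature vector and its class center."""
    return sum((f - c) ** 2 for f, c in zip(feature, center))

def batch_center_loss(features, labels, centers, scale=0.5):
    """Mean of the per-sample center losses, multiplied by a set value `scale`.

    `centers` maps each class label to that class's center feature vector.
    """
    losses = [sample_center_loss(x, centers[y]) for x, y in zip(features, labels)]
    return scale * sum(losses) / len(losses)
```

Passing `scale=1.0` gives the variant in which the mean is used directly as the batch center loss value.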

Step S130: determine the target loss value of the face detection based at least on the center loss value corresponding to the batch data training sample.

The target loss value of the face detection expresses the optimization objective of the iterative face detection training. When the target loss value reaches the set training convergence condition (for example, a minimum), the iterative training ends and the face detection can be output. In each iteration, an embodiment of the present invention combines the existing optimization objective of face detection with the center loss value corresponding to the batch data training sample in use, takes the combination as the optimization objective of the face detection of the embodiment, and thereby obtains the target loss value of the face detection.

An embodiment of the present invention may determine the target loss value of the face detection based on the center loss value corresponding to the batch data training sample, the classification loss value corresponding to the batch data training sample, and the face frame coordinate regression loss value corresponding to the batch data training sample.

Here, the classification loss value corresponding to the batch data training sample can be determined based on the difference between the classification prediction probability and the classification target probability (the true classification probability) of each training sample in the batch data training sample.

For each training sample in the batch data training sample, after obtaining the feature vector of the training sample, an embodiment of the present invention may predict the sample class to which the training sample belongs using a Softmax function or the like, obtaining the classification prediction probability of the training sample. Based on the classification prediction probability and the true classification target probability of the training sample, the classification loss value corresponding to the training sample can be determined (for example, as the difference between the classification prediction probability and the classification target probability). The classification loss value corresponding to the batch data training sample can then be determined from the classification loss values of the individual training samples (for example, by taking their mean).
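As an illustration of the classification branch, the sketch below computes Softmax probabilities from per-class scores and averages a per-sample loss over the batch. The text only says that the loss reflects the difference between predicted and target probabilities; cross-entropy (the negative log of the probability assigned to the true class) is used here as one common choice, which is an assumption. The function names and the use of raw class scores as input are also hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable Softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def batch_classification_loss(batch_scores, labels):
    """Mean cross-entropy over the batch (assumed form of the classification loss).

    `batch_scores[i]` holds the class scores of sample i and `labels[i]`
    is the index of its true class.
    """
    losses = [-math.log(softmax(s)[y]) for s, y in zip(batch_scores, labels)]
    return sum(losses) / len(losses)
```

A smaller value means the predicted probability of the true class is closer to 1 for the samples in the batch.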

As can be seen, the classification loss value corresponding to the batch data training sample is an indicator of how well the face detection separates the face class from the non-face class, and it reflects the difference between faces and non-faces (the inter-class difference). Using the classification loss value corresponding to the batch data training sample as part of the optimization objective gives the optimized face detection high performance in distinguishing between the face and non-face classes.

On this basis, the center loss value corresponding to the batch data training sample represents the distance between the feature vector of a training sample and the center feature vector of the sample class to which it belongs. It therefore describes the difference between a training sample's feature vector and the center feature vector of its class, that is, the intra-class difference among the feature vectors of the training samples in each sample class. Using the center loss value corresponding to the batch data training sample as part of the optimization objective makes the optimized face detection invariant to intra-class differences among faces (for example, differences between faces in different scenes) and improves the robustness of the face detection.

The training of the face detection may include classification training and regression training and is a joint training process. In one iteration, the loss value formed by the center loss value and the classification loss value corresponding to the batch data training sample can be regarded as the optimization objective of the classification training; for example, the classification training within the face detection training may aim to minimize the loss value formed by the center loss value and the classification loss value corresponding to the batch data training sample.

In each iteration, the optimization objective of the regression training within the face detection training may be formed by the face frame coordinate regression loss value corresponding to the batch data training sample.

By combining the center loss value, the classification loss value, and the face frame coordinate regression loss value corresponding to the batch data training sample in one iteration, the target loss value of the face detection is formed, which expresses the optimization objective of the face detection training.

An embodiment of the present invention may obtain the target loss value of the face detection by summing the product of the center loss value corresponding to the batch data training sample and a first set weight, the product of the face frame coordinate regression loss value corresponding to the batch data training sample and a second set weight, and the classification loss value corresponding to the batch data training sample.

Let $L_{cls}$ denote the classification loss value corresponding to the batch data training sample, $L_c$ the center loss value corresponding to the batch data training sample, and $L_{reg}$ the face frame coordinate regression loss value corresponding to the batch data training sample. The target loss value of the face detection can then be expressed as $L_{cls} + \mu L_c + \lambda L_{reg}$, where $\mu$ and $\lambda$ are set weight coefficients: $\mu$ is the first set weight and $\lambda$ is the second set weight.
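The weighted combination is a one-line computation, shown here as a minimal sketch. The function name `target_loss` and the default weight values are hypothetical; the text does not specify numerical values for the two set weights.

```python
def target_loss(l_cls, l_center, l_reg, mu=1.0, lam=1.0):
    """Target loss: classification loss + mu * center loss + lam * regression loss.

    mu is the first set weight and lam the second set weight; the defaults
    of 1.0 are placeholders, not values taken from the text.
    """
    return l_cls + mu * l_center + lam * l_reg
```

With both weights equal to 1 the expression reduces to the plain sum of the three loss values.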

Alternatively, an embodiment of the present invention may obtain the target loss value of the face detection by directly summing the center loss value, the classification loss value, and the face frame coordinate regression loss value corresponding to the batch data training sample.

Step S140: determine whether the target loss value of the face detection has reached the set training convergence condition; if not, execute step S150; if so, execute step S160.

The set training convergence condition may be regarded as the target loss value of the face detection being minimized.

Specifically, the smaller the classification loss value corresponding to the batch data training sample, the better the face detection classifies faces and non-faces, so that the face detection maximizes the distinction between faces and non-faces (that is, maximizes the inter-class difference). The smaller the center loss value corresponding to the batch data training sample, the smaller the differences among the feature vectors of the training samples within each sample class; this reduces the differences among training samples of the same sample class and further reduces the face-to-face differences within a sample class. In other words, iterative training minimizes the distance between the feature vector of each training sample in the batch data training sample and the center feature vector of the sample class to which the training sample belongs.

As can be seen, the target loss value of the face detection is determined by incorporating the center loss value corresponding to the batch data training sample, and the training convergence condition is judged on that target loss value. With the center loss value corresponding to the batch data training sample minimized, the face detection is guaranteed to be invariant to intra-class differences among faces (for example, differences between faces in different scenes), which improves the robustness of the face detection.

Step S150: update the network parameters related to face detection in the face detection model based on the target loss value of the face detection, proceed to the next iteration, and return to step S100.

If the target loss value of the face detection has not reached the set training convergence condition (for example, has not reached a minimum), an embodiment of the present invention may update the network parameters in the face detection model based on the target loss value, proceed to the next iteration according to the iterative training flow, and return to step S100. Steps S100 to S140 are then repeated with the face detection model whose network parameters have been updated, until the determination in step S140 is that the target loss value of the face detection has reached the set training convergence condition.

An embodiment of the present invention may proceed to the next iteration by stochastic gradient descent and return to step S100.

Step S160: output the face detection.

When the target loss value of the face detection reaches the set training convergence condition (for example, the target loss value is minimized), the face detection obtained by training the face detection model can be output, completing the iterative training optimization of the face detection.
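The control flow of steps S100 to S160 can be sketched as a generic loop. Everything named here is hypothetical: `sample_batch`, `loss_and_grad`, and `update` are placeholder callables standing in for the batch sampling, the loss computation, and the parameter update, and "the loss has reached a minimum" is approximated by "the loss changed by less than `tol` since the previous iteration", which is an assumption of this sketch. The toy usage minimizes a one-dimensional quadratic rather than a face detection loss.

```python
def train(params, sample_batch, loss_and_grad, update, tol=1e-8, max_iters=10_000):
    """Repeat S100-S150 until the loss stops changing, then return (S160)."""
    prev_loss, loss = None, None
    for _ in range(max_iters):
        batch = sample_batch()                      # S100: get the minibatch
        loss, grad = loss_and_grad(params, batch)   # S110-S130: target loss
        if prev_loss is not None and abs(prev_loss - loss) < tol:
            break                                   # S140 satisfied, go to S160
        params = update(params, grad)               # S150: update parameters
        prev_loss = loss
    return params, loss

# Toy usage: minimise (w - 3)^2; the "batch" is unused in this example.
w, final_loss = train(
    0.0,
    sample_batch=lambda: None,
    loss_and_grad=lambda w, _batch: ((w - 3.0) ** 2, 2.0 * (w - 3.0)),
    update=lambda w, g: w - 0.1 * g,
)
```

In the toy run `w` ends close to 3, the minimizer of the quadratic.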

The face detection training flow provided by an embodiment of the present invention may include: obtaining the batch data training sample of the current iteration, the batch data training sample including a plurality of training samples of different sample classes; determining the center loss value corresponding to each training sample based on the feature vector of the training sample and the center feature vector of the sample class to which it belongs; determining the center loss value corresponding to the batch data training sample based on the center loss values corresponding to the training samples; determining the target loss value of the face detection based at least on the center loss value corresponding to the batch data training sample; if the target loss value has not reached the set training convergence condition, updating the network parameters in the face detection model based on the target loss value and proceeding to the next iteration, until the target loss value reaches the set training convergence condition; and, when the target loss value reaches the set training convergence condition, outputting the face detection, which completes the training of the face detection.

In an embodiment of the present invention, the center loss value corresponding to the batch data training sample is incorporated into the training optimization objective of the face detection, which makes the face detection invariant to intra-class differences between faces. By performing optimization training of the face detection with the center loss value corresponding to the batch data training sample, the trained face detection remains invariant to intra-class differences among faces while guaranteeing high inter-class detection performance for faces and non-faces, which improves the robustness of the face detection.

When updating the network parameters in the face detection model based on the target loss value of the face detection, an embodiment of the present invention may update the network parameters by backpropagation based on the target loss value.

An embodiment of the present invention may determine a parameter update value of the face detection based on the target loss value of the face detection and the network parameters of the face detection model from the previous iteration, and update those network parameters based on the parameter update value.

Assume that the target loss value of the face detection is $Loss = L_{cls} + \mu L_c + \lambda L_{reg}$ and that the network parameters of the face detection model from the previous iteration are $W_1$. The parameter update value of the face detection can then be expressed as

$$\Delta W = \frac{\partial Loss}{\partial W_1}$$

Updating the network parameters of the face detection model from the previous iteration based on the parameter update value of the face detection may be realized by the following formula:

$$W_2 = kW_1 - \alpha\,(\Delta W + sW_1)$$

where $W_2$ is the updated network parameters of the face detection model, $\Delta W$ is the parameter update value of the face detection, $k$ is the momentum, $\alpha$ is the learning rate, and $s$ is the weight decay coefficient.
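The parameter update can be sketched as below. The formula image is not available in this text, so the form used here, new parameter = k * old parameter - learning rate * (gradient + s * old parameter), is an assumption reconstructed from the symbols the text defines (momentum k, learning rate, weight decay coefficient s); it should not be read as the confirmed formula. The function name `update_parameters` is hypothetical.

```python
def update_parameters(w1, grad, k, lr, s):
    """Assumed update rule: w2 = k * w1 - lr * (grad + s * w1), element-wise.

    w1   : parameters from the previous iteration
    grad : parameter update value (gradient of the target loss w.r.t. w1)
    k    : momentum, lr : learning rate, s : weight decay coefficient
    """
    return [k * w - lr * (g + s * w) for w, g in zip(w1, grad)]
```

With `k = 1` and `s = 0` this reduces to a plain gradient descent step.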

As shown in FIG. 5, an embodiment of the present invention may provide a center loss function (Center Loss) in the face detection layer (for example, a Fast RCNN layer). The center loss function may be applied to the fully connected feature representation layer of the face detection layer; this layer connects all nodes of the feature map in a fully connected manner and maps them to a feature vector of fixed length, yielding the feature vector of each training sample. In each training iteration, the center loss function determines the center loss value corresponding to each training sample in the batch data training sample used in that iteration, based on the feature vector of each training sample, and correspondingly determines the center loss value $L_c$ corresponding to the batch data training sample.

At the same time, a Softmax function may be provided in the face detection layer (for example, a Fast RCNN layer) and applied to the fully connected feature representation layer of the face detection layer. In each training iteration, the Softmax function processes the feature vector of each training sample to determine the classification prediction probability of the training sample; Softmax Loss (the classification loss function) then expresses the difference between the classification prediction probability and the classification target probability (the true classification probability) of the training sample, and the classification loss value $L_{cls}$ corresponding to the batch data training sample is determined.

That is, the input of the Softmax function is the feature vector of a training sample and its output is the predicted probability that the training sample belongs to each sample class. The inputs of Softmax Loss (the classification loss function) are $p$ (the classification prediction probability) and $p^*$ (the classification target probability) of the training sample, and its output is a loss value (Loss); the smaller the Loss, the more accurate the classification. In an embodiment of the present invention, Center Loss and Softmax Loss act on the same layer (that is, they receive the same feature vectors as input). Center Loss serves as an auxiliary supervision signal for optimizing the face detection: the smaller the Center Loss, the smaller the intra-class feature differences detected by the face detection. Softmax Loss pushes the features of different classes apart and guarantees a discriminable inter-class difference.

Further, in an embodiment of the present invention, a face frame regression prediction function SmoothL1 (a smoothed L1-norm function) may be provided in the face detection layer (for example, a Fast RCNN layer). SmoothL1 determines, based on the candidate frames, the predicted face frame coordinates corresponding to each training sample in the batch data training sample. SmoothL1 Loss then determines the face frame coordinate regression loss value corresponding to each training sample; its inputs are the predicted face frame coordinates and the target face frame coordinates of the training sample, and its output is a loss value (Loss). From these, the face frame coordinate regression loss value $L_{reg}$ corresponding to the batch data training sample is determined.
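The regression branch can be illustrated with the usual Smooth L1 definition from Fast R-CNN: 0.5 * d^2 when |d| < 1 and |d| - 0.5 otherwise, applied to each coordinate difference d. The sketch below applies it directly to the four box coordinates and averages over the batch; applying it to raw coordinates rather than to normalized offsets, averaging over the batch, and the function names are assumptions of this sketch.

```python
def smooth_l1(diff):
    """Smooth L1: quadratic near zero, linear for |diff| >= 1."""
    a = abs(diff)
    return 0.5 * a * a if a < 1.0 else a - 0.5

def box_regression_loss(pred_box, target_box):
    """Sum of Smooth L1 over the (x1, y1, x2, y2) coordinates of one face frame."""
    return sum(smooth_l1(p - t) for p, t in zip(pred_box, target_box))

def batch_regression_loss(pred_boxes, target_boxes):
    """Mean per-sample regression loss over the batch (averaging is assumed)."""
    losses = [box_regression_loss(p, t) for p, t in zip(pred_boxes, target_boxes)]
    return sum(losses) / len(losses)
```

The linear branch keeps large coordinate errors from dominating the gradient, which is the usual motivation for Smooth L1 over a squared error.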

In addition, the embodiment of the present invention may determine the target loss value for face detection as Loss = L_ls + μL_c + λL_reg and, in each iteration, update the network parameters of the face detection model using the target loss value Loss obtained in that iteration, until the target loss value Loss is minimized.
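The weighted sum above can be sketched as follows. This is a minimal illustration under stated assumptions: L_ls is read as the batch classification loss (Softmax Loss), L_c as the batch center loss, and L_reg as the batch face frame coordinate regression loss. The function name and the numeric values of μ and λ are hypothetical and are not taken from the embodiment.

```python
def target_loss(l_cls, l_center, l_reg, mu, lam):
    """Loss = L_ls + mu * L_c + lam * L_reg."""
    return l_cls + mu * l_center + lam * l_reg

# Hypothetical batch-level loss values and weights, for illustration only.
loss = target_loss(l_cls=0.5, l_center=2.0, l_reg=1.0, mu=0.1, lam=1.0)
```

Setting both weights to 1 reduces the expression to a plain sum of the three losses.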

The classification loss value corresponding to the batch data training samples in one iteration may be determined as follows.

A classification loss value for each training sample in the batch data training samples is determined based on that sample's classification prediction probability and classification target probability; and
the classification loss value corresponding to the batch data training samples is determined based on the classification loss values of the individual training samples in the batch.

As shown in FIG. 6, determining the face frame coordinate regression loss value corresponding to the batch data training samples in one iteration may include the following.

Step S200: determine, based on the candidate frames, the predicted face frame coordinates corresponding to each training sample in the batch data training samples.

The embodiment of the present invention may determine the region of interest of each training sample in the current iteration's batch data training samples based on the candidate frames output by the candidate frame prediction layer, and thereby obtain the predicted face frame coordinates for each training sample. The predicted face frame coordinates of a training sample can be expressed by, for example, the x-coordinate and y-coordinate of the upper-left vertex and the x-coordinate and y-coordinate of the lower-right vertex.

The embodiment of the present invention may place the face frame regression prediction function SmoothL1 (smoothed norm function) in the face detection layer (for example, the Fast RCNN layer) and use SmoothL1 to determine, based on the candidate frames, the predicted face frame coordinates corresponding to each training sample.

Step S210: determine the face frame coordinate regression loss value for each training sample based on that sample's predicted face frame coordinates and target face frame coordinates.

The target face frame coordinates of a training sample may be the ground-truth coordinates of the face frame in that sample. For each training sample, the embodiment of the present invention can determine the face frame coordinate regression loss value from the difference between the sample's predicted and target face frame coordinates; applying this to every training sample yields the face frame coordinate regression loss value for each of them.

The embodiment of the present invention can express the face frame coordinate regression loss value with SmoothL1 Loss, whose inputs are the predicted and target face frame coordinates of a training sample and whose output is a loss value (Loss); the smaller the Loss, the more accurate the face frame regression.

Step S220: determine the face frame coordinate regression loss value corresponding to the batch data training samples based on the face frame coordinate regression loss values of the individual training samples.

The embodiment of the present invention may compute the average of the face frame coordinate regression loss values of the training samples in the batch and determine, from that average, the face frame coordinate regression loss value (SmoothL1 Loss) corresponding to the batch data training samples.
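Steps S210 and S220 can be sketched as follows. This is a minimal illustration, assuming the Smooth L1 form commonly used with Fast R-CNN (quadratic for differences below 1, linear otherwise) and assuming it is applied directly to the four corner coordinates; the function names and coordinate values are hypothetical.

```python
def smooth_l1(diff):
    """Smooth L1 of one coordinate difference."""
    a = abs(diff)
    return 0.5 * a * a if a < 1.0 else a - 0.5

def face_frame_loss(pred, target):
    """Per-sample loss: sum of Smooth L1 over (x1, y1, x2, y2)."""
    return sum(smooth_l1(p - t) for p, t in zip(pred, target))

def batch_face_frame_loss(preds, targets):
    """Batch loss: average of the per-sample losses (step S220)."""
    per_sample = [face_frame_loss(p, t) for p, t in zip(preds, targets)]
    return sum(per_sample) / len(per_sample)

# Two hypothetical samples: the first is slightly off, the second is exact.
preds = [[0.0, 0.0, 10.0, 10.0], [5.0, 5.0, 20.0, 20.0]]
targets = [[0.5, 0.0, 12.0, 10.0], [5.0, 5.0, 20.0, 20.0]]
l_reg = batch_face_frame_loss(preds, targets)
```

The quadratic region keeps gradients small near the target, while the linear region limits the influence of badly mispredicted frames.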

In the embodiment of the present invention, iterative training of the face detector uses joint multi-loss training that covers two joint tasks, face classification and regression. Classification training is optimized jointly with Center Loss and Softmax Loss, and regression training is optimized with SmoothL1 Loss. The final optimization goal of face detection is to minimize the weighted sum of the three loss values, Center Loss, Softmax Loss, and SmoothL1 Loss, corresponding to the batch data training samples.

The embodiment of the present invention may also fine-tune (Finetuning) a model pre-trained on a general large-scale recognition task (ImageNet) and introduce the center loss value as an auxiliary optimization objective for face detection. This guides the optimization and training of the face detection model and improves the face detector's discrimination with respect to intra-class differences between faces.

In the iterative training process, the embodiment of the present invention uses the face detection model from the previous iteration to identify the training samples in the training sample set that are hard for the face detector to detect, and uses them to determine the batch data training samples for the next iteration. This improves the detector's ability to detect such hard samples. Whether a training sample is hard to detect can be determined by measuring its target loss value: the higher the target loss value, the farther the sample is from the optimization goal and the harder it is to detect.

Correspondingly, FIG. 7 shows a flowchart of a method, provided by the embodiment of the present invention, for obtaining the batch data training samples of the current iteration. Referring to FIG. 7, the method may include the following.

Step S300: fix the face detection model of the previous iteration and use it to obtain the center loss value, classification loss value, and face frame coordinate regression loss value corresponding to each training sample in the training sample set.

Step S310: determine the target loss value of each training sample in the training sample set based on that sample's center loss value, classification loss value, and face frame coordinate regression loss value.

For a given training sample, the embodiment of the present invention may take the weighted sum of its center loss value, classification loss value, and face frame coordinate regression loss value to obtain its target loss value; applying this to every training sample yields the target loss value of each.

For a given training sample, the target loss value may be expressed as: classification loss value + μ × center loss value + λ × face frame coordinate regression loss value.

Alternatively, for a given training sample, the embodiment of the present invention may simply sum its center loss value, classification loss value, and face frame coordinate regression loss value to obtain its target loss value.

Step S320: based on the target loss values of the training samples in the positive sample class of the training sample set, select a first number of positive-class training samples with the largest target loss values; based on the target loss values of the training samples in the negative sample class, select a second number of negative-class training samples with the largest target loss values. The ratio of the first number to the second number corresponds to a set ratio.

After obtaining the target loss value of each training sample in the training sample set, the embodiment of the present invention can divide the training samples into the positive and negative sample classes, which gives the target loss values of the positive-class samples and of the negative-class samples. The positive-class training samples may then be sorted by target loss value (in descending or in ascending order), and the negative-class training samples may likewise be sorted by target loss value.

Then, according to the set ratio of positive-class to negative-class training samples in the batch data training samples, the first number of positive-class training samples with the largest target loss values and the second number of negative-class training samples with the largest target loss values are selected from the training sample set, so that the ratio of the first number to the second number matches the set ratio of positive to negative samples required in the batch data training samples.

Considering the data balance that Center Loss requires between positive samples (faces) and negative samples (non-faces), the embodiment of the present invention may set the ratio to 1:1, that is, the first number equals the second number.

Step S330: form the batch data training samples of the current iteration from the training samples selected from the positive sample class and those selected from the negative sample class.
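Steps S320 and S330 can be sketched as follows. This is a minimal illustration, assuming each training sample is already paired with its target loss value from step S310; the tuple layout, the function name, and the sample values are hypothetical.

```python
def select_hard_batch(samples, batch_size, pos_to_neg=1.0):
    """Pick the highest-loss positives and negatives at a set ratio.

    samples: list of (sample_id, is_face, target_loss) tuples.
    pos_to_neg: set ratio of the first number to the second number.
    """
    n_pos = round(batch_size * pos_to_neg / (1.0 + pos_to_neg))  # first number
    n_neg = batch_size - n_pos                                   # second number
    # Sort each class by target loss, largest first.
    pos = sorted((s for s in samples if s[1]), key=lambda s: s[2], reverse=True)
    neg = sorted((s for s in samples if not s[1]), key=lambda s: s[2], reverse=True)
    return pos[:n_pos] + neg[:n_neg]

samples = [
    ("a", True, 0.9), ("b", True, 0.1), ("c", True, 0.5),
    ("d", False, 0.2), ("e", False, 0.8), ("f", False, 0.3),
]
batch = select_hard_batch(samples, batch_size=4)  # 1:1 ratio, 2 faces + 2 non-faces
batch_ids = [s[0] for s in batch]
```

With a 1:1 ratio and a batch size of 4, the two hardest faces and the two hardest non-faces form the next Minibatch.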

Thus, in the face detection training method provided by the embodiment of the present invention, after the batch data training samples of the previous iteration are fed to the face detection model for training, the face detector is updated and optimized based on the Center Loss and Softmax Loss of those samples, and the face regressor is updated and optimized based on their SmoothL1 Loss, so that the face detector is optimized toward the minimum weighted sum of Center Loss, Softmax Loss, and SmoothL1 Loss.

The previous iteration can determine the batch data training samples used in the next iteration. The face detection model from the previous iteration is used to determine the target loss value, combining Center Loss, Softmax Loss, and SmoothL1 Loss, of each training sample in the training sample set. The first number of positive-class training samples and the second number of negative-class training samples with the largest target loss values are then selected from the training sample set to build the Minibatch (that is, the batch data training samples) for the next iteration.

Training then proceeds to the next iteration, in which this Minibatch is fed to the face detection model. The iterations repeat until, in some iteration, the weighted sum of the Center Loss, Softmax Loss, and SmoothL1 Loss of the batch data training samples is minimized.

In the training sample set described above, the training samples that the face detector trained in the previous iteration finds hard to detect are used as the Minibatch for the next iteration. Center Loss can therefore be estimated better in each iteration, and features that discriminate intra-class differences in the training samples can be learned better under supervision.

It should be noted that, unlike conventional iterative training of a face detector with the stochastic gradient descent algorithm, the embodiment of the present invention does not simply perform gradient descent optimization on randomly drawn batch data training samples (Minibatch). Instead, it builds the Minibatch for the next iteration from the training samples in the training sample set that were hard to detect in the previous iteration.

As can be seen, the embodiment of the present invention provides a robust face detection training method. The method is implemented with a neural network. In each pass of the iterative training, the Center Loss (center loss value) corresponding to the batch data training samples is introduced as an auxiliary loss function for the two-class face/non-face task; together with the Softmax Loss (classification loss value) corresponding to the batch data training samples, it supervises the optimization training of the face detector and guides its learning. As a result, the face detector keeps faces and non-faces distinguishable between classes while reducing the intra-class differences between faces, which improves its ability to discriminate faces.

In addition, the online hard example mining algorithm (OHEM) is used to mine, based on the total loss value of each training sample, the positive-class and negative-class training samples that were hard to detect in the previous round of training, while keeping the ratio of positive to negative samples at 1:1. This strengthens the face detector's ability to classify hard-to-detect training samples and improves its overall performance.

Furthermore, the present invention adopts anchor frames better suited to face targets (covering multiple sizes and multiple aspect ratios) and a multi-scale training strategy, which improves discrimination for face targets of different resolutions and makes candidate frame generation suitable for different faces. A face detector trained with the face detection training method provided by the embodiment of the present invention can effectively improve accuracy and strengthen robustness. A performance comparison between the face detector of the embodiment of the present invention and face detectors trained by other methods is shown in Table 1 below.

As can be seen, the embodiment of the present invention can improve both the discrimination ability and the robustness of the face detector.

The face detection training apparatus provided by the embodiment of the present invention is described below. The apparatus described below can be regarded as the program modules that an electronic device needs in order to implement the face detection training method provided by the embodiment of the present invention, and its description and that of the method above may be read with reference to each other.

FIG. 8 is a structural block diagram of the face detection training apparatus provided by the embodiment of the present invention. Referring to FIG. 8, the apparatus may include:
a sample acquisition module 100 for obtaining the batch data training samples of the current iteration, the batch data training samples including a plurality of training samples of different sample classes;
a sample center loss value determination module 200 for determining the center loss value corresponding to each training sample based on the feature vector of that training sample and the center feature vector of the sample class to which it belongs;
a batch sample center loss value determination module 300 for determining the center loss value corresponding to the batch data training samples based on the center loss values of the individual training samples;
a detection target loss value determination module 400 for determining the target loss value for face detection based on at least the center loss value corresponding to the batch data training samples;
a parameter update module 500 for, when the target loss value for face detection has not reached the set training convergence condition, updating the network parameters of the face detection model based on the target loss value and proceeding to the next iteration; and
a detection output module 600 for outputting the face detector when the target loss value for face detection reaches the set training convergence condition.
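The control flow that modules 100 to 600 implement together can be sketched as follows. This is only a toy illustration: it assumes the convergence condition is a loss threshold, replaces the face detection model with a single parameter, and uses hypothetical function names throughout.

```python
def train(params, get_batch, loss_and_grad, learning_rate, tolerance, max_iters=10000):
    """Iterate until the target loss meets the (assumed) convergence threshold."""
    for iteration in range(max_iters):
        batch = get_batch(params)                   # role of module 100
        loss, grad = loss_and_grad(params, batch)   # role of modules 200 to 400
        if loss < tolerance:                        # role of module 600: converged, output
            return params, iteration
        # Role of module 500: update parameters, then go to the next iteration.
        params = [p - learning_rate * g for p, g in zip(params, grad)]
    return params, max_iters

def toy_loss_and_grad(params, batch):
    """Stand-in for the real target loss: (w - 3)^2 and its gradient."""
    w = params[0]
    return (w - 3.0) ** 2, [2.0 * (w - 3.0)]

final_params, iterations = train(
    [0.0], lambda params: None, toy_loss_and_grad,
    learning_rate=0.1, tolerance=1e-8,
)
```

In the embodiment, the batch would come from the hard-sample selection and the loss from the weighted sum of the three loss values; the toy loss here only exercises the loop.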

When determining the target loss value for face detection based on at least the center loss value corresponding to the batch data training samples, the detection target loss value determination module 400 may specifically:
determine the target loss value for face detection based on the center loss value, the classification loss value, and the face frame coordinate regression loss value corresponding to the batch data training samples.

When determining the target loss value for face detection based on the center loss value, the classification loss value, and the face frame coordinate regression loss value corresponding to the batch data training samples, the detection target loss value determination module 400 may specifically:
sum the product of the center loss value corresponding to the batch data training samples and a first set weight, the product of the face frame coordinate regression loss value corresponding to the batch data training samples and a second set weight, and the classification loss value corresponding to the batch data training samples, to obtain the target loss value for face detection.

When obtaining the batch data training samples of the current iteration, the sample acquisition module 100 may specifically:
determine, with the face detection model of the previous iteration, the target loss value corresponding to each training sample in the training sample set;
select, based on the target loss values of the training samples in the positive sample class of the training sample set, a first number of positive-class training samples with the largest target loss values, and select, based on the target loss values of the training samples in the negative sample class, a second number of negative-class training samples with the largest target loss values, where the ratio of the first number to the second number corresponds to the set ratio; and
form the batch data training samples of the current iteration from the training samples selected from the positive sample class and those selected from the negative sample class.

When determining, with the face detection model of the previous iteration, the target loss value corresponding to each training sample in the training sample set, the sample acquisition module 100 may specifically:
obtain, with the face detection model of the previous iteration, the center loss value, classification loss value, and face frame coordinate regression loss value corresponding to each training sample in the training sample set, where the classification loss value of a training sample is determined from its classification prediction probability and classification target probability, and its face frame coordinate regression loss value is determined from its predicted and target face frame coordinates; and
determine the target loss value of each training sample in the training sample set based on that sample's center loss value, classification loss value, and face frame coordinate regression loss value.

FIG. 9 shows another configuration of the face detection training apparatus provided by the embodiment of the present invention. Referring to FIGS. 8 and 9, the apparatus may further include:
a batch sample classification loss value determination module 700 for determining the classification loss value of each training sample in the batch data training samples based on that sample's classification prediction probability and classification target probability, and for determining the classification loss value corresponding to the batch data training samples based on the classification loss values of the individual training samples.

When determining the center loss value corresponding to each training sample based on the feature vector of that training sample and the center feature vector of the sample class to which it belongs, the sample center loss value determination module 200 may specifically:
determine the feature vector of each training sample in the batch data training samples and the center feature vector of each sample class in the batch data training samples; and
for each training sample in the batch data training samples, determine the distance between the feature vector of that training sample and the center feature vector of the sample class to which it belongs in the batch data training samples, to obtain the center loss value corresponding to that training sample.
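The per-sample center loss described above can be sketched as follows. This is a minimal illustration under two assumptions not stated in this passage: the distance is taken as half the squared Euclidean distance, as in common center loss formulations, and the batch value (module 300) is taken as the average of the per-sample values. The function names, class labels, and feature values are hypothetical.

```python
def center_loss(feature, center):
    """Half the squared Euclidean distance between a feature and its class center."""
    return 0.5 * sum((x - c) ** 2 for x, c in zip(feature, center))

def batch_center_loss(features, labels, centers):
    """Average of the per-sample center losses over the batch (assumed reduction)."""
    per_sample = [center_loss(f, centers[y]) for f, y in zip(features, labels)]
    return sum(per_sample) / len(per_sample)

# Hypothetical 2-D features for one face and one non-face sample.
features = [[1.0, 2.0], [3.0, 3.0]]
labels = ["face", "non_face"]
centers = {"face": [0.0, 0.0], "non_face": [3.0, 3.0]}
l_center = batch_center_loss(features, labels, centers)
```

A sample that sits exactly on its class center contributes zero, so a smaller value indicates tighter clustering within each class.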

Further, the sample center loss value determination module 200 determines the center feature vector of each sample class in the batch data training sample, which may specifically include:
for any one sample class, determining the training samples in the batch data training sample that belong to that sample class;
determining, based on the feature vectors of the training samples in the batch data training sample that belong to that sample class, the average of those feature vectors, to obtain an update variable for the center feature vector of that sample class in the batch data training sample; and
obtaining the center feature vector of that sample class in the batch data training sample based on the update variable and a set learning rate.
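One way to read the three steps above is sketched below. The passage does not state how the update variable and the learning rate are combined, so the interpolation rule `new = old + lr * (mean - old)` is an assumption chosen for illustration; `class_mean` and `update_center` are hypothetical names.

```python
from typing import Hashable, List, Optional, Sequence


def class_mean(
    features: Sequence[Sequence[float]],
    labels: Sequence[Hashable],
    cls: Hashable,
) -> Optional[List[float]]:
    """Average feature vector of the batch samples that belong to `cls`.

    This average plays the role of the update variable. Returns None
    when the batch contains no sample of that class.
    """
    members = [x for x, y in zip(features, labels) if y == cls]
    if not members:
        return None
    dim = len(members[0])
    return [sum(m[d] for m in members) / len(members) for d in range(dim)]


def update_center(
    old_center: Sequence[float],
    features: Sequence[Sequence[float]],
    labels: Sequence[Hashable],
    cls: Hashable,
    lr: float,
) -> List[float]:
    """Move the class center toward the batch mean by a fraction `lr`.

    Assumed rule: new = old + lr * (mean - old). The center is left
    unchanged when the class is absent from the batch.
    """
    mean = class_mean(features, labels, cls)
    if mean is None:
        return list(old_center)
    return [c + lr * (m - c) for c, m in zip(old_center, mean)]
```

With `lr = 1.0` the center simply becomes the batch mean of the class; smaller values damp the effect of any single batch.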

FIG. 10 shows another configuration of the face detection training apparatus provided by an embodiment of the present invention. Referring to FIGS. 9 and 10, the face detection training apparatus may further include:
a batch sample face frame coordinate regression loss value determination module 800 configured to determine, based on a candidate frame regressor, the predicted face frame coordinates corresponding to each training sample in the batch data training sample; to determine the face frame coordinate regression loss value corresponding to each training sample based on the predicted face frame coordinates and the target face frame coordinates corresponding to that training sample; and to determine the face frame coordinate regression loss value corresponding to the batch data training sample based on the face frame coordinate regression loss values corresponding to the individual training samples.
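The per-sample and per-batch regression losses handled by module 800 might look like the sketch below. The passage does not name a loss function or an aggregation rule, so the smooth L1 loss and the batch mean are assumptions; the function names and the `(x1, y1, x2, y2)` coordinate layout are hypothetical.

```python
from typing import Sequence


def smooth_l1(diff: float) -> float:
    """Smooth L1 penalty: quadratic near zero, linear for large errors."""
    a = abs(diff)
    return 0.5 * a * a if a < 1.0 else a - 0.5


def frame_regression_loss(
    predicted: Sequence[float], target: Sequence[float]
) -> float:
    """Regression loss of one sample: summed penalty over its coordinates."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target frames must have the same length")
    return sum(smooth_l1(p - t) for p, t in zip(predicted, target))


def batch_frame_regression_loss(
    predicted: Sequence[Sequence[float]], targets: Sequence[Sequence[float]]
) -> float:
    """Regression loss of the batch: mean of the per-sample losses (assumed)."""
    losses = [frame_regression_loss(p, t) for p, t in zip(predicted, targets)]
    return sum(losses) / len(losses)
```

In a typical detector only positive (face) samples would contribute to this term; the sketch leaves that filtering to the caller.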

Further, the parameter update module 500 updates the network parameters of the face detection model based on the target loss value for face detection, which may specifically include:
updating the network parameters in the face detection model by backpropagation based on the target loss value for face detection.

Further, the parameter update module 500 updates the network parameters in the face detection model by backpropagation based on the target loss value for face detection, which may specifically include:
determining a parameter update value for face detection based on the target loss value for face detection and the network parameters in the face detection model of the previous iteration; and
updating the network parameters in the face detection model of the previous iteration based on the parameter update value for face detection.
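The two steps above (derive an update value, then apply it to the previous iteration's parameters) can be illustrated with plain gradient descent. This is a sketch under assumptions: the passage does not fix the optimizer, the gradients are taken as already computed by backpropagation, and `parameter_update_values` and `apply_update` are hypothetical names.

```python
from typing import List, Sequence


def parameter_update_values(
    gradients: Sequence[float], learning_rate: float
) -> List[float]:
    """Update value for each parameter.

    Assumption: plain gradient descent, where `gradients` holds the
    derivative of the target loss with respect to each parameter of the
    previous iteration's model, as produced by backpropagation.
    """
    return [-learning_rate * g for g in gradients]


def apply_update(
    previous_params: Sequence[float], updates: Sequence[float]
) -> List[float]:
    """Add the update values to the previous iteration's parameters."""
    if len(previous_params) != len(updates):
        raise ValueError("parameters and updates must have the same length")
    return [p + u for p, u in zip(previous_params, updates)]
```

Optimizers with momentum or weight decay would change only how the update value is derived; the second step stays the same.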

Further, the face detection training apparatus provided by an embodiment of the present invention may also be configured to:
predefine a plurality of anchor frames covering different scales and aspect ratios, determine sub-frames in a training sample by means of the predefined anchor frames, and predict candidate frames from the sub-frames.
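Predefining anchor frames over several scales and aspect ratios can be sketched as follows. The concrete scale and ratio values, the definition of aspect ratio as width divided by height, and the name `make_anchors` are assumptions for illustration; the step that predicts candidate frames from the sub-frames is not shown.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def make_anchors(
    cx: float,
    cy: float,
    scales: Sequence[float],
    aspect_ratios: Sequence[float],
) -> List[Box]:
    """Anchor frames centered at (cx, cy), one per (scale, ratio) pair.

    Each anchor has area scale**2 and width / height equal to the
    aspect ratio (assumed definition).
    """
    anchors: List[Box] = []
    for s in scales:
        for r in aspect_ratios:
            w = s * math.sqrt(r)
            h = s / math.sqrt(r)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors
```

Calling `make_anchors(0.0, 0.0, [16.0, 32.0], [0.5, 1.0, 2.0])` yields six anchors. Repeating the call at every position of a feature map gives the full set of sub-frames to score.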

An embodiment of the present invention further provides an electronic device. As shown in FIG. 3, the hardware configuration of the electronic device includes at least one memory and at least one processor.
The memory stores a program, and the processor calls the program, whereby the program causes the processor to:
obtain a batch data training sample of the current iteration, the batch data training sample including a plurality of training samples of different sample classes;
determine a center loss value corresponding to each training sample based on the feature vector of each training sample and the center feature vector of the sample class to which each training sample belongs;
determine a center loss value corresponding to the batch data training sample based on the center loss value corresponding to each training sample;
determine a target loss value for face detection based on at least the center loss value corresponding to the batch data training sample;
when the target loss value for face detection has not reached a set training convergence condition, update the network parameters of the face detection model based on the target loss value for face detection and proceed to the next iteration; and
when the target loss value for face detection reaches the set training convergence condition, output the face detection training result.
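The iterate-until-converged control flow executed by the program can be sketched as below. Everything concrete here is an assumption made for illustration: the convergence condition is taken to be "target loss at or below a tolerance", the update is plain gradient descent, and the callables `next_batch`, `target_loss`, and `gradient` are hypothetical stand-ins for the real model. `target_loss_value` shows one way to combine the batch losses, with hypothetical weights on the center and regression terms.

```python
from typing import Callable, List, Sequence, Tuple


def target_loss_value(
    center: float, classification: float, regression: float,
    w_center: float, w_regression: float,
) -> float:
    """Weighted sum of the three batch losses (weights are assumed values)."""
    return classification + w_center * center + w_regression * regression


def train(
    params: Sequence[float],
    next_batch: Callable[[int], object],
    target_loss: Callable[[Sequence[float], object], float],
    gradient: Callable[[Sequence[float], object], Sequence[float]],
    lr: float = 0.1,
    tol: float = 1e-6,
    max_iters: int = 1000,
) -> Tuple[List[float], float, int]:
    """Run iterations until the target loss meets the convergence condition.

    Returns the final parameters, the last loss, and the iteration count.
    """
    current = list(params)
    loss = float("inf")
    for it in range(max_iters):
        batch = next_batch(it)              # batch of the current iteration
        loss = target_loss(current, batch)  # target loss value
        if loss <= tol:                     # assumed convergence condition
            return current, loss, it        # output the training result
        grads = gradient(current, batch)
        current = [p - lr * g for p, g in zip(current, grads)]
    return current, loss, max_iters
```

A toy run with a single parameter and a quadratic loss, for example `train([3.0], lambda i: None, lambda p, b: p[0] ** 2, lambda p, b: [2 * p[0]], lr=0.25)`, halves the parameter on every iteration and stops once the loss falls to the tolerance.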

The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Because the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for the relevant details, refer to the description of the method.

Those skilled in the art will further appreciate that the exemplary units and algorithm steps described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the exemplary components and steps have been described above generally in terms of their functions. Whether these functions are performed in hardware or in software depends on the particular application and the design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each particular application, but such implementations should not be considered to go beyond the scope of the present invention.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

The above description of the disclosed embodiments enables those skilled in the art to make or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. Accordingly, the present invention is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

100 Sample acquisition module
200 Sample center loss value determination module
300 Batch sample center loss value determination module
400 Detection target loss value determination module
500 Parameter update module
600 Detection output module
700 Batch sample classification loss value determination module
800 Batch sample face frame coordinate regression loss value determination module

Claims

A face detection training method, comprising:
obtaining a batch data training sample of the current iteration, wherein the batch data training sample includes a plurality of training samples of different sample classes;
determining a center loss value corresponding to each training sample based on a feature vector of each training sample and a center feature vector of the sample class to which each training sample belongs;
determining a center loss value corresponding to the batch data training sample based on the center loss value corresponding to each training sample;
determining a target loss value for face detection based on the center loss value corresponding to the batch data training sample; and
outputting a face detection training result when the target loss value for face detection reaches a set training convergence condition.

The face detection training method according to claim 1, further comprising:
when the target loss value for face detection has not reached the set training convergence condition, updating network parameters of a face detection model based on the target loss value for face detection, and proceeding to the next iteration.

The face detection training method according to claim 1, wherein determining the target loss value for face detection based on the center loss value corresponding to the batch data training sample comprises:
determining the target loss value for face detection based on the center loss value corresponding to the batch data training sample, a classification loss value corresponding to the batch data training sample, and a face frame coordinate regression loss value corresponding to the batch data training sample.

The face detection training method according to claim 3, wherein determining the target loss value for face detection based on the center loss value corresponding to the batch data training sample, the classification loss value corresponding to the batch data training sample, and the face frame coordinate regression loss value corresponding to the batch data training sample comprises:
summing the product of the center loss value corresponding to the batch data training sample and a first set weight, the product of the face frame coordinate regression loss value corresponding to the batch data training sample and a second set weight, and the classification loss value corresponding to the batch data training sample, to obtain the target loss value for face detection.

The face detection training method according to any one of claims 1 to 4, wherein the sample classes of the plurality of training samples include a positive class and a negative class, and obtaining the batch data training sample of the current iteration comprises:
determining, with the model of the previous iteration, a target loss value corresponding to each training sample in a training sample set;
selecting, based on the target loss values of the positive-class training samples in the training sample set, a first number of positive-class training samples having the largest target loss values, and selecting, based on the target loss values of the negative-class training samples in the training sample set, a second number of negative-class training samples having the largest target loss values, wherein the ratio of the first number to the second number corresponds to a set ratio; and
constructing the batch data training sample of the current iteration from the training samples selected from the positive class and the training samples selected from the negative class.

The face detection training method according to claim 5, wherein determining, with the model of the previous iteration, the target loss value corresponding to each training sample in the training sample set comprises:
obtaining, with the model of the previous iteration, a center loss value, a classification loss value, and a face frame coordinate regression loss value corresponding to each training sample in the training sample set, wherein the classification loss value corresponding to a training sample is determined based on the classification prediction probability and the classification target probability corresponding to that training sample, and the face frame coordinate regression loss value corresponding to a training sample is determined based on the predicted face frame coordinates and the target face frame coordinates corresponding to that training sample; and
determining the target loss value of each training sample in the training sample set based on the center loss value, the classification loss value, and the face frame coordinate regression loss value corresponding to that training sample.

The face detection training method according to claim 3, wherein the process of determining the classification loss value corresponding to the batch data training sample comprises:
determining a classification loss value corresponding to each training sample in the batch data training sample based on the classification prediction probability and the classification target probability corresponding to that training sample; and
determining the classification loss value corresponding to the batch data training sample based on the classification loss value corresponding to each training sample in the batch data training sample.

The face detection training method according to claim 3, wherein the process of determining the face frame coordinate regression loss value corresponding to the batch data training sample comprises:
determining predicted face frame coordinates corresponding to each training sample in the batch data training sample;
determining a face frame coordinate regression loss value corresponding to each training sample based on the predicted face frame coordinates and the target face frame coordinates corresponding to that training sample; and
determining the face frame coordinate regression loss value corresponding to the batch data training sample based on the face frame coordinate regression loss value corresponding to each training sample.

The face detection training method according to claim 1, wherein determining the center loss value corresponding to each training sample based on the feature vector of each training sample and the center feature vector of the sample class to which each training sample belongs comprises:
determining the feature vector of each training sample in the batch data training sample and the center feature vector of each sample class in the batch data training sample; and
for any one training sample in the batch data training sample, determining the distance between the feature vector of that training sample and the center feature vector, in the batch data training sample, of the sample class to which that training sample belongs, to obtain the center loss value corresponding to that training sample.

The face detection training method according to claim 9, wherein determining the center feature vector of each sample class in the batch data training sample comprises:
for any one sample class, determining the training samples in the batch data training sample that belong to that sample class;
determining, based on the feature vectors of the training samples in the batch data training sample that belong to that sample class, the average of those feature vectors, to obtain an update variable for the center feature vector of that sample class in the batch data training sample; and
obtaining the center feature vector of that sample class in the batch data training sample based on the update variable and a set learning rate.

The face detection training method according to any one of claims 2 to 4, wherein updating the network parameters of the face detection model based on the target loss value for face detection comprises:
updating the network parameters in the face detection model by backpropagation based on the target loss value for face detection.

The face detection training method according to claim 11, wherein updating the network parameters in the face detection model by backpropagation based on the target loss value for face detection comprises:
determining a parameter update value for face detection based on the target loss value for face detection and the network parameters in the face detection model of the previous iteration; and
updating the network parameters in the face detection model of the previous iteration based on the parameter update value for face detection.

The face detection training method according to claim 1, further comprising:
predefining a plurality of anchor frames covering different scales and aspect ratios; and
determining sub-frames in a training sample by means of the predefined anchor frames, and predicting candidate frames from the sub-frames.

A face detection training apparatus, comprising:
a sample acquisition module for obtaining a batch data training sample of the current iteration, the batch data training sample including a plurality of training samples of different sample classes;
a sample center loss value determination module for determining a center loss value corresponding to each training sample based on a feature vector of each training sample and a center feature vector of the sample class to which each training sample belongs;
a batch sample center loss value determination module for determining a center loss value corresponding to the batch data training sample based on the center loss value corresponding to each training sample;
a detection target loss value determination module for determining a target loss value for face detection based on the center loss value corresponding to the batch data training sample; and
a detection output module for outputting a face detection training result when the target loss value for face detection reaches a set training convergence condition.

The face detection training apparatus according to claim 14, further comprising:
a parameter update module for, when the target loss value for face detection has not reached the set training convergence condition, updating network parameters of a face detection model based on the target loss value for face detection and proceeding to the next iteration.

The face detection training apparatus according to claim 14, wherein the sample acquisition module is specifically configured to:
determine, with the face detection model of the previous iteration, a target loss value corresponding to each training sample in a training sample set;
select, based on the target loss values of the positive-class training samples in the training sample set, a first number of positive-class training samples having the largest target loss values, and select, based on the target loss values of the negative-class training samples in the training sample set, a second number of negative-class training samples having the largest target loss values, wherein the ratio of the first number to the second number corresponds to a set ratio; and
construct the batch data training sample of the current iteration from the training samples selected from the positive class and the training samples selected from the negative class.

An electronic device comprising a memory and a processor, wherein
the memory stores a program, and the processor calls the program, whereby the program causes the processor to:
obtain a batch data training sample of the current iteration, the batch data training sample including a plurality of training samples of different sample classes;
determine a center loss value corresponding to each training sample based on a feature vector of each training sample and a center feature vector of the sample class to which each training sample belongs;
determine a center loss value corresponding to the batch data training sample based on the center loss value corresponding to each training sample;
determine a target loss value for face detection based on the center loss value corresponding to the batch data training sample; and
output a face detection training result when the target loss value for face detection reaches a set training convergence condition.

A computer-readable storage medium containing instructions which, when executed on a computer, cause the computer to perform the method according to any one of claims 1 to 13.