JP7376731B2 - Image recognition model generation method, device, computer equipment and storage medium

Info

Publication number: JP7376731B2
Application number: JP2022564577A
Authority: JP
Inventors: クイ、ジェクァン; リュウ、シュ; ティアン、チュオタオ
Original assignee: Shenzhen Smartmore Technology Co Ltd
Current assignee: Shenzhen Smartmore Technology Co Ltd
Priority date: 2020-08-25
Filing date: 2021-07-16
Publication date: 2023-11-08
Anticipated expiration: 2041-07-16
Also published as: WO2022042123A1; CN111950656A; JP2023523029A; CN111950656B

Description

(Cross-reference to related applications)
This application claims priority to Chinese patent application No. 2020108629110, filed on August 25, 2020 and titled "Image recognition model generation method, device, computer equipment and storage medium", the entire contents of which are incorporated herein by reference.

The present application relates to an image recognition model generation method, device, computer equipment, and storage medium.

Image recognition technology has made tremendous progress with deep learning. These advances, however, depend on large-scale datasets such as ImageNet and COCO. Such datasets are generally class-balanced, whereas the data obtained in practice usually follows a long-tail distribution: a small number of classes contain many images, while a large number of classes contain few.

According to embodiments, a first aspect of the present application provides an image recognition model generation method, including:
obtaining a sample image set including a plurality of sample image subsets, each containing the same number of image classes, with the number of images decreasing sequentially from subset to subset;
training an image recognition model to be trained based on the sample image set to obtain a loss value of the image recognition model to be trained, wherein the image recognition model to be trained includes a plurality of branch neural networks, each for recognizing its corresponding images; the loss value includes a target classification loss value and a classification loss value corresponding to each branch neural network; the target classification loss value is the loss value of the image recognition model to be trained on the sample image set; and the classification loss value corresponding to each branch neural network is the loss value of that branch neural network on the sample image subsets corresponding to it; and
adjusting model parameters of the image recognition model to be trained based on the loss value until the loss value falls below a preset threshold, and taking the image recognition model to be trained as the trained image recognition model.

According to embodiments, a second aspect of the present application provides an image recognition model generation device, including:
an acquisition module for obtaining a sample image set including a plurality of sample image subsets, each containing the same number of image classes, with the number of images decreasing sequentially from subset to subset;
a training module for training an image recognition model to be trained based on the sample image set to obtain a loss value of the image recognition model to be trained, wherein the image recognition model to be trained includes a plurality of branch neural networks, each for recognizing its corresponding images; the loss value includes a target classification loss value and a classification loss value corresponding to each branch neural network; the target classification loss value is the loss value of the image recognition model to be trained on the sample image set; and the classification loss value corresponding to each branch neural network is the loss value of that branch neural network on the sample image subsets corresponding to it; and
an adjustment module for adjusting model parameters of the image recognition model to be trained based on the loss value until the loss value falls below a preset threshold, and taking the image recognition model to be trained as the trained image recognition model.

According to embodiments, a third aspect of the present application provides computer equipment including a memory in which a computer program is stored and a processor that, when executing the computer program, implements the steps of:
obtaining a sample image set including a plurality of sample image subsets, each containing the same number of image classes, with the number of images decreasing sequentially from subset to subset;
training an image recognition model to be trained based on the sample image set to obtain a loss value of the image recognition model to be trained, wherein the image recognition model to be trained includes a plurality of branch neural networks, each for recognizing its corresponding images; the loss value includes a target classification loss value and a classification loss value corresponding to each branch neural network; the target classification loss value is the loss value of the image recognition model to be trained on the sample image set; and the classification loss value corresponding to each branch neural network is the loss value of that branch neural network on the sample image subsets corresponding to it; and
adjusting model parameters of the image recognition model to be trained based on the loss value until the loss value falls below a preset threshold, and taking the image recognition model to be trained as the trained image recognition model.

According to embodiments, a fourth aspect of the present application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of:
obtaining a sample image set including a plurality of sample image subsets, each containing the same number of image classes, with the number of images decreasing sequentially from subset to subset;
training an image recognition model to be trained based on the sample image set to obtain a loss value of the image recognition model to be trained, wherein the image recognition model to be trained includes a plurality of branch neural networks, each for recognizing its corresponding images; the loss value includes a target classification loss value and a classification loss value corresponding to each branch neural network; the target classification loss value is the loss value of the image recognition model to be trained on the sample image set; and the classification loss value corresponding to each branch neural network is the loss value of that branch neural network on the sample image subsets corresponding to it; and
adjusting model parameters of the image recognition model to be trained based on the loss value until the loss value falls below a preset threshold, and taking the image recognition model to be trained as the trained image recognition model.

The details of one or more embodiments of the present application are set forth in the drawings and description below. Other features and advantages of the present application will be apparent from the specification, drawings, and claims.

To explain the technical means of the embodiments of the present application or of the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present application, and a person skilled in the art can derive other drawings from them without creative effort.

FIG. 1 is a diagram showing a usage environment of the image recognition model generation method in one embodiment.
FIG. 2 is a flowchart of the image recognition model generation method in one embodiment.
FIG. 3 is a diagram showing the structure of the branch neural networks in one embodiment.
FIG. 4 is a flowchart of the step of training the image recognition model to be trained to obtain a loss value in one embodiment.
FIG. 5 is a flowchart of the step of determining the loss value of the image recognition model to be trained in one embodiment.
FIG. 6 is a flowchart of a method for obtaining the sample image subsets and the sample image set in one embodiment.
FIG. 7 is a block diagram of the configuration of the image recognition model generation device in one embodiment.
FIG. 8 is a diagram of the internal structure of the computer equipment in one embodiment.

When a neural network is trained on data that follows this long-tail distribution, it typically recognizes the few classes that contain many images well, but recognizes the many classes that contain few images with low accuracy. Consequently, if this long-tail characteristic is ignored when an image recognition model is generated, the performance of the model degrades significantly in actual use. The recognition performance of image recognition models obtained by conventional generation methods therefore remains poor.

To make the objectives, technical means, and advantages of the present application clearer, the present application is described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present application and are not intended to limit it.

The image recognition model generation method according to the present application can be used in the usage environment shown in FIG. 1, in which a terminal 11 communicates with a server 12 via a network. The server 12 obtains, from the terminal 11 via the network, a sample image set including a plurality of sample image subsets, each containing the same number of image classes, with the number of images decreasing sequentially from subset to subset. Based on the sample image set, the server 12 trains an image recognition model to be trained and obtains its loss value. The image recognition model to be trained includes a plurality of branch neural networks, each for recognizing its corresponding images. The loss value includes a target classification loss value and a classification loss value corresponding to each branch neural network: the target classification loss value is the loss value of the image recognition model to be trained on the sample image set, and the classification loss value corresponding to each branch neural network is the loss value of that branch neural network on the sample image subsets corresponding to it. Based on the loss value, the server 12 adjusts the model parameters of the image recognition model to be trained until the loss value falls below a preset threshold, and takes the image recognition model to be trained as the trained image recognition model. The terminal 11 can also send an image to be recognized to the server 12 and obtain the recognition result from the server 12.

The terminal 11 may be, but is not limited to, any of various personal computers, laptops, smartphones, tablets, and wearable devices. The server 12 may be a standalone server or a server cluster made up of multiple servers.

In one embodiment, as shown in FIG. 2, an image recognition model generation method is provided. The method is described using its application to the server 12 in FIG. 1 as an example, and includes the following steps 21 to 23.

Step 21: Obtain a sample image set including a plurality of sample image subsets, each containing the same number of image classes, with the number of images decreasing sequentially from subset to subset.

A sample image set is a dataset that contains all the sample images and consists of a plurality of sample image subsets. Each sample image subset contains sample images of one or more image classes, and every subset contains the same number of image classes. The total number of images differs from subset to subset and decreases sequentially.

For example, suppose the sample images comprise 100 images of image class A, 80 of class B, 60 of class C, 40 of class D, 20 of class E, and 10 of class F. Image classes A and B can then form a sample image subset of 180 sample images, classes C and D a subset of 100 sample images, and classes E and F a subset of 30 sample images. The three sample image subsets thus contain the same number of image classes, while their image counts decrease sequentially.
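The grouping in the example above can be reproduced with a short sketch. It assumes that subsets are formed by sorting classes by image count and cutting the sorted list into equal-sized groups; the function name `split_into_subsets` is hypothetical and is not taken from the patent.

```python
from typing import Dict, List, Tuple


def split_into_subsets(
    class_counts: Dict[str, int], num_subsets: int
) -> List[Tuple[List[str], int]]:
    """Sort classes by image count (descending) and cut them into equal-sized groups.

    Assumption: the number of classes is divisible by num_subsets, so every
    subset holds the same number of classes.
    """
    if len(class_counts) % num_subsets != 0:
        raise ValueError("number of classes must be divisible by num_subsets")
    ordered = sorted(class_counts, key=class_counts.get, reverse=True)
    per_subset = len(ordered) // num_subsets
    subsets = []
    for i in range(num_subsets):
        classes = ordered[i * per_subset:(i + 1) * per_subset]
        # Store the class names together with the subset's total image count.
        subsets.append((classes, sum(class_counts[c] for c in classes)))
    return subsets


# Class counts from the example in the text.
counts = {"A": 100, "B": 80, "C": 60, "D": 40, "E": 20, "F": 10}
subsets = split_into_subsets(counts, 3)
for classes, total in subsets:
    print(classes, total)
# ['A', 'B'] 180
# ['C', 'D'] 100
# ['E', 'F'] 30
```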

Specifically, the server may obtain directly from the terminal a sample image set that includes a plurality of sample image subsets with sequentially decreasing numbers of images. Alternatively, it may obtain a large number of sample images from the terminal and classify them by their image classes to obtain such a sample image set. The sample image set may consist of sample images that follow a long-tail distribution (i.e., a small number of image classes have many images and a large number of image classes have few), or of sample images that follow a normal distribution; the class distribution of the sample images in the sample image set is not limited here.

In this step, obtaining a sample image set that includes a plurality of sample image subsets with sequentially decreasing numbers of images serves as preprocessing of the sample images: the sample images are ordered by image class and placed in different sample image subsets. This facilitates feature learning by the subsequent branch neural networks, allows image classes with few images to be trained sufficiently, avoids the neglect of long-tail data seen in conventional neural network training, and improves the quality of the generated image recognition model.

Step 22: Based on the sample image set, train the image recognition model to be trained to obtain its loss value. The image recognition model to be trained includes a plurality of branch neural networks, each for recognizing its corresponding images. The loss value includes a target classification loss value and a classification loss value corresponding to each branch neural network: the target classification loss value is the loss value of the image recognition model to be trained on the sample image set, and the classification loss value corresponding to each branch neural network is the loss value of that branch neural network on the sample image subsets corresponding to it.

Specifically, the branch neural networks can be constructed with 1×1 convolutions, so only a very small number of additional parameters is needed to build them. Because a plurality of branch neural networks are built into the image recognition model to be trained, the parameters of the model are divided into two parts: shared parameters, which extract features common to the sample images, and individual parameters, which build on the shared parameters to extract features of the sample images in the sample image subsets corresponding to each branch neural network. The individual parameters are the parameters of the respective branch neural network.
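As a rough illustration of why a 1×1 convolution adds few parameters, the sketch below counts the weights and biases of a standard 2D convolution layer. The channel width of 256 is a hypothetical value chosen for the arithmetic; the patent does not state the channel sizes of its branches.

```python
def conv_params(c_in: int, c_out: int, kernel: int, bias: bool = True) -> int:
    """Parameter count of a standard 2D convolution: weights plus optional biases."""
    return c_in * c_out * kernel * kernel + (c_out if bias else 0)


channels = 256  # hypothetical channel width, not taken from the patent
params_1x1 = conv_params(channels, channels, 1)
params_3x3 = conv_params(channels, channels, 3)
print(params_1x1)  # 65792
print(params_3x3)  # 590080
```

Under this assumption a 1×1 branch layer costs roughly one ninth of the parameters of a 3×3 layer with the same channel counts.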

After the branch neural networks are constructed, the correspondence between branch neural networks and sample image subsets can be determined from the number of branch neural networks and the sample image subsets. Typically there are three branch neural networks and three sample image subsets: the first branch neural network corresponds to all three sample image subsets, the second to the second and third sample image subsets, and the third to the third sample image subset only (the one with the fewest images).

For example, a sample image set contains three sample image subsets: head classes (head data, abbreviated h), medium classes (medium data, abbreviated m), and tail classes (tail data, abbreviated t). The head classes are the first third of the image classes, those with the most images; the medium classes are the middle third by image count; and the tail classes are the remaining third, those with the fewest images. Three branch neural networks N_{h+m+t}, N_{m+t}, and N_t are constructed with 1×1 convolutions, as shown in FIG. 3. Branch N_{h+m+t} corresponds to all sample image subsets and classifies the image classes in all of them. Branch N_{m+t} corresponds to two sample image subsets and classifies the image classes in the medium classes and tail classes subsets, which have relatively few images. Branch N_t corresponds to one sample image subset and classifies the image classes in the tail classes subset, which has the fewest images.
In this way, each of the three branch neural networks N_{h+m+t}, N_{m+t}, and N_t can guide the learning of the image classes in its corresponding sample image subsets through its own individual parameters. The tail classes, which have few images, correspond to three branch neural networks, whereas the head classes, which have many images, correspond to only one. Long-tail data is thereby used more heavily, and image classes with different numbers of images are balanced during training.
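The branch-to-subset correspondence described above can be written down as a small lookup table, which makes it easy to check how many branches see each subset. The dictionary layout and the helper name `branches_covering` are illustrative assumptions, not part of the patent.

```python
from typing import Dict, List, Tuple

# Branch name -> sample image subsets (h = head, m = medium, t = tail) it covers.
BRANCH_SUBSETS: Dict[str, Tuple[str, ...]] = {
    "N_{h+m+t}": ("h", "m", "t"),
    "N_{m+t}": ("m", "t"),
    "N_t": ("t",),
}


def branches_covering(subset: str) -> List[str]:
    """Return the names of all branches whose subsets include the given one."""
    return [name for name, subs in BRANCH_SUBSETS.items() if subset in subs]


coverage = {s: len(branches_covering(s)) for s in ("h", "m", "t")}
print(coverage)  # {'h': 1, 'm': 2, 't': 3}
```

The tail subset is covered by three branches and the head subset by one, which is the rebalancing effect the paragraph describes.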

The loss value of the image recognition model to be trained includes classification loss values and a target classification loss value. A classification loss value is the loss value of a branch neural network on the sample image subsets corresponding to it; the target classification loss value is the loss value of the image recognition model to be trained on the sample image set, and is adjustable. From the plurality of classification loss values and the target classification loss value, the loss value used to train the image recognition model is obtained, and the degree of training of the model as a whole can be judged.

A classification loss value is the loss value of a branch neural network on its corresponding sample image subsets: for example, the loss value on the head classes, medium classes, and tail classes subsets for branch N_{h+m+t}, or the loss value on the tail classes subset for branch N_t. The target classification loss value is the loss value obtained by comparing the image classes output by the image recognition model as a whole against the entire sample image set; that is, the loss value, on the sample image set, of the image classes obtained by fusing the image classes that the branch neural networks output when recognizing the sample image set. The two kinds of loss value differ in what is considered when the loss is computed. A classification loss value is obtained by comparing the image classes output by one branch neural network with the actual image classes of the sample images in its corresponding sample image subsets. The target classification loss value is obtained by comparing the recognized image classes output by the whole image recognition model (i.e., the fusion of the image classes output by the branch neural networks) with the actual image classes of the sample images in the sample image set.
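The two kinds of loss can be sketched as follows. This is a minimal illustration under several assumptions that the text does not confirm: cross-entropy is used as the loss function, the branch outputs are fused by summing their logits, each branch loss is averaged over the samples whose label belongs to that branch's subsets, and the final value is an unweighted sum. The names `cross_entropy` and `total_loss` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for one sample, computed in a numerically stable way."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[label]


def total_loss(
    branch_logits: Dict[str, List[List[float]]],
    labels: List[int],
    branch_classes: Dict[str, Set[int]],
) -> float:
    """branch_logits[name][i] holds the logits of branch `name` for sample i."""
    # Target classification loss: fuse the branches by summing logits (assumption).
    target = 0.0
    for i, y in enumerate(labels):
        per_branch = [branch_logits[name][i] for name in branch_logits]
        fused = [sum(vals) for vals in zip(*per_branch)]
        target += cross_entropy(fused, y)
    target /= len(labels)

    # Per-branch classification loss: only samples from the branch's own subsets.
    branch_total = 0.0
    for name, classes in branch_classes.items():
        idx = [i for i, y in enumerate(labels) if y in classes]
        if idx:
            losses = [cross_entropy(branch_logits[name][i], labels[i]) for i in idx]
            branch_total += sum(losses) / len(idx)
    return target + branch_total


# Three classes: 0 = head, 1 = medium, 2 = tail; one sample per class.
labels = [0, 1, 2]
zeros = [[0.0, 0.0, 0.0] for _ in labels]
branch_logits = {"N_{h+m+t}": zeros, "N_{m+t}": zeros, "N_t": zeros}
branch_classes = {"N_{h+m+t}": {0, 1, 2}, "N_{m+t}": {1, 2}, "N_t": {2}}
loss = total_loss(branch_logits, labels, branch_classes)
print(loss)
```

With all-zero logits every cross-entropy term equals ln 3, so the result is four times ln 3: one term for the fused output and one for each of the three branches.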

In this step, the branch neural networks recognize their corresponding images to yield the target classification loss value and the classification loss value corresponding to each branch neural network, and the image recognition model is trained to obtain its loss value. As a result, image classes with few images are also trained sufficiently, the neglect of long-tail data seen in conventional neural network training is avoided, and the quality of the generated image recognition model is improved.

Step 23: Based on the loss value, adjust the model parameters of the image recognition model to be trained until the loss value falls below a preset threshold, and take the image recognition model to be trained as the trained image recognition model.

Specifically, based on the computed loss value, the server back-adjusts the parameters of the image recognition model to be trained, such as the weights and biases of layers including, but not limited to, convolution layers, pooling layers, and normalization layers. Normally, after training has been repeated many times, each loss value gradually decreases and approaches a constant value. The preset threshold can be set near this constant value; when the loss value falls below the preset threshold, training of the image recognition model can be judged complete.

In this step, the parameters of the image recognition model are adjusted continually according to the loss value, and the degree of training is judged from the difference between the loss value and the preset threshold. Once the computed loss value falls below the preset threshold, training of the image recognition model can be judged complete, which improves the quality of the generated image recognition model.
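The stopping rule of step 23 can be illustrated with a toy loop. The sketch below minimizes a one-parameter quadratic by gradient descent and stops once the loss falls below the threshold. The quadratic loss, the learning rate, and the `max_steps` safety cap are assumptions made for the illustration; the patent's model is a neural network and does not specify an optimizer.

```python
from typing import Tuple


def train_until_threshold(
    w: float, threshold: float, lr: float = 0.1, max_steps: int = 1000
) -> Tuple[float, float, int]:
    """Gradient descent on the toy loss (w - 3)^2, stopping when loss < threshold.

    Returns the final parameter, the final loss, and the number of updates made.
    """
    for step in range(max_steps):
        loss = (w - 3.0) ** 2
        if loss < threshold:
            return w, loss, step  # training is judged complete
        grad = 2.0 * (w - 3.0)
        w -= lr * grad  # adjust the parameter against the gradient
    # max_steps is a safety cap added for this sketch, not part of the source.
    return w, (w - 3.0) ** 2, max_steps


w, loss, steps = train_until_threshold(0.0, threshold=1e-4)
print(w, loss, steps)
```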

The image recognition model generation method described above includes: obtaining a sample image set including a plurality of sample image subsets, each containing the same number of image classes, with the number of images decreasing sequentially from subset to subset; training an image recognition model to be trained based on the sample image set to obtain its loss value, wherein the image recognition model to be trained includes a plurality of branch neural networks, each for recognizing its corresponding images, the loss value includes a target classification loss value and a classification loss value corresponding to each branch neural network, the target classification loss value is the loss value of the image recognition model to be trained on the sample image set, and each classification loss value is the loss value of the corresponding branch neural network on the sample image subsets corresponding to it; and adjusting the model parameters of the image recognition model to be trained based on the loss value until the loss value falls below a preset threshold, and taking the image recognition model to be trained as the trained image recognition model. By providing a plurality of sample image subsets with sequentially decreasing numbers of images, together with branch neural networks that recognize the images of their corresponding sample image subsets, the present application allows image classes with few images to be trained sufficiently, avoids the neglect of long-tail data seen in conventional neural networks, and improves the quality of the generated image recognition model.

In one embodiment, as shown in FIG. 4, step 22 above, in which the image recognition model to be trained is trained based on the sample image set to obtain the loss value of the image recognition model to be trained, includes:
step 41 of uniformly sampling the plurality of sample image subsets of the sample image set to obtain a sample image input sequence;
step 42 of inputting sample images into the image recognition model to be trained based on the sample image input sequence to obtain recognized image classes of the sample images; and
step 43 of determining the loss value of the image recognition model to be trained based on the recognized image classes of the sample images and the actual image classes of the corresponding sample images.

Specifically, the server uniformly samples the plurality of sample image subsets of the sample image set to obtain mini-batch data, inputs the mini-batch data into the image recognition model to be trained as the sample image input sequence to train it, and obtains the recognized image classes of the sample images output by the image recognition model. The server then obtains the actual image classes of the sample images, inputs the recognized image classes and the actual image classes into a preset loss function, and calculates the loss value of the image recognition model.

In this embodiment, uniform sampling balances the sample images of each image class in the sample image input sequence, so that the determined loss value of the image recognition model to be trained is more accurate, which improves the generation effect of the image recognition model.
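The text does not spell out the sampling procedure, so the following is only one possible reading: each draw first picks a sample image subset with equal probability and then picks an image uniformly inside it. The function name `sample_mini_batch`, the subset names, and the toy image identifiers are hypothetical.

```python
import random


def sample_mini_batch(subsets, batch_size, rng):
    """Draw a mini-batch: pick a subset uniformly, then an image uniformly."""
    names = sorted(subsets)
    batch = []
    for _ in range(batch_size):
        name = rng.choice(names)                  # every subset is equally likely
        image_id, label = rng.choice(subsets[name])
        batch.append((name, image_id, label))
    return batch


# Hypothetical long-tailed data: 1000 head, 100 medium and 10 tail images.
subsets = {
    "head": [(f"h{i}", 0) for i in range(1000)],
    "medium": [(f"m{i}", 1) for i in range(100)],
    "tail": [(f"t{i}", 2) for i in range(10)],
}
batch = sample_mini_batch(subsets, batch_size=3000, rng=random.Random(0))
counts = {name: sum(1 for n, _, _ in batch if n == name) for name in subsets}
```

Under this reading each subset contributes roughly a third of the mini-batch even though the tail subset holds only 10 images, which is the balancing effect the paragraph describes.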

In one embodiment, the image recognition model to be trained further includes a base neural network connected to the branch neural networks, and
step 42 above, in which sample images are input into the image recognition model to be trained based on the sample image input sequence to obtain the recognized image classes of the sample images, includes inputting the sample images into the image recognition model to be trained such that the base neural network obtains a first image feature of each sample image, and each branch neural network obtains a second image feature of the sample image based on the first image feature and determines the recognized image class of the sample image of the sample image set based on the second image feature.

Specifically, the base neural network extracts feature information from the sample images of the sample image set, that is, it extracts the features common to all image classes of the sample image set as the first image feature. Each branch neural network takes the first image feature extracted by the base neural network, extracts features from it again, and outputs the result as the second image feature. The second image features output from the branch neural networks are fused through a classifier to obtain the image class of the sample image. The parameters of the base neural network are shared parameters that can be used by every branch neural network. The type and structure of the base neural network are not limited here.
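The data flow of the base and branch networks can be sketched with tiny linear layers. This is an illustration under stated assumptions, not the patented architecture: the layer shapes, the ReLU activations, the single classifier shared by all branches, and fusion by summing the per-branch outputs are all hypothetical choices, as are the names `BranchedModel`, `linear`, and `relu`.

```python
def linear(weights, x):
    """Multiply a weight matrix (a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(v):
    return [max(0.0, a) for a in v]


class BranchedModel:
    def __init__(self, base_w, branch_ws, classifier_w):
        self.base_w = base_w              # shared base parameters
        self.branch_ws = branch_ws        # one weight matrix per branch
        self.classifier_w = classifier_w  # classifier applied to branch features

    def branch_logits(self, image):
        first = relu(linear(self.base_w, image))     # first image feature
        outputs = []
        for w in self.branch_ws:
            second = relu(linear(w, first))          # second image feature
            outputs.append(linear(self.classifier_w, second))
        return outputs

    def predict(self, image):
        per_branch = self.branch_logits(image)
        fused = [sum(col) for col in zip(*per_branch)]   # fuse by summation
        return max(range(len(fused)), key=fused.__getitem__), fused


model = BranchedModel(
    base_w=[[1, 0], [0, 1]],
    branch_ws=[[[1, 0], [0, 1]], [[0, 1], [1, 0]], [[1, 1], [0, 0]]],
    classifier_w=[[1, 0], [0, 1], [1, 1]],
)
label, fused = model.predict([1.0, 2.0])
```

Because the base weights are used once per image and reused by all three branches, only the branch matrices differ between branches, which mirrors the shared-parameter remark in the paragraph.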

In one embodiment, as shown in FIG. 5, step 43 above, in which the loss value of the image recognition model to be trained is determined based on the recognized image classes of the sample images and the actual image classes of the corresponding sample images, includes:
step 51 of determining the loss values of the sample images of the sample image set based on the recognized image classes of the sample images of the sample image set and the actual image classes of the corresponding sample images;
step 52 of obtaining the loss value corresponding to the sample image set based on the loss values of the sample images of the sample image set determined by the plurality of branch neural networks, and taking it as the target classification loss value;
step 53 of obtaining the loss values of all sample images of the sample image subset corresponding to each branch neural network, and taking the sum of the loss values of all sample images of that sample image subset as the classification loss value corresponding to the branch neural network; and
step 54 of calculating the loss value of the image recognition model to be trained based on the target classification loss value and the classification loss value corresponding to each branch neural network.

Specifically, the description takes as an example a sample image set that includes three sample image subsets: head classes, medium classes, and tail classes. The target classification loss value is the loss value obtained by comparing the recognized image classes of the sample images output by the entire image recognition model to be trained (that is, the fusion of the image classes output by the plurality of branch neural networks) with the actual image classes of the sample images of the sample image set. Accordingly, the recognized image classes corresponding to the sample images of the sample image set are calculated from the outputs of the three branch neural networks, all of the recognized image classes and the actual image classes are input into the loss function, and the resulting loss value is the target classification loss value, given by the following formula:
$$L_f = J(F_{net}(X), Y), \quad \text{where } F_{net}(X) = N_{h+m+t}(X) + N_{m+t}(X) + N_t(X)$$
where $L_f$ is the target classification loss value, $J$ is the cross-entropy loss function, $F_{net}$ is the image recognition model to be trained, $X$ is the set of sample images in the sample image input sequence, $Y$ is the actual image classes of the sample images, $h$, $m$, and $t$ are the first, second, and third sample image subsets, whose numbers of images decrease in that order, and $N_{h+m+t}$, $N_{m+t}$, and $N_t$ are the three branch neural networks corresponding to the three sample image subsets, the subscript of each indicating the sample image subsets to which that branch neural network corresponds.
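A minimal sketch of the target classification loss follows. It assumes that each branch outputs one logit per image class, that the fusion in F_net is an element-wise sum of those logits, and that J sums the per-sample cross-entropy over the batch (the later division by the number of sample images suggests a sum rather than a mean). The function names and the toy logits are hypothetical.

```python
import math


def cross_entropy(logits, label):
    """Cross-entropy of one sample: -log softmax(logits)[label]."""
    m = max(logits)                                    # subtract max for stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def target_classification_loss(branch_logits, labels):
    """L_f: cross-entropy of the summed branch outputs, summed over the batch."""
    total = 0.0
    for per_branch, y in zip(branch_logits, labels):
        fused = [sum(col) for col in zip(*per_branch)]  # N_{h+m+t} + N_{m+t} + N_t
        total += cross_entropy(fused, y)
    return total


# Two hypothetical samples, three branches, three classes.
branch_logits = [
    [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    [[2.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
]
labels = [1, 0]
l_f = target_classification_loss(branch_logits, labels)
```

The first sample has uniform fused logits and contributes log 3; the second has fused logits [3, 0, 0] with true class 0 and contributes log(1 + 2e^-3).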

The classification loss value is the loss value obtained by each branch neural network for its corresponding sample image subset, not for the entire sample image set. For example, the branch neural network $N_{h+m+t}$ corresponds to the first, second, and third sample image subsets, so calculating the loss for $N_{h+m+t}$ is equivalent to calculating the loss value for the entire sample image set. The branch neural network $N_t$ corresponds only to the third sample image subset, so when the loss value of $N_t$ is calculated, it is sufficient to calculate it from the actual image classes of the corresponding sample images in the third sample image subset. The classification loss values calculated for all of the branch neural networks are obtained and added together, and the result of this addition is the final image class prediction result. Specifically, in the formula for this sum, $L_i$ is the sum of the classification loss values corresponding to the plurality of branch neural networks, $S_{m+t}$ is one subset of $X$ containing the sample images in the sample image input sequence that belong to the second and third sample image subsets, and $S_t$ is another subset of $X$ containing the sample images in the sample image input sequence that belong to the third sample image subset.
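The formula that this passage introduces does not appear in the text. Based only on the definitions given here (each branch neural network is scored on its own subset, and the per-branch losses are added), one reading consistent with those definitions is the following. It is a reconstruction, not the published formula, and the notation $Y_{S}$ for the actual image classes of the samples in a subset $S$ is an assumption introduced for this sketch.

```latex
L_i = J\bigl(N_{h+m+t}(X),\, Y\bigr)
    + J\bigl(N_{m+t}(S_{m+t}),\, Y_{S_{m+t}}\bigr)
    + J\bigl(N_t(S_t),\, Y_{S_t}\bigr)
```

Under this reading the first term covers every sample in $X$, while the second and third terms cover only the samples in $S_{m+t}$ and $S_t$ respectively.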

The loss value of the image recognition model to be trained is calculated from both the classification loss values corresponding to the branch neural networks and the target classification loss value, specifically by the following formula:
$$L_{all} = (1 - \alpha)\,\frac{L_f}{n_1} + \alpha\,\frac{L_i}{n_2}$$
where $L_{all}$ is the loss value of the image recognition model to be trained, $\alpha$ is a hyperparameter, $n_1$ is the number of sample images in $X$, and $n_2$ is the total number of sample images in $X$, $S_{m+t}$, and $S_t$.
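As a worked example of this formula, the sketch below plugs in hypothetical numbers: a batch of 8 images of which 5 belong to the second and third subsets and 2 to the third subset, with made-up loss sums. The function name `total_loss` and all numeric values are illustrative only.

```python
def total_loss(l_f, l_i, n_1, n_2, alpha):
    """L_all = (1 - alpha) * L_f / n_1 + alpha * L_i / n_2."""
    return (1.0 - alpha) * l_f / n_1 + alpha * l_i / n_2


# Hypothetical batch: 8 images in X, 5 of them in S_{m+t}, 2 of them in S_t.
n_1 = 8
n_2 = 8 + 5 + 2                      # images in X, S_{m+t} and S_t combined
l_all = total_loss(l_f=4.0, l_i=9.0, n_1=n_1, n_2=n_2, alpha=0.5)
l_balanced = total_loss(l_f=4.0, l_i=9.0, n_1=n_1, n_2=n_2, alpha=0.0)
```

With alpha = 0.5 the two terms are 0.5 * 4 / 8 = 0.25 and 0.5 * 9 / 15 = 0.3. With alpha = 0 only the target classification term L_f / n_1 remains.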

Note that data sets with different degrees of long-tailedness can be accommodated by adjusting the hyperparameter $\alpha$ in $L_{all}$. When the data set is balanced (that is, the number of images in each image class is uniform), the method still operates normally if the hyperparameter $\alpha$ is set to 0.

In the above embodiment, the target classification loss value and the classification loss value corresponding to each branch neural network are calculated from the difference between the recognized image classes of the sample images and the actual image classes of the corresponding sample images, and the loss value of the image recognition model to be trained is then obtained from them, so that the parameters of the image recognition model to be trained can be adjusted. As a result, image classes with few images can be sufficiently trained, long-tail data is not neglected as in conventional neural network training, and the generation effect of the image recognition model is improved.
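The per-branch classification losses described in this embodiment can be sketched as follows, assuming three branches in the order N_{h+m+t}, N_{m+t}, N_t, per-sample cross-entropy, and a subset tag attached to every sample. The names `BRANCH_SUBSETS` and `branch_classification_losses` and the toy samples are hypothetical.

```python
import math


def cross_entropy(logits, label):
    """Cross-entropy of one sample: -log softmax(logits)[label]."""
    m = max(logits)
    return m + math.log(sum(math.exp(z - m) for z in logits)) - logits[label]


# Subsets covered by each branch, in the order N_{h+m+t}, N_{m+t}, N_t.
BRANCH_SUBSETS = [{"head", "medium", "tail"}, {"medium", "tail"}, {"tail"}]


def branch_classification_losses(samples):
    """Return (L_i, n_2) for samples given as (subset, per_branch_logits, label)."""
    l_i, n_2 = 0.0, 0
    for subset, per_branch, label in samples:
        for logits, covered in zip(per_branch, BRANCH_SUBSETS):
            if subset in covered:      # a branch only scores its own subsets
                l_i += cross_entropy(logits, label)
                n_2 += 1
    return l_i, n_2


zeros = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]   # three branches, two classes
samples = [("head", zeros, 0), ("medium", zeros, 1), ("tail", zeros, 1)]
l_i, n_2 = branch_classification_losses(samples)
```

A head sample is scored by one branch, a medium sample by two, and a tail sample by three, so tail images receive the most supervision per image; here the count of scored terms is 3 + 2 + 1.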

In one embodiment, as shown in FIG. 6, the method further includes, before step 21 above of obtaining the sample image set:
step 61 of obtaining sample images and determining the number of images in each image class based on the image classes of the sample images;
step 62 of obtaining an ordering of the image classes based on the number of images in each image class, and dividing the image classes, according to the ordering, into a plurality of class combinations each containing the same number of image classes; and
step 63 of obtaining the sample image subsets corresponding to the plurality of class combinations based on the plurality of class combinations and the sample images corresponding to the image classes in each class combination, and taking the combination of the plurality of sample image subsets as the sample image set.

Specifically, the server obtains sample images from the terminal, recognizes the image class of each sample image, classifies the sample images by image class, and counts the number of sample images corresponding to each image class. The image classes are arranged in descending order of the number of corresponding sample images to obtain the ordering. The image classes are then distributed evenly among a plurality of class combinations based on the number of branch neural networks and the number of image classes. For example, with three branch neural networks and six image classes, the image classes are grouped two at a time to obtain three class combinations. The sample image subset corresponding to each class combination is obtained from the class combination and its corresponding sample images, and the plurality of sample image subsets together constitute the sample image set.
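The grouping described above (count, sort in descending order, split evenly) can be sketched as follows, using the six-class, three-branch example from the text. The class names, image counts, alphabetical tie-breaking, and the choice to raise an error when the classes cannot be split evenly are assumptions of this sketch.

```python
from collections import Counter


def build_class_combinations(labels, num_groups):
    """Sort classes by image count (descending) and split them into equal groups."""
    counts = Counter(labels)
    ordered = sorted(counts, key=lambda c: (-counts[c], c))  # ties broken by name
    if len(ordered) % num_groups:
        raise ValueError("class count must be divisible by the number of groups")
    size = len(ordered) // num_groups
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


# Six hypothetical image classes with long-tailed image counts.
labels = (["cat"] * 50 + ["dog"] * 40 + ["fox"] * 12
          + ["owl"] * 9 + ["eel"] * 3 + ["yak"] * 2)
groups = build_class_combinations(labels, num_groups=3)
```

The first group holds the two most populous classes and the last group the two rarest, so the resulting sample image subsets have sequentially decreasing numbers of images.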

In this embodiment, the image classes are sorted in descending or ascending order of their number of images and distributed evenly according to the ordering, yielding sample image subsets that contain the same number of image classes and thereby preprocessing the sample images. Because each branch neural network corresponds to sample image subsets in a way that follows the characteristics of the long-tail data distribution, image classes with few images can be sufficiently trained, long-tail data is not neglected as in conventional neural network training, and the generation effect of the image recognition model is improved.

It should be understood that although the steps in the flowcharts of FIGS. 2 and 4 to 6 are shown in sequence as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the steps are not restricted to a particular order and may be performed in other orders. Moreover, at least some of the steps in FIGS. 2 and 4 to 6 may include a plurality of sub-steps or stages, which are not necessarily performed at the same time and may be performed at different times. Nor are these sub-steps or stages necessarily performed in sequence; they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

In one embodiment, as shown in FIG. 7, an image recognition model generation apparatus is provided, including:
an acquisition module 71 for obtaining a sample image set including a plurality of sample image subsets, each containing the same number of image classes, in which the number of images decreases sequentially;
a training module 72 for training an image recognition model to be trained based on the sample image set to obtain a loss value of the image recognition model to be trained, wherein the image recognition model to be trained includes a plurality of branch neural networks each for recognizing corresponding images, the loss value includes a target classification loss value and a classification loss value corresponding to each branch neural network, the target classification loss value is the loss value of the image recognition model to be trained for the sample image set, and the classification loss value corresponding to each branch neural network is the loss value of the corresponding branch neural network for the sample image subset corresponding to that branch neural network; and
an adjustment module 73 for adjusting the model parameters of the image recognition model to be trained based on the loss value until the loss value falls below a preset threshold, and taking the image recognition model to be trained as the trained image recognition model.
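How the three modules might cooperate can be sketched as a small composition of callables. This is only an illustration: the class name `ImageRecognitionModelGenerator`, the callable signatures, the `max_rounds` guard, and the toy stand-ins used in the example are all hypothetical and are not taken from the apparatus described above.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ImageRecognitionModelGenerator:
    acquire: Callable[[], Any]              # stands in for acquisition module 71
    train: Callable[[Any, Any], float]      # training module 72: returns a loss value
    adjust: Callable[[Any, float], Any]     # adjustment module 73: returns updated model

    def generate(self, model, threshold, max_rounds=1000):
        samples = self.acquire()
        for _ in range(max_rounds):
            loss = self.train(model, samples)
            if loss < threshold:            # model is taken as the trained model
                return model
            model = self.adjust(model, loss)
        raise RuntimeError("threshold not reached")


# Toy stand-ins: the "model" is a single number and the loss is its magnitude.
generator = ImageRecognitionModelGenerator(
    acquire=lambda: None,
    train=lambda model, samples: abs(model),
    adjust=lambda model, loss: model / 2.0,
)
trained = generator.generate(model=8.0, threshold=1.0)
```

Passing the modules in as callables keeps the sketch agnostic about whether each module is realized in software or hardware.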

In one embodiment, the training module 72 further uniformly samples the plurality of sample image subsets of the sample image set to obtain a sample image input sequence, inputs sample images into the image recognition model to be trained based on the sample image input sequence to obtain the recognized image classes of the sample images, and determines the loss value of the image recognition model to be trained based on the recognized image classes of the sample images and the actual image classes of the corresponding sample images.

In one embodiment, the training module 72 further inputs the sample images into the image recognition model to be trained such that the base neural network obtains a first image feature of each sample image, and each branch neural network obtains a second image feature of the sample image based on the first image feature and determines the recognized image class of the sample image of the sample image set based on the second image feature.

In one embodiment, the training module 72 further determines the loss values of the sample images of the sample image set based on the recognized image classes of the sample images of the sample image set and the actual image classes of the corresponding sample images; obtains the loss value corresponding to the sample image set based on the loss values of the sample images of the sample image set determined by the plurality of branch neural networks, and takes it as the target classification loss value; obtains the loss values of all sample images of the sample image subset corresponding to each branch neural network, and takes the sum of the loss values of all sample images of that sample image subset as the classification loss value corresponding to the branch neural network; and calculates the loss value of the image recognition model to be trained based on the target classification loss value and the classification loss value corresponding to each branch neural network.

In one embodiment, the acquisition module 71 further obtains sample images and determines the number of images in each image class based on the image classes of the sample images; obtains an ordering of the image classes based on the number of images in each image class and divides the image classes, according to the ordering, into a plurality of class combinations each containing the same number of image classes; and obtains the sample image subsets corresponding to the plurality of class combinations based on the plurality of class combinations and the sample images corresponding to the image classes in each class combination, taking the combination of the plurality of sample image subsets as the sample image set.

For the specific limitations of the image recognition model generation apparatus, reference may be made to the limitations of the image recognition model generation method above, and a detailed description is omitted here. All or some of the modules in the above image recognition model generation apparatus may be implemented in software, hardware, or a combination thereof. Each of the above modules may be embedded in hardware form in a processor of the computer equipment, may be independent of the processor, or may be stored in software form in a memory of the computer equipment, so that the processor can call and execute the operations corresponding to each module.

In one embodiment, computer equipment is provided. The computer equipment may be a server, and its internal structure may be as shown in FIG. 8. The computer equipment includes a processor, a memory, and a network interface connected via a system bus. The processor of the computer equipment provides computing and control functions. The memory of the computer equipment includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database, and the internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer equipment stores image recognition model generation data. The network interface of the computer equipment is used to connect to and communicate with an external terminal via a network. When executed by the processor, the computer program implements the image recognition model generation method.

Those skilled in the art will understand that the configuration shown in FIG. 8 is merely a block diagram of part of the configuration related to the technical solution of the present application and does not limit the computer equipment to which the technical solution of the present application is applied. Specific computer equipment may include more or fewer components than illustrated, may combine some components, or may have a different arrangement of components.

In one embodiment, computer equipment is provided that includes a memory storing a computer program and a processor that implements the steps of each of the above method embodiments when executing the computer program.

In one embodiment, a computer-readable storage medium is provided that stores a computer program which, when executed by a processor, implements the steps of each of the above method embodiments.

Those skilled in the art will understand that all or part of the flows for implementing the methods of the above embodiments may be realized by a computer program instructing the relevant hardware, that the computer program may be stored in a non-volatile computer-readable storage medium, and that the computer program, when executed, may include the flows of each of the above method embodiments. Any reference to memory, storage, a database, or another medium used in the embodiments of the present application may include at least one of non-volatile memory and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, and the like. Volatile memory may include random access memory (RAM) or external cache memory. By way of example and not limitation, RAM may take various forms, such as static random access memory (SRAM) and dynamic random access memory (DRAM).

The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination of the technical features of the above embodiments has been described, but any combination of these technical features should be regarded as falling within the scope of this specification as long as the combination involves no contradiction.

The above embodiments represent only some implementations of the present application. Although they are described specifically and in detail, they should not be understood as limiting the scope of the claims of the present application. Those skilled in the art can make various modifications and improvements without departing from the spirit of the present application, and all of these fall within the scope of protection of the present application. The scope of protection of the present application is therefore defined by the appended claims.

Claims

An image recognition model generation method, comprising the steps of:
obtaining a sample image set including a plurality of sample image subsets each containing the same number of image classes and having a sequentially decreasing number of images;
training an image recognition model to be trained based on the sample image set to obtain a loss value of the image recognition model to be trained, wherein the image recognition model to be trained includes a plurality of branch neural networks each for recognizing corresponding images, the loss value includes a target classification loss value and a classification loss value corresponding to each of the branch neural networks, the target classification loss value is the loss value of the image recognition model to be trained for the sample image set, and the classification loss value corresponding to each of the branch neural networks is the loss value of the corresponding branch neural network for the sample image subset corresponding to that branch neural network; and
adjusting the model parameters of the image recognition model to be trained based on the loss value until the loss value becomes lower than a preset threshold, and taking the image recognition model to be trained as a trained image recognition model.

The method according to claim 1, wherein the step of training the image recognition model to be trained based on the sample image set to obtain the loss value of the image recognition model to be trained comprises:
uniformly sampling the plurality of sample image subsets of the sample image set to obtain a sample image input sequence;
inputting a sample image into the image recognition model to be trained based on the sample image input sequence to obtain a recognized image class of the sample image; and
determining the loss value of the image recognition model to be trained based on the recognized image class of the sample image and the actual image class of the corresponding sample image.

The method according to claim 2, wherein the image recognition model to be trained further includes a base neural network connected to the branch neural networks, and
the step of inputting sample images into the image recognition model to be trained according to the sample image input sequence to obtain a recognized image class of each sample image comprises:
inputting the sample images into the image recognition model to be trained, such that the base neural network obtains a first image feature of each sample image, and the branch neural networks obtain second image features of the sample image based on the first image feature and determine the recognized image class of the sample image of the sample image set based on the second image features.
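
The data flow through the base and branch networks can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: each network is reduced to a single linear map, the second image features are treated as per-class scores, and the recognized class is taken as the arg-max of their sum. The class `BranchedModel` and all weights are hypothetical.

```python
def linear(weights, vec):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]

def relu(vec):
    return [max(0.0, v) for v in vec]

class BranchedModel:
    """Shared base network followed by several branch networks (toy sizes)."""

    def __init__(self, base_w, branch_ws):
        self.base_w = base_w          # weights of the base neural network
        self.branch_ws = branch_ws    # one weight matrix per branch neural network

    def forward(self, image):
        first = relu(linear(self.base_w, image))              # first image feature
        seconds = [linear(w, first) for w in self.branch_ws]  # second image features
        fused = [sum(vals) for vals in zip(*seconds)]         # assumption: sum the branches
        predicted = max(range(len(fused)), key=fused.__getitem__)
        return first, seconds, fused, predicted

model = BranchedModel(
    base_w=[[1.0, 0.0], [0.0, 1.0]],
    branch_ws=[
        [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]],
        [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
        [[0.0, 0.0], [0.5, 0.0], [1.0, 0.0]],
    ],
)
first, seconds, fused, predicted = model.forward([2.0, -1.0])
```

Because the base network is evaluated once and its output is shared, adding a branch costs only one extra head rather than a full second model.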

The method according to claim 2, wherein the step of determining the loss value of the image recognition model to be trained based on the recognized image class of the sample image and the corresponding actual image class of the sample image comprises:
determining a loss value for each sample image of the sample image set based on the recognized image class of the sample image of the sample image set and the actual image class of the corresponding sample image;
obtaining a loss value corresponding to the sample image set based on the loss values of the sample images of the sample image set determined by the plurality of branch neural networks, and taking it as the target classification loss value;
obtaining the loss values of all sample images of the sample image subset corresponding to each branch neural network, and taking the sum of the loss values of all sample images of that sample image subset as the classification loss value corresponding to that branch neural network; and
calculating the loss value of the image recognition model to be trained based on the target classification loss value and the classification loss value corresponding to each of the branch neural networks.

The method according to claim 4, wherein the sample image set includes three sample image subsets in which the number of images decreases sequentially, and the image recognition model to be trained includes three branch neural networks;
the target classification loss value is calculated using the following formula:
L_f = J(F_net(X), Y), where F_net(X) = N_{h+m+t}(X) + N_{m+t}(X) + N_t(X)
(where L_f is the target classification loss value, J is the cross-entropy loss function, F_net is the image recognition model to be trained, X is the set of sample images in the sample image input sequence, Y is the actual image class of the sample images, h, m and t are the first, second and third sample image subsets in order of decreasing number of images, N_{h+m+t}, N_{m+t} and N_t are the three branch neural networks corresponding to the three sample image subsets, and each subscript denotes the sample image subsets to which that branch neural network corresponds); and
the classification loss value corresponding to each branch neural network is calculated using the following formula:
L_i = J(N_{h+m+t}(X), Y) + J(N_{m+t}(S_{m+t}), Y) + J(N_t(S_t), Y)
(where L_i is the sum of the classification loss values corresponding to the plurality of branch neural networks, S_{m+t} is one subset of X and includes the sample images belonging to the second and third sample image subsets in the sample image input sequence, and S_t is the other subset of X and includes the sample images belonging to the third sample image subset in the sample image input sequence).
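
The two loss terms can be made concrete with a small sketch. It rests on one reading of the definitions in this claim: L_f applies the cross-entropy J to the summed outputs of the three branches over all of X, while L_i adds the cross-entropy of N_{h+m+t} on X, of N_{m+t} on S_{m+t}, and of N_t on S_t. It also assumes that J is a sum over samples rather than a mean and that a tail image belongs to both S_{m+t} and S_t. All function and argument names are hypothetical.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy of one sample (log-sum-exp form for stability)."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_norm - logits[label]

def fusion_and_branch_losses(out_hmt, out_mt, out_t, labels, in_mt, in_t):
    """Return (L_f, L_i, n1, n2) as unnormalised sums over the input sequence X.

    out_hmt, out_mt, out_t : per-sample class scores of the three branches
    in_mt[k] : True if sample k belongs to S_{m+t}
    in_t[k]  : True if sample k belongs to S_t
    """
    l_f = l_i = 0.0
    n2 = 0
    for k, y in enumerate(labels):
        fused = [a + b + c for a, b, c in zip(out_hmt[k], out_mt[k], out_t[k])]
        l_f += cross_entropy(fused, y)        # fused output of all branches
        l_i += cross_entropy(out_hmt[k], y)   # N_{h+m+t} is scored on every sample
        n2 += 1
        if in_mt[k]:
            l_i += cross_entropy(out_mt[k], y)  # N_{m+t} only on S_{m+t}
            n2 += 1
        if in_t[k]:
            l_i += cross_entropy(out_t[k], y)   # N_t only on S_t
            n2 += 1
    return l_f, l_i, len(labels), n2

# Two samples, three classes, all scores zero: every cross-entropy equals ln 3.
zeros = [0.0, 0.0, 0.0]
l_f, l_i, n1, n2 = fusion_and_branch_losses(
    [zeros, zeros], [zeros, zeros], [zeros, zeros],
    labels=[0, 2], in_mt=[False, True], in_t=[False, True])
```

In the toy call the first sample is a head image and contributes one term to L_i, whereas the second is a tail image and contributes three, so the branch loss weights rare classes more heavily.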

The method according to claim 5, wherein the loss value of the image recognition model to be trained is calculated using the following formula:
L_all = (1 - α) L_f / n_1 + α L_i / n_2
(where L_all is the loss value of the image recognition model to be trained, α is a hyperparameter, n_1 is the number of sample images in X, and n_2 is the total number of sample images in X, S_{m+t} and S_t).
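
A short arithmetic check of this weighting may help. The sketch assumes L_f and L_i are already available as sums over samples; the helper name `total_loss` and the numbers are invented for illustration.

```python
def total_loss(l_f, l_i, n1, n2, alpha):
    """L_all = (1 - alpha) * L_f / n1 + alpha * L_i / n2."""
    return (1.0 - alpha) * l_f / n1 + alpha * l_i / n2

# Made-up values: L_f summed over 4 images, L_i summed over 6 branch evaluations.
example = total_loss(l_f=8.0, l_i=9.0, n1=4, n2=6, alpha=0.5)
# 0.5 * 8.0 / 4 + 0.5 * 9.0 / 6 = 1.0 + 0.75 = 1.75
```

Dividing by n_1 and n_2 turns both sums into per-evaluation averages, so α trades off the two terms on a comparable scale.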

The method according to claim 1, further comprising, before obtaining the sample image set:
obtaining sample images and determining the number of images of each image class based on the image classes of the sample images;
obtaining a sorting order of the image classes based on the number of images of each image class, and dividing the image classes into a plurality of class combinations, each including the same number of image classes, according to the sorting order; and
obtaining, based on the plurality of class combinations and the sample images corresponding to the image classes in the plurality of class combinations, the sample image subsets corresponding to the respective class combinations, and taking the combination of the plurality of sample image subsets as the sample image set.
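
The construction of the sample image subsets can be sketched as below. The sketch assumes classes are sorted from most to fewest images, that ties are broken by label for determinism, and that the number of classes is divisible by the number of groups; none of these details are fixed by the claim. The name `build_subsets` and the toy data are hypothetical.

```python
from collections import Counter

def build_subsets(samples, num_groups):
    """Split (image, label) samples into subsets with equally many classes each."""
    counts = Counter(label for _, label in samples)
    # Sort classes by image count, most frequent first; ties broken by label.
    ordered = sorted(counts, key=lambda c: (-counts[c], c))
    if len(ordered) % num_groups != 0:
        raise ValueError("number of classes must be divisible by num_groups")
    size = len(ordered) // num_groups
    groups = [ordered[i * size:(i + 1) * size] for i in range(num_groups)]
    subsets = []
    for group in groups:
        members = set(group)
        subsets.append([s for s in samples if s[1] in members])
    return groups, subsets

# Toy long-tailed data: 6 cat, 5 dog, 3 fox, 2 owl, 1 yak, 1 eel.
samples = ([(f"a{i}", "cat") for i in range(6)] + [(f"b{i}", "dog") for i in range(5)]
           + [(f"c{i}", "fox") for i in range(3)] + [(f"d{i}", "owl") for i in range(2)]
           + [("e0", "yak"), ("f0", "eel")])
groups, subsets = build_subsets(samples, 3)
```

With these six classes split into three groups of two, the subsets hold 11, 5 and 2 images, which matches the "sequentially decreasing number of images" required of the sample image set.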

An image recognition model generation device, comprising:
an acquisition module for acquiring a sample image set including a plurality of sample image subsets, each containing the same number of image classes, with the number of images decreasing sequentially from one subset to the next;
a training module for training an image recognition model to be trained based on the sample image set to obtain a loss value of the image recognition model to be trained, wherein the image recognition model to be trained includes a plurality of branch neural networks, each for recognizing its corresponding images; the loss value includes a target classification loss value and a classification loss value corresponding to each of the branch neural networks; the target classification loss value is the loss value of the image recognition model to be trained with respect to the sample image set; and the classification loss value corresponding to each branch neural network is the loss value of that branch neural network with respect to its corresponding sample image subset; and
an adjustment module for adjusting, based on the loss value, the model parameters of the image recognition model to be trained until the loss value is lower than a preset threshold, and taking the image recognition model to be trained as the trained image recognition model.

Computer equipment comprising a memory in which a computer program is stored and a processor which, when executing the computer program, implements the steps of the method according to any one of claims 1 to 7.

A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.