WO2023116507A1 - Target detection model training method and apparatus, and target detection method and apparatus - Google Patents

Target detection model training method and apparatus, and target detection method and apparatus

Info

Publication number
WO2023116507A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
detected
sub
vector
region
Prior art date
Application number
PCT/CN2022/138650
Other languages
English (en)
Chinese (zh)
Inventor
刘安
吕晶晶
张政
刘平
Original Assignee
北京沃东天骏信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京沃东天骏信息技术有限公司
Publication of WO2023116507A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks

Definitions

  • the embodiments of the present disclosure provide a method for training a target detection model and a method and device for target detection, which can acquire global feature information in an image more fully, enrich feature representations, and achieve good detection performance on small targets.
  • the process of target detection is simplified, and the accuracy of target detection is improved.
  • a method for training a target detection model including:
  • a plurality of decoding vectors and image labels corresponding to the images are used for training to obtain the target detection model.
  • before obtaining the feature vector and position encoding vector corresponding to the image according to the image, the method further includes:
  • before extracting the image features of each sub-region of the image, the method includes:
  • a backbone feature extraction network including a feature pyramid network is used to extract the image features of each sub-region to obtain the sub-region feature vector corresponding to each sub-region, and the feature vector corresponding to the image is obtained according to the sub-region feature vectors corresponding to the image.
  • obtaining a decoding vector corresponding to the image according to the feature vector and the position encoding vector includes:
  • the position of the target includes the center coordinates, width, and height of the target frame corresponding to the target; the position loss is a loss based on the ratio of the intersection area to the union area of the predicted target frame and the real target frame, and the category loss is the loss of the bipartite matching between the set of real target frames and the set of predicted target frames.
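A rough sketch of this loss, assuming axis-aligned (x1, y1, x2, y2) boxes: the bipartite matching between the predicted and real target-frame sets can be computed with the Hungarian algorithm, using a position cost of 1 - IoU (intersection over union) and a negative log-probability category cost. All function and variable names here are our own; the patent's exact formulation may differ:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def pairwise_iou(boxes_a, boxes_b):
    """Pairwise IoU for (x1, y1, x2, y2) boxes: returns a (len(a), len(b)) matrix."""
    x1 = np.maximum(boxes_a[:, None, 0], boxes_b[None, :, 0])
    y1 = np.maximum(boxes_a[:, None, 1], boxes_b[None, :, 1])
    x2 = np.minimum(boxes_a[:, None, 2], boxes_b[None, :, 2])
    y2 = np.minimum(boxes_a[:, None, 3], boxes_b[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (boxes_a[:, 2] - boxes_a[:, 0]) * (boxes_a[:, 3] - boxes_a[:, 1])
    area_b = (boxes_b[:, 2] - boxes_b[:, 0]) * (boxes_b[:, 3] - boxes_b[:, 1])
    union = area_a[:, None] + area_b[None, :] - inter
    return inter / np.clip(union, 1e-9, None)

def matching_loss(pred_boxes, pred_class_probs, gt_boxes, gt_labels):
    """Hungarian matching: position cost = 1 - IoU, category cost = -log p(gt class)."""
    pos_cost = 1.0 - pairwise_iou(pred_boxes, gt_boxes)        # (N_pred, N_gt)
    cls_cost = -np.log(pred_class_probs[:, gt_labels] + 1e-9)  # (N_pred, N_gt)
    rows, cols = linear_sum_assignment(pos_cost + cls_cost)    # optimal bipartite matching
    return (pos_cost[rows, cols] + cls_cost[rows, cols]).sum()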
  • the shared feedforward network is composed of a ReLU activation function, a multi-layer perceptron, and a linear layer.
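A minimal PyTorch sketch of such a shared head; the hidden width, layer count, class count, and the sigmoid on the box output are illustrative assumptions, not values from the patent:

```python
import torch.nn as nn

class SharedFFNHead(nn.Module):
    """Shared feedforward head: an MLP with ReLU for the target frame, a linear layer for the class."""
    def __init__(self, d_model=256, num_classes=80, hidden=256, box_layers=3):
        super().__init__()
        layers, dim = [], d_model
        for _ in range(box_layers - 1):                    # multi-layer perceptron with ReLU
            layers += [nn.Linear(dim, hidden), nn.ReLU()]
            dim = hidden
        layers.append(nn.Linear(dim, 4))                   # (cx, cy, w, h) of the target frame
        self.box_mlp = nn.Sequential(*layers)
        self.class_linear = nn.Linear(d_model, num_classes + 1)  # extra slot for "no object"

    def forward(self, decoding_vectors):
        boxes = self.box_mlp(decoding_vectors).sigmoid()   # normalized coordinates (an assumption)
        logits = self.class_linear(decoding_vectors)
        return boxes, logits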
  • Another aspect of the embodiments of the present disclosure provides a method for target detection, including:
  • the image to be detected is input into a trained target detection model, and the position and category of the target in the image to be detected are determined.
  • the target detection model is obtained according to the training method of the target detection model in the embodiment of the present disclosure.
  • determining the position and category of the target in the image to be detected includes:
  • the decoding vector corresponding to the image to be detected is input to the shared feedforward network to obtain the position and category of the target in the image to be detected.
  • obtaining the feature vector and position encoding vector corresponding to the image to be detected includes:
  • a position encoding vector corresponding to the image to be detected is obtained according to the image features of each sub-region, and the position encoding vector includes a positional relationship between image features of each sub-region corresponding to the image to be detected.
  • the foreground area is divided according to different scales to obtain multiple sub-areas corresponding to the image to be detected.
  • the image features of each sub-region are extracted, and the feature vector corresponding to the image to be detected is obtained according to the image features of each sub-region, including:
  • a backbone feature extraction network including a feature pyramid network is used to extract the image features of each sub-region to obtain the sub-region feature vector corresponding to each sub-region, and the feature vector corresponding to the image to be detected is obtained according to the sub-region feature vectors corresponding to the image to be detected.
  • obtaining a decoding vector corresponding to the image to be detected according to a feature vector and a position encoding vector corresponding to the image to be detected includes:
  • a training device for a target detection model including:
  • an acquisition module that acquires a plurality of images and the image label corresponding to each of the images, where the image label includes the position and category of the target in the image;
  • the first determining module obtains a feature vector and a position encoding vector corresponding to the image according to the image;
  • the training module uses decoding vectors and image labels corresponding to multiple images for training to obtain the target detection model.
  • a target detection device including:
  • the determining module is configured to input the image to be detected into a trained target detection model, and determine the location and category of the target in the image to be detected.
  • the target detection model is obtained according to the training method of the target detection model in an embodiment of the present disclosure.
  • an electronic device including:
  • one or more processors;
  • the one or more processors are caused to implement the target detection model training method or the target detection method provided in the present disclosure.
  • a computer-readable medium on which a computer program is stored, where, when the program is executed by a processor, the method for training a target detection model or the method for target detection provided by the present disclosure is implemented.
  • FIG. 1 is a schematic diagram of the main flow of a method for target detection according to an embodiment of the present disclosure
  • FIG. 2 is a schematic diagram of the main flow of a method for training a target detection model according to an embodiment of the present disclosure
  • FIG. 4 is a schematic structural diagram of a target detection model according to an embodiment of the present disclosure.
  • FIG. 5 is a schematic diagram of main modules of a training device for a target detection model according to an embodiment of the present disclosure
  • FIG. 6 is a schematic diagram of main modules of a device for target detection according to an embodiment of the present disclosure.
  • FIG. 7 is an exemplary system architecture diagram to which embodiments of the present disclosure can be applied.
  • FIG. 8 is a schematic structural diagram of a computer system suitable for implementing a terminal device or a server according to an embodiment of the present disclosure.
  • the embodiment of the present disclosure uses the target detection model to predict the position and category of the target in the image to be detected. The training process of the target detection model includes: acquiring multiple images and the image label of each image; obtaining the feature vector and position encoding vector of each image; obtaining the decoding vector according to the feature vector and position encoding vector; and training the target detection model with the decoding vectors and image labels. The target detection model extracts local features based on a convolutional neural network and learns global features with an encoding-decoding structure based on the self-attention mechanism, which makes the target detection model more generalizable.
  • the object detection method in the embodiment of the present disclosure can improve the accuracy of object detection and simplify the process of object detection.
  • FIG. 1 is a schematic diagram of a main flow of a method for target detection according to an embodiment of the present disclosure. As shown in FIG. 1 , the method for target detection includes the following steps:
  • Step S101: acquiring an image to be detected;
  • Step S102: inputting the image to be detected into the trained target detection model, and determining the position and category of the target in the image to be detected.
  • the training method of the target detection model includes:
  • Step S201: obtaining multiple images and the image label corresponding to each image;
  • Step S202: obtaining the feature vector and position encoding vector corresponding to the image according to the image;
  • Step S203: obtaining a decoding vector corresponding to the image according to the feature vector and the position encoding vector;
  • Step S204: training with the decoding vectors and image labels corresponding to the multiple images to obtain the target detection model.
  • the image to be detected in the embodiment of the present disclosure may be an image containing a target, such as a picture of a product in an e-commerce scene, where the product in the picture is the target in the image to be detected.
  • the steps include:
  • Step S301: selecting a part of the images and the image label corresponding to each selected image from the multiple images and the image labels corresponding to each image;
  • Step S302: training an image detection model according to the selected images and the image labels corresponding to them;
  • Step S303: using the image detection model to perform data cleaning on the multiple images and image labels, determining the image labels to be relabeled, and relabeling the image labels to be relabeled.
  • Data augmentation is performed on images and image labels of some categories according to the number of images corresponding to each category.
  • data preprocessing is performed on the images and image labels, and the data preprocessing includes data cleaning and data enhancement.
  • data cleaning mainly selects a part of the images and the image labels corresponding to those images from the multiple images and the image labels corresponding to each image to construct a training set. The selected images and their image labels can be obtained in the following way: randomly select a certain proportion (for example, 10%) of the data from the multiple images and the image labels corresponding to each image, manually check whether each image and its label match, remove the unmatched images and image labels, and construct the training set from the remaining images and their corresponding image labels.
  • the image detection model can be a model of the YOLO series (such as YOLOv5). The trained image detection model is then used to perform data cleaning on the multiple images and the image labels corresponding to each image. Specifically, the multiple images are input into the image detection model to obtain an output result, which includes the center coordinates of the target frame, the category of the target, and the probability assigned by the image detection model to that category; this probability is used as the confidence level. Setting a high confidence threshold filters out the small number of image labels that differ substantially from the model's predictions, that is, the image labels to be relabeled. Relabeling only these labels greatly reduces the workload of data cleaning.
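A minimal sketch of this confidence-based filtering, assuming a detector interface that returns (box, category, confidence) detections and labels with box and category fields; all names and the threshold values are illustrative, not taken from the patent:

```python
def iou_xyxy(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def labels_to_relabel(images, labels, detector, conf_thr=0.9, iou_thr=0.5):
    """Flag annotations that no high-confidence detection agrees with."""
    flagged = []
    for image, label in zip(images, labels):
        confident = [d for d in detector(image) if d.confidence >= conf_thr]
        agrees = any(d.category == label.category and iou_xyxy(d.box, label.box) >= iou_thr
                     for d in confident)
        if not agrees:
            flagged.append(label)  # send back for manual relabeling
    return flagged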
  • obtaining the feature vector and position encoding vector corresponding to the image according to the image includes:
  • a position encoding vector corresponding to the image is obtained according to the image features of each sub-region, and the position encoding vector includes a positional relationship between image features of each sub-region corresponding to the image.
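The patent text does not pin down the exact form of the position encoding; a common choice in encoder-decoder detectors of this kind is a fixed sinusoidal encoding over sub-region indices, sketched here purely as an assumption:

```python
import torch

def sinusoidal_position_encoding(num_subregions, d_model=256):
    """One fixed sine/cosine vector per sub-region index."""
    pos = torch.arange(num_subregions, dtype=torch.float32).unsqueeze(1)  # (N, 1)
    i = torch.arange(0, d_model, 2, dtype=torch.float32)                  # even dimensions
    angles = pos / torch.pow(10000.0, i / d_model)                        # (N, d_model/2)
    enc = torch.zeros(num_subregions, d_model)
    enc[:, 0::2] = torch.sin(angles)
    enc[:, 1::2] = torch.cos(angles)
    return enc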
  • before extracting the sub-region feature vector corresponding to each sub-region of the image, the method includes: using a saliency detection model to detect the foreground region of the image; and dividing the foreground region according to different scales to obtain multiple sub-regions corresponding to the image.
  • the multiple sub-regions obtained by dividing the foreground area according to different scales include the entire foreground area as well as the blocks obtained by dividing the entire foreground area according to different sizes.
  • the rich semantic information and location information of the image can be obtained by dividing according to different scales.
  • the saliency detection model can be a PFANet model.
  • the sub-region can be a patch block.
  • the foreground area is divided into patch blocks according to different scales, so that multiple patch blocks of different scales can be obtained for each image; for example, dividing the foreground area of the image evenly into 1*1, 3*3, and 5*5 grids of blocks yields 35 patch blocks (1 + 9 + 25) corresponding to the image.
  • the background that does not contain the target can be removed by performing foreground area detection on the image, and finer-grained image features can be obtained by sub-region division such as patch block division.
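A minimal sketch of the multi-scale patch division described above, using the 1*1, 3*3, and 5*5 grids from the example; the PIL-based helper is our own:

```python
from PIL import Image

def split_foreground(foreground: Image.Image, grids=(1, 3, 5)):
    """Divide a foreground crop evenly into n*n patch blocks for each grid size."""
    w, h = foreground.size
    patches = []
    for n in grids:
        for row in range(n):
            for col in range(n):
                box = (col * w // n, row * h // n, (col + 1) * w // n, (row + 1) * h // n)
                patches.append(foreground.crop(box))
    return patches  # 1 + 9 + 25 = 35 patches for grids (1, 3, 5)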
  • the image features of each sub-region are extracted; the method for extracting image features may be the Faster R-CNN extraction method or the backbone network feature extraction method. Further, for images in the e-commerce scene, the backbone network feature extraction method is used to extract the image features of each sub-region.
  • the backbone network extraction method is used to extract the image features of each sub-region of the image, that is, the image features of each patch block of the image.
  • the backbone feature extraction network is the residual network ResNet-50.
  • the backbone feature extraction network including the feature pyramid network (FPN) is used to extract the image features of each sub-region to obtain the sub-region feature vector corresponding to each sub-region, and the feature vector of the image is obtained according to the feature vectors of each sub-region of the image.
  • the feature vectors of the image contain both rich semantic information and accurate position information.
  • the feature vector of the image is obtained by concatenating the feature vectors of each sub-region of the image.
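A minimal sketch of this extraction step with torchvision's ResNet-50 + FPN backbone; the per-patch global pooling and the concatenation order are assumptions, since the patent only specifies the backbone and the splicing of sub-region vectors:

```python
import torch
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

backbone = resnet_fpn_backbone(backbone_name="resnet50", weights=None)  # ResNet-50 + FPN

def image_feature_vector(patch_tensors):
    """One pooled vector per patch block, stacked into the image's feature matrix."""
    vectors = []
    for patch in patch_tensors:                  # each patch: a (3, H, W) float tensor,
        fmaps = backbone(patch.unsqueeze(0))     # resized beforehand (e.g. to 224*224)
        pooled = [f.mean(dim=(2, 3)) for f in fmaps.values()]  # global avg pool per FPN level
        vectors.append(torch.cat(pooled, dim=1))               # (1, 256 * num_levels)
    return torch.cat(vectors, dim=0)                           # (num_patches, 256 * num_levels)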
  • the obtaining module 501 is further configured to: select a part of the images and the image label corresponding to each selected image; train the image detection model according to the selected images and the image labels corresponding to them; use the image detection model to perform data cleaning on the multiple images and image labels, determine the image labels to be relabeled, and relabel the image labels to be relabeled.
  • the second determination module 503 is further configured to: obtain the fusion feature vector according to the feature vector and the position encoding vector; and perform feature encoding and feature decoding on the fusion feature vector with a model based on the self-attention mechanism to obtain the decoding vector.
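A minimal sketch of this encode-decode step built on PyTorch's standard transformer; the query count, layer count, and dimensions are illustrative assumptions, not values from the patent:

```python
import torch
import torch.nn as nn

class DecodingVectorModel(nn.Module):
    """Fuse features with position encodings, then encode and decode via self-attention."""
    def __init__(self, d_model=256, num_queries=100, nhead=8, num_layers=6):
        super().__init__()
        self.transformer = nn.Transformer(d_model, nhead, num_layers, num_layers,
                                          batch_first=True)
        self.queries = nn.Parameter(torch.randn(num_queries, d_model))  # learned target queries

    def forward(self, feature_vectors, position_encodings):
        # fusion feature vector: feature vector plus position encoding vector
        fused = feature_vectors + position_encodings    # (batch, num_subregions, d_model)
        queries = self.queries.unsqueeze(0).expand(fused.size(0), -1, -1)
        return self.transformer(fused, queries)         # decoding vectors, one per query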
  • the terminal devices, networks, and servers in FIG. 7 are only illustrative; depending on implementation needs, there can be any number of terminal devices, networks, and servers.

Abstract

The present disclosure relates to the technical field of computers, and provides a target detection model training method and apparatus, and a target detection method and apparatus. A specific implementation of the method comprises: obtaining a plurality of images and an image label corresponding to each image, the image label comprising a position and a category of a target in the image; obtaining, according to each image, a feature vector and a position encoding vector corresponding to the image, and obtaining, according to the feature vector and the position encoding vector, a decoding vector corresponding to the image; using the decoding vectors and image labels corresponding to the plurality of images for training so as to obtain a target detection model; and then using the target detection model to predict a position and a category of a target in an image to be detected. In this implementation, the position and category of the target in the image are detected by means of a convolutional neural network combined with a self-attention mechanism, which improves the accuracy of target detection and simplifies the target detection process.
PCT/CN2022/138650 2021-12-22 2022-12-13 Target detection model training method and apparatus, and target detection method and apparatus WO2023116507A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111582867.9 2021-12-22
CN202111582867.9A CN114399629A (zh) 2021-12-22 2021-12-22 Training method for target detection model, and target detection method and apparatus

Publications (1)

Publication Number Publication Date
WO2023116507A1 (fr)

Family

ID=81226887

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/138650 WO2023116507A1 (fr) 2021-12-22 2022-12-13 Target detection model training method and apparatus, and target detection method and apparatus

Country Status (2)

Country Link
CN (1) CN114399629A (fr)
WO (1) WO2023116507A1 (fr)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114399629A (zh) 2021-12-22 2022-04-26 Training method for target detection model, and target detection method and apparatus
CN114972220B (zh) 2022-05-13 2023-02-21 Image processing method and apparatus, electronic device, and readable storage medium
CN114926655B (zh) 2022-05-20 2023-09-26 Training method for a geography-and-vision cross-modal pre-training model, and position determination method
CN114707561B (zh) 2022-05-25 2022-09-30 Automatic PSG data analysis method and apparatus, computer device, and storage medium
CN114937086B (zh) 2022-07-19 2022-11-01 Training method for multi-image target detection, detection method, and related products
CN115950993B (zh) 2023-03-15 2023-07-25 Method for testing fluorine content in a fluorine-nitrogen gas mixture
CN117315651B (zh) 2023-09-13 2024-06-14 Multi-category cell detection and classification method and apparatus based on an affine-consistent Transformer

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108305296A (zh) 2017-08-30 2018-07-20 Image description generation method, model training method, device, and storage medium
CN110472688A (zh) 2019-08-16 2019-11-19 Image description method and apparatus, and image description model training method and apparatus
WO2021147257A1 (fr) 2020-01-20 2021-07-29 Network training method and apparatus, image processing method and apparatus, electronic device, and storage medium
CN113807361A (zh) 2021-08-11 2021-12-17 Neural network, target detection method, neural network training method, and related products
CN114399629A (zh) 2021-12-22 2022-04-26 Training method for target detection model, and target detection method and apparatus

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116862244B (zh) 2023-09-04 2024-03-22 Industrial field visual AI analysis and safety early-warning system and method
CN116862244A (zh) 2023-09-04 2023-10-10 Industrial field visual AI analysis and safety early-warning system and method
CN117274575A (zh) 2023-09-28 2023-12-22 Training method for target detection model, target detection method, apparatus, and device
CN117152422A (zh) 2023-10-31 2023-12-01 Anchor-free target detection method for ultraviolet images, storage medium, and electronic device
CN117152422B (zh) 2023-10-31 2024-02-13 Anchor-free target detection method for ultraviolet images, storage medium, and electronic device
CN117496131A (zh) 2023-12-29 2024-02-02 Safety behavior recognition method and system for electric power work sites
CN117496131B (zh) 2023-12-29 2024-05-10 Safety behavior recognition method and system for electric power work sites
CN117671801A (zh) 2024-02-02 2024-03-08 Real-time target detection method and system based on binary reduction
CN117671801B (zh) 2024-02-02 2024-04-23 Real-time target detection method and system based on binary reduction
CN117854138A (zh) 2024-03-07 2024-04-09 Big-data-based information collection and analysis method, apparatus, device, and storage medium
CN117854138B (zh) 2024-03-07 2024-05-10 Big-data-based information collection and analysis method, apparatus, device, and storage medium
CN117953206A (zh) 2024-03-25 2024-04-30 Hybrid-supervised target detection method and apparatus based on point-annotation guidance
CN117952977A (zh) 2024-03-27 2024-04-30 Pavement crack recognition method, apparatus, and medium based on improved YOLOv5s
CN117952977B (zh) 2024-03-27 2024-06-04 Pavement crack recognition method, apparatus, and medium based on improved YOLOv5s

Also Published As

Publication number Publication date
CN114399629A (zh) 2022-04-26

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22909821

Country of ref document: EP

Kind code of ref document: A1