WO2023011093A1 - Task model training method and apparatus, electronic device and storage medium - Google Patents

Task model training method and apparatus, electronic device and storage medium

Info

Publication number
WO2023011093A1
WO2023011093A1
Authority
WO
WIPO (PCT)
Prior art keywords
training
sample
samples
test set
test
Prior art date
Application number
PCT/CN2022/104081
Other languages
English (en)
Chinese (zh)
Inventor
杨德将
Original Assignee
北京百度网讯科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京百度网讯科技有限公司
Publication of WO2023011093A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2148 Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the process organisation or structure, e.g. boosting cascade
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G06Q40/08 Insurance

Definitions

  • an electronic device including:
  • FIG. 2 is a schematic diagram according to a second embodiment of the present disclosure.
  • the original labels of the training samples in the training set and of the test samples in the test set can be removed; a first label, such as 0, is assigned to all training samples in the training set to identify them as samples of the training set, and a second label, such as 1, is assigned to all test samples in the test set to identify them as samples of the test set.
  • the combined sample set can be randomly split to obtain a new training set and a new test set.
  • training samples with high weights are more likely to be selected to participate in training, which makes the task model more inclined to learn from high-weight training samples, that is, the training samples that are most similar to the test set. This helps overcome the distribution shift between the training set and the test set (one possible implementation is sketched below).
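As an illustration of the steps excerpted above, the following is a minimal sketch of computing per-sample similarity weights by training a set-membership classifier (training samples labelled 0 vs. test samples labelled 1) and then sampling training data in proportion to those weights. The function and variable names, the use of scikit-learn, and the logistic-regression classifier are assumptions made for this sketch only and are not taken from the patent text.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def compute_similarity_weights(X_train, X_test, seed=0):
    """Score how similar each training sample is to the test set."""
    # Drop the original task labels: training samples get set-membership
    # label 0, test samples get set-membership label 1.
    X_all = np.vstack([X_train, X_test])
    y_all = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_test))])

    # Randomly split the combined sample set into a new training set
    # and a new test set.
    X_fit, X_eval, y_fit, y_eval = train_test_split(
        X_all, y_all, test_size=0.3, random_state=seed)

    # Train a classifier to distinguish training samples from test samples.
    clf = LogisticRegression(max_iter=1000).fit(X_fit, y_fit)
    auc = roc_auc_score(y_eval, clf.predict_proba(X_eval)[:, 1])
    print(f"set-membership AUC = {auc:.3f} (near 0.5 means little shift)")

    # The predicted probability that a training sample "looks like" the
    # test set is used as its weight.
    return clf.predict_proba(X_train)[:, 1]


# Toy data standing in for a real training set and test set.
rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, (500, 8))
y_train = rng.integers(0, 2, 500)
X_test = rng.normal(0.5, 1.0, (200, 8))  # shifted relative to the training set

weights = compute_similarity_weights(X_train, X_test)

# Weighted sampling: high-weight samples are more likely to be selected,
# so training leans toward samples that resemble the test set.
probs = weights / weights.sum()
idx = rng.choice(len(X_train), size=len(X_train), replace=True, p=probs)
X_resampled, y_resampled = X_train[idx], y_train[idx]
```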

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure relates to the technical field of artificial intelligence, such as machine learning and natural language processing, and provides a task model training method and apparatus, an electronic device and a storage medium. The specific implementation scheme is: acquiring the similarities between the training samples of a training set and a test set; configuring the weights of the corresponding training samples according to the similarities between the training samples of the training set and the test set; and training a task model according to the training samples of the training set and the weights of the corresponding training samples. By means of the present disclosure, the accuracy of a trained task model can be effectively improved.
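To make the last training step concrete, here is a minimal, self-contained sketch in which per-sample similarity weights are passed to the task model as sample weights, so that training samples resembling the test set contribute more to the loss. The weights here are random placeholders, and the gradient-boosting estimator is an illustrative assumption rather than the model claimed in the patent.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 8))        # training samples
y_train = rng.integers(0, 2, size=500)     # original task labels
weights = rng.uniform(0.1, 1.0, size=500)  # placeholder similarity-based weights

# Weight the training loss so that samples more similar to the test set
# contribute more when fitting the task model.
task_model = GradientBoostingClassifier()
task_model.fit(X_train, y_train, sample_weight=weights)
```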
PCT/CN2022/104081 2021-08-04 2022-07-06 Task model training method and apparatus, electronic device and storage medium WO2023011093A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110891285.2 2021-08-04
CN202110891285.2A CN113807391A (zh) 2021-08-04 Task model training method and apparatus, electronic device and storage medium

Publications (1)

Publication Number Publication Date
WO2023011093A1 (fr)

Family

ID=78893267

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/104081 WO2023011093A1 (fr) Task model training method and apparatus, electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN113807391A (fr)
WO (1) WO2023011093A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113807391A (zh) * 2021-08-04 2021-12-17 北京百度网讯科技有限公司 Task model training method and apparatus, electronic device and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160078359A1 (en) * 2014-09-12 2016-03-17 Xerox Corporation System for domain adaptation with a domain-specific class means classifier
CN105574547A (zh) * 2015-12-22 2016-05-11 北京奇虎科技有限公司 Ensemble learning method and apparatus with adaptive dynamic adjustment of base classifier weights
CN110515836A (zh) * 2019-07-31 2019-11-29 杭州电子科技大学 Weighted naive Bayes method for software defect prediction
CN113807391A (zh) * 2021-08-04 2021-12-17 北京百度网讯科技有限公司 Task model training method and apparatus, electronic device and storage medium
CN114187979A (zh) * 2022-02-15 2022-03-15 北京晶泰科技有限公司 Data processing, model training, molecular prediction and screening methods and apparatuses therefor

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WANG LINGDI, XU HUA: "An Adaptive Ensemble Algorithm Based on Clustering and AdaBoost", Journal of Jilin University (Science Edition), vol. 56, no. 4, 26 July 2018, pages 917-924, ISSN 1671-5489, DOI 10.13413/j.cnki.jdxblxb.2018.04.25 *
ZENG XI, LING-JUN KONG, WEN-JIE ZHAN: "Spectral Reflectance Reconstruction Based on Vector Angle Sample Selection", Packaging Engineering, vol. 39, no. 15, 10 August 2018, pages 216-220, ISSN 1001-3563, DOI 10.19554/j.cnki.1001-3563.2018.15.034 *

Also Published As

Publication number Publication date
CN113807391A (zh) 2021-12-17

Similar Documents

Publication Publication Date Title
EP3955204A1 Data processing method and apparatus, electronic device and storage medium
US20220374678A1 Method for determining pre-training model, electronic device and storage medium
CN115082920A Training method for deep learning model, image processing method and apparatus
CN113657483A Model training method, target detection method, apparatus, device and storage medium
WO2023011093A1 Task model training method and apparatus, electronic device and storage medium
CN113392920B Method, apparatus, device, medium and program product for generating a cheating prediction model
CN114821063A Method and apparatus for generating a semantic segmentation model, and image processing method
CN114462598A Training method for deep learning model, and method and apparatus for determining data category
CN116342164A Method and apparatus for locating a target user group, electronic device and storage medium
CN114692778A Multi-modal sample set generation method, training method and apparatus for intelligent inspection
CN114078274A Face image detection method and apparatus, electronic device and storage medium
CN114492364A Method, apparatus, device and storage medium for determining identical vulnerabilities
CN114417822A Method, apparatus, device, medium and product for generating model interpretation information
CN114067805A Voiceprint recognition model training and voiceprint recognition method and apparatus
CN114238611A Method, apparatus, device and storage medium for outputting information
CN114021642A Data processing method and apparatus, electronic device and storage medium
CN113806541A Emotion classification method, and training method and apparatus for an emotion classification model
CN111325350A Suspicious organization discovery system and method
US20240037410A1 Method for model aggregation in federated learning, server, device, and storage medium
CN114844889B Video processing model updating method and apparatus, electronic device and storage medium
CN114066278B Item recall evaluation method, apparatus, medium and program product
CN113408664B Training method, classification method, apparatus, electronic device and storage medium
CN114547448B Data processing and model training methods, apparatus, device, storage medium and program
US20220385583A1 Traffic classification and training of traffic classifier
CN115471717B Semi-supervised model training and classification method and apparatus, device, medium and product

Legal Events

Date Code Title Description
NENP Non-entry into the national phase
Ref country code: DE