CA3225020A1 - Automatic labeling of text data - Google Patents

Automatic labeling of text data

Info

Publication number
CA3225020A1
Authority
CA
Canada
Prior art keywords
label
text
candidate
search
class
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CA3225020A
Other languages
English (en)
French (fr)
Inventor
Mohit Sewak
Ravi Kiran Reddy Poluri
William Blum
Pak On CHAN
Weisheng Li
Sharada Shirish Acharya
Christian Rudnick
Michael Abraham Betser
Milenko Drinic
Sihong LIU
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Technology Licensing LLC
Original Assignee
Microsoft Technology Licensing LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-06-29
Filing date
2022-05-23
Publication date
2023-01-05
Priority claimed from US17/711,506 (external priority), granted as US12197486B2
Application filed by Microsoft Technology Licensing LLC
Publication of CA3225020A1
Status: Pending (current)

Classifications

    • G PHYSICS
      • G06 COMPUTING OR CALCULATING; COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
              • G06F16/38 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
                • G06F16/383 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
              • G06F16/33 Querying
                • G06F16/332 Query formulation
                • G06F16/3331 Query processing
                  • G06F16/334 Query execution
                    • G06F16/3344 Query execution using natural language analysis
                    • G06F16/3346 Query execution using probabilistic model
              • G06F16/35 Clustering; Classification
            • G06F16/90 Details of database functions independent of the retrieved data types
              • G06F16/95 Retrieval from the web
                • G06F16/953 Querying, e.g. by the use of web search engines
          • G06F40/00 Handling natural language data
            • G06F40/30 Semantic analysis
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N3/00 Computing arrangements based on biological models
            • G06N3/02 Neural networks
              • G06N3/04 Architecture, e.g. interconnection topology
                • G06N3/045 Combinations of networks
                  • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
                • G06N3/0475 Generative networks
              • G06N3/08 Learning methods
                • G06N3/096 Transfer learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Library & Information Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • User Interface Of Digital Computer (AREA)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
IN202141029147 2021-06-29
IN202141029147 2021-06-29
US17/711,506 2022-04-01
US17/711,506 US12197486B2 (en) 2021-06-29 2022-04-01 Automatic labeling of text data
PCT/US2022/030464 WO2023278070A1 (en) 2021-06-29 2022-05-23 Automatic labeling of text data

Publications (1)

Publication Number Publication Date
CA3225020A1 (en) 2023-01-05

Family

ID=82156528

Family Applications (1)

Application Number Title Priority Date Filing Date
CA3225020A Pending CA3225020A1 (en) 2021-06-29 2022-05-23 Automatic labeling of text data

Country Status (9)

Country Link
US (1) US20240370484A1 (en)
EP (1) EP4364000A1 (en)
JP (1) JP2024524060A (ja)
KR (1) KR20240023535A (ko)
AU (1) AU2022304683A1 (en)
BR (1) BR112023027439A2 (pt)
CA (1) CA3225020A1 (en)
WO (1) WO2023278070A1 (en)
ZA (1) ZA202400308B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
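CN116628556A (zh) * 2023-06-14 2023-08-22 上海桥创科技有限公司 Method for establishing product labels and system therefor
CN120430300A (zh) * 2025-07-09 2025-08-05 中国民用航空飞行学院 Method, system, storage medium and terminal for automatic error correction of NOTAM text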

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230385966A1 (en) * 2022-05-31 2023-11-30 Docusign, Inc. Predictive text for contract generation in a document management system
US20240054285A1 (en) * 2022-08-10 2024-02-15 TOTVS, Inc. Sentence pair ranking in natural language processing for a virtual assistant
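CN116415154B (zh) * 2023-06-12 2023-08-22 江西五十铃汽车有限公司 GPT-based method and device for generating vehicle fault solutions
JP2025036355A (ja) * 2023-08-30 2025-03-14 宏達國際電子股份有限公司 Data classification method for screening outlier text data
CN116910279B (zh) * 2023-09-13 2024-01-05 深圳市智慧城市科技发展集团有限公司 Label extraction method, device and computer-readable storage medium
CN121970062A (zh) * 2023-10-24 2026-05-01 株式会社半导体能源研究所 Information processing system and information processing method
KR102763213B1 (ko) * 2024-04-04 2025-02-07 주식회사 리턴제로 Electronic device and method for performing template-based data labeling according to domain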
US12530377B2 (en) 2024-05-22 2026-01-20 Shopify Inc. Additional searching based on confidence in a classification performed by a generative language machine learning model
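CN118689468A (zh) * 2024-06-19 2024-09-24 北京百度网讯科技有限公司 Code generation method, apparatus, electronic device and storage medium based on a large model
KR102823763B1 (ko) * 2024-12-10 2025-06-23 한화시스템 주식회사 System and method for generating combat system data based on sentence syntax analysis
CN120541194B (zh) * 2025-07-25 2025-10-24 浪潮通用软件有限公司 Knowledge retrieval method, system and computer device based on multi-dimensional labels
CN121303112A (zh) * 2025-09-28 2026-01-09 北京首发展智能科技有限公司 LLM-based label acquisition method, device and medium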

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10635727B2 (en) * 2016-08-16 2020-04-28 Ebay Inc. Semantic forward search indexing of publication corpus

Also Published As

Publication number Publication date
KR20240023535A (ko) 2024-02-22
AU2022304683A1 (en) 2024-01-04
JP2024524060A (ja) 2024-07-05
WO2023278070A1 (en) 2023-01-05
ZA202400308B (en) 2025-10-29
BR112023027439A2 (pt) 2024-03-12
US20240370484A1 (en) 2024-11-07
EP4364000A1 (en) 2024-05-08

Similar Documents

Publication Publication Date Title
US12197486B2 (en) Automatic labeling of text data
US20240370484A1 (en) Automatic labeling of text data
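CN112800170B (zh) Question matching method and device, and question reply method and device
CN110297868B (zh) Building an enterprise-specific knowledge graph
CN101523338B (zh) Search engine that applies feedback from users to improve search results
US11048705B2 (en) Query intent clustering for automated sourcing
CN106055549B (zh) Method and system for concept analysis operations using an accelerator
CN118132719A (zh) Intelligent dialogue method and system based on natural language processing
JP5391633B2 (ja) Recommending terms that define an ontology space
US11017040B2 (en) Providing query explanations for automated sourcing
US20180232434A1 (en) Proactive and retrospective joint weight attribution in a streaming environment
CN112507715A (zh) Method, apparatus, device and storage medium for determining association relationships between entities
US20180232702A1 (en) Using feedback to re-weight candidate features in a streaming environment
US20060242130A1 (en) Information retrieval using conjunctive search and link discovery
CN109829104A (zh) Information retrieval method and system using a pseudo-relevance feedback model based on semantic similarity
US20170371965A1 (en) Method and system for dynamically personalizing profiles in a social network
CN108090231A (zh) Topic model optimization method based on information entropy
US20170169355A1 (en) Ground Truth Improvement Via Machine Learned Similar Passage Detection
US20210319066A1 (en) Sub-Question Result Merging in Question and Answer (QA) Systems
JP2014120053A (ja) Question answering device, method, and program
US12406008B1 (en) Using intent-based rankings to generate large language model responses
CN113239071A (zh) Retrieval and query method and system for subject and research topic information of scientific and technological resources
CN118626611A (zh) Retrieval method, apparatus, electronic device and readable storage medium
CN115391479B (zh) Ranking method, apparatus, electronic medium and storage medium for document search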
Jiang et al. Understanding a bag of words by conceptual labeling with prior weights

Legal Events

Date Code Title Description
D00 Search and/or examination requested or commenced

Free format text: ST27 STATUS EVENT CODE: A-1-1-D10-D00-D120 (AS PROVIDED BY THE NATIONAL OFFICE); EVENT TEXT: VOLUNTARY SUBMISSION OF PRIOR ART RECEIVED

Effective date: 20240715

MFA Maintenance fee for application paid

Free format text: FEE DESCRIPTION TEXT: MF (APPLICATION, 3RD ANNIV.) - STANDARD

Year of fee payment: 3

U00 Fee paid

Free format text: ST27 STATUS EVENT CODE: A-1-1-U10-U00-U101 (AS PROVIDED BY THE NATIONAL OFFICE); EVENT TEXT: MAINTENANCE REQUEST RECEIVED

Effective date: 20250425

U11 Full renewal or maintenance fee paid

Free format text: ST27 STATUS EVENT CODE: A-1-1-U10-U11-U102 (AS PROVIDED BY THE NATIONAL OFFICE); EVENT TEXT: MAINTENANCE FEE PAYMENT PAID IN FULL

Effective date: 20250425

D00 Search and/or examination requested or commenced

Free format text: ST27 STATUS EVENT CODE: A-1-1-D10-D00-D123 (AS PROVIDED BY THE NATIONAL OFFICE); EVENT TEXT: PRIOR ART DISCLOSURE DETERMINED COMPLIANT

Effective date: 20250509

W00 Other event occurred

Free format text: ST27 STATUS EVENT CODE: A-1-1-W10-W00-W111 (AS PROVIDED BY THE NATIONAL OFFICE); EVENT TEXT: CORRESPONDENT DETERMINED COMPLIANT

Effective date: 20250509

Free format text: ST27 STATUS EVENT CODE: A-1-1-W10-W00-W100 (AS PROVIDED BY THE NATIONAL OFFICE); EVENT TEXT: LETTER SENT

Effective date: 20250509

MFA Maintenance fee for application paid

Free format text: FEE DESCRIPTION TEXT: MF (APPLICATION, 4TH ANNIV.) - STANDARD

Year of fee payment: 4

U00 Fee paid

Free format text: ST27 STATUS EVENT CODE: A-1-1-U10-U00-U101 (AS PROVIDED BY THE NATIONAL OFFICE); EVENT TEXT: MAINTENANCE REQUEST RECEIVED

Effective date: 20260421

U11 Full renewal or maintenance fee paid

Free format text: ST27 STATUS EVENT CODE: A-1-1-U10-U11-U102 (AS PROVIDED BY THE NATIONAL OFFICE); EVENT TEXT: MAINTENANCE FEE PAYMENT PAID IN FULL

Effective date: 20260421