ZA202400308B - Automatic labeling of text data

Automatic labeling of text data

Info

Publication number
ZA202400308B
Authority
ZA
South Africa
Prior art keywords
label
candidate text
text
technology
produce
Prior art date
Application number
ZA2024/00308A
Other languages
English (en)
Inventor
Sewak Mohit
Kiran Reddy Poluri Ravi
Blum William
On Chan Pak
Li Weisheng
Shirish Acharya Sharada
Rudnick Christian
Abraham Betser Michael
Drinic Milenko
Liu Sihong
Original Assignee
Microsoft Technology Licensing Llc
Priority date
2021-06-29
Filing date
2024-01-09
Publication date
2025-10-29
Priority claimed from US17/711,506 (US12197486B2)
Application filed by Microsoft Technology Licensing Llc
Publication of ZA202400308B

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/383 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3346 Query execution using probabilistic model
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0475 Generative networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/096 Transfer learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Library & Information Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • User Interface Of Digital Computer (AREA)
ZA2024/00308A 2021-06-29 2024-01-09 Automatic labeling of text data ZA202400308B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
IN202141029147 2021-06-29
US17/711,506 US12197486B2 (en) 2021-06-29 2022-04-01 Automatic labeling of text data
PCT/US2022/030464 WO2023278070A1 (en) 2021-06-29 2022-05-23 Automatic labeling of text data

Publications (1)

Publication Number Publication Date
ZA202400308B (en) 2025-10-29

Family

ID=82156528

Family Applications (1)

Application Number Title Priority Date Filing Date
ZA2024/00308A ZA202400308B (en) 2021-06-29 2024-01-09 Automatic labeling of text data

Country Status (9)

Country Link
US (1) US20240370484A1
EP (1) EP4364000A1
JP (1) JP2024524060A
KR (1) KR20240023535A
AU (1) AU2022304683A1
BR (1) BR112023027439A2
CA (1) CA3225020A1
WO (1) WO2023278070A1
ZA (1) ZA202400308B

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230385966A1 (en) * 2022-05-31 2023-11-30 Docusign, Inc. Predictive text for contract generation in a document management system
US20240054285A1 (en) * 2022-08-10 2024-02-15 TOTVS, Inc. Sentence pair ranking in natural language processing for a virtual assistant
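CN116415154B (zh) * 2023-06-12 2023-08-22 江西五十铃汽车有限公司 GPT-based method and device for generating vehicle fault solutions
JP2025036355A (ja) * 2023-08-30 2025-03-14 宏達國際電子股份有限公司 Data classification method for screening outlier text data
CN116910279B (zh) * 2023-09-13 2024-01-05 深圳市智慧城市科技发展集团有限公司 Label extraction method, device, and computer-readable storage medium
CN121970062A (zh) * 2023-10-24 2026-05-01 株式会社半导体能源研究所 Information processing system and information processing method
KR102763213B1 (ko) * 2024-04-04 2025-02-07 주식회사 리턴제로 Electronic device and method for performing domain-specific, template-based data labeling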
US12530377B2 (en) 2024-05-22 2026-01-20 Shopify Inc. Additional searching based on confidence in a classification performed by a generative language machine learning model
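CN118689468A (zh) * 2024-06-19 2024-09-24 北京百度网讯科技有限公司 Large-model-based code generation method, apparatus, electronic device, and storage medium
KR102823763B1 (ko) * 2024-12-10 2025-06-23 한화시스템 주식회사 System and method for generating combat system data based on sentence syntax analysis
CN120430300B (zh) * 2025-07-09 2025-09-23 中国民用航空飞行学院 Automatic error correction method, system, storage medium, and terminal for NOTAM (notice to airmen) text
CN120541194B (zh) * 2025-07-25 2025-10-24 浪潮通用软件有限公司 Knowledge retrieval method, system, and computer device based on multi-dimensional labels
CN121303112A (zh) * 2025-09-28 2026-01-09 北京首发展智能科技有限公司 LLM-based label acquisition method, device, and medium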

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10635727B2 (en) * 2016-08-16 2020-04-28 Ebay Inc. Semantic forward search indexing of publication corpus

Also Published As

Publication number Publication date
KR20240023535A (ko) 2024-02-22
AU2022304683A1 (en) 2024-01-04
JP2024524060A (ja) 2024-07-05
CA3225020A1 (en) 2023-01-05
WO2023278070A1 (en) 2023-01-05
BR112023027439A2 (pt) 2024-03-12
US20240370484A1 (en) 2024-11-07
EP4364000A1 (en) 2024-05-08

Similar Documents

Publication Publication Date Title
ZA202400308B (en) Automatic labeling of text data
CN107329967B (zh) Question answering system and method based on deep learning
WO2013192218A3 (en) Dynamic language model
EP3879427A3 (en) Information extraction method, extraction model training method, apparatus and electronic device
EP4033484A3 (en) Recognition of semantic information of a speech signal, training a recognition model
EP3913542A3 (en) Method and apparatus of training model, device, medium, and program product
SG11201902848QA (en) Intention acquisition method, electronic device and computer-readable storage medium
MX2019004407A (es) Multi-purpose conversational agents based on deep learning techniques for processing natural language queries
WO2019190646A3 (en) Natural assistant interaction
EP3822842A3 (en) Method and apparatus for generating semantic representation model, electronic device, and storage medium
WO2020191282A3 (en) System and method for multi-task lifelong learning on personal device with improved user experience
WO2017134519A4 (en) Image classification and labeling
WO2019133856A3 (en) Automated discourse phrase discovery for generating an improved language model of a digital assistant
CN108717413B (zh) Open-domain question answering method based on hypothetical semi-supervised learning
WO2014117553A1 (en) Method and system of adding punctuation and establishing language model
GB2604276A (en) Rare topic detection using hierarchical clustering
MX2017007364A (es) Localization complexity of arbitrary language assets and resources
MX2016013014A (es) Methods and systems for managing the dialogues of a robot
EP3879451A3 (en) Image moderation method, image moderation apparatus, electronic device, and storage medium
CO2023009697A2 (es) Punctuation and capitalization of speech recognition transcripts
Kakouros et al. 3PRO–An unsupervised method for the automatic detection of sentence prominence in speech
WO2011150415A3 (en) Methods and systems for automated creation, recognition and display of icons
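WO2023287360A3 (zh) Multimedia processing method, apparatus, electronic device, and storage medium
CN106446022A (zh) Natural language knowledge mining method based on formal semantic reasoning and deep learning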
GB2613743A8 (en) Systems and methods for skills inference using a datastore and models