JP2024521833A - Wordbreak algorithm with offset mapping - Google Patents

Wordbreak algorithm with offset mapping

Info

Publication number
JP2024521833A
Authority
JP
Japan
Prior art keywords
character
index value
string
original string
offset index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP2023573274A
Other languages
English (en)
Japanese (ja)
Other versions
JP2024521833A5
Inventor
グプタ,マノージュ
モトラーニ,カビン
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Technology Licensing LLC
Original Assignee
Microsoft Technology Licensing LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US17/444,347 (US11899698B2)
Application filed by Microsoft Technology Licensing LLC
Publication of JP2024521833A
Publication of JP2024521833A5
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6254 Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/163 Handling of whitespace
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/53 Processing of non-Latin text
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents
    • G06F40/109 Font handling; Temporal or kinetic typography

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Bioethics (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
JP2023573274A 2021-05-28 2022-05-05 Wordbreak algorithm with offset mapping Pending JP2024521833A (ja)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
IN202141023933 2021-05-28
IN202141023933 2021-05-28
US17/444,347 2021-08-03
US17/444,347 US11899698B2 (en) 2021-05-28 2021-08-03 Wordbreak algorithm with offset mapping
PCT/IB2022/000257 WO2022248933A1 (en) 2021-05-28 2022-05-05 Wordbreak algorithm with offset mapping

Publications (2)

Publication Number Publication Date
JP2024521833A (ja) 2024-06-04
JP2024521833A5 2025-05-15

Family

ID=82846495

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2023573274A Pending JP2024521833A (ja) 2021-05-28 2022-05-05 Wordbreak algorithm with offset mapping

Country Status (4)

Country Link
EP (1) EP4348490A1
JP (1) JP2024521833A
KR (1) KR20240011718A
WO (1) WO2022248933A1

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20240338186A1 (en) * 2023-04-06 2024-10-10 Oracle International Corporation Compile-Time Checking For Exhaustive Switch Statements And Expressions

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5963893A (en) * 1996-06-28 1999-10-05 Microsoft Corporation Identification of words in Japanese text by a computer system
US20200334381A1 (en) * 2019-04-16 2020-10-22 3M Innovative Properties Company Systems and methods for natural pseudonymization of text

Also Published As

Publication number Publication date
EP4348490A1 (en) 2024-04-10
KR20240011718A (ko) 2024-01-26
WO2022248933A1 (en) 2022-12-01

Similar Documents

Publication Publication Date Title
US10552462B1 (en) Systems and methods for tokenizing user-annotated names
US10867132B2 (en) Ontology entity type detection from tokenized utterance
US10970278B2 (en) Querying knowledge graph with natural language input
US10303689B2 (en) Answering natural language table queries through semantic table representation
US11620304B2 (en) Example management for string transformation
US10916237B2 (en) Training utterance generation
CN109804363B (zh) Joining using format modification by example
US9971809B1 (en) Systems and methods for searching unstructured documents for structured data
US20140354533A1 (en) Tagging using eye gaze detection
US20130159346A1 (en) Combinatorial document matching
US10860666B2 (en) Method and system for providing alternative result for an online search previously with no result
WO2025239972A1 (en) Automated workflow creation using large language models
CN106569860A (zh) Application management method and terminal
US20160179954A1 (en) Systems and methods for culling search results in electronic discovery
US11899698B2 (en) Wordbreak algorithm with offset mapping
JP2024521833A (ja) Wordbreak algorithm with offset mapping
US9483535B1 (en) Systems and methods for expanding search results
US20240378915A1 (en) Sequence labeling task extraction from inked content
CN105354506B (zh) Method and apparatus for hiding files
US9286348B2 (en) Dynamic search system
JP6194180B2 (ja) Text masking device and text masking program
US11132400B2 (en) Data classification using probabilistic data structures
CN117396878A (zh) Word segmentation algorithm with offset mapping
WO2019019711A1 (zh) Method, apparatus, terminal device, and medium for publishing behavior pattern data
US9436743B1 (en) Systems and methods for expanding search results

Legal Events

Date Code Title Description
2025-05-01 A521 Request for written amendment filed (Japanese intermediate code: A523)
2025-05-01 A621 Written request for application examination (Japanese intermediate code: A621)
2025-05-27 RD03 Notification of appointment of power of attorney (Japanese intermediate code: A7423)
2025-06-02 RD04 Notification of resignation of power of attorney (Japanese intermediate code: A7424)