JP2024521833A - Wordbreak algorithm with offset mapping - Google Patents
- Publication number
- JP2024521833A
- Authority
- JP
- Japan
- Prior art keywords
- character
- index value
- string
- original string
- offset index
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
- G06F21/6254—Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/163—Handling of whitespace
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/53—Processing of non-Latin text
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/103—Formatting, i.e. changing of presentation of documents
- G06F40/109—Font handling; Temporal or kinetic typography
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Artificial Intelligence (AREA)
- Bioethics (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Computer Security & Cryptography (AREA)
- Computer Hardware Design (AREA)
- Databases & Information Systems (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Applications Claiming Priority (5)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| IN202141023933 | 2021-05-28 | ||
| IN202141023933 | 2021-05-28 | ||
| US17/444,347 | 2021-08-03 | ||
| US17/444,347 US11899698B2 (en) | 2021-05-28 | 2021-08-03 | Wordbreak algorithm with offset mapping |
| PCT/IB2022/000257 WO2022248933A1 (en) | 2021-05-28 | 2022-05-05 | Wordbreak algorithm with offset mapping |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| JP2024521833A (ja) | 2024-06-04 |
| JP2024521833A5 (ja) | 2025-05-15 |
Family ID: 82846495
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| JP2023573274A (pending; published as JP2024521833A) | Wordbreak algorithm with offset mapping | 2021-05-28 | 2022-05-05 |
Country Status (4)
| Country | Link |
|---|---|
| EP (1) | EP4348490A1 |
| JP (1) | JP2024521833A |
| KR (1) | KR20240011718A |
| WO (1) | WO2022248933A1 |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20240338186A1 (en) * | 2023-04-06 | 2024-10-10 | Oracle International Corporation | Compile-Time Checking For Exhaustive Switch Statements And Expressions |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5963893A (en) * | 1996-06-28 | 1999-10-05 | Microsoft Corporation | Identification of words in Japanese text by a computer system |
| US20200334381A1 (en) * | 2019-04-16 | 2020-10-22 | 3M Innovative Properties Company | Systems and methods for natural pseudonymization of text |
2022
- 2022-05-05: EP application EP22751806.5A (published as EP4348490A1), active, pending
- 2022-05-05: KR application KR1020237040866A (published as KR20240011718A), active, pending
- 2022-05-05: JP application JP2023573274A (published as JP2024521833A), active, pending
- 2022-05-05: PCT application PCT/IB2022/000257 (published as WO2022248933A1), not active, ceased
Also Published As
| Publication number | Publication date |
|---|---|
| EP4348490A1 (en) | 2024-04-10 |
| KR20240011718A (ko) | 2024-01-26 |
| WO2022248933A1 (en) | 2022-12-01 |
Similar Documents
| Publication | Title |
|---|---|
| US10552462B1 (en) | Systems and methods for tokenizing user-annotated names |
| US10867132B2 (en) | Ontology entity type detection from tokenized utterance |
| US10970278B2 (en) | Querying knowledge graph with natural language input |
| US10303689B2 (en) | Answering natural language table queries through semantic table representation |
| US11620304B2 (en) | Example management for string transformation |
| US10916237B2 (en) | Training utterance generation |
| CN109804363B (zh) | Joins using format modification by example |
| US9971809B1 (en) | Systems and methods for searching unstructured documents for structured data |
| US20140354533A1 (en) | Tagging using eye gaze detection |
| US20130159346A1 (en) | Combinatorial document matching |
| US10860666B2 (en) | Method and system for providing alternative result for an online search previously with no result |
| WO2025239972A1 (en) | Automated workflow creation using large language models |
| CN106569860A (zh) | Application management method and terminal |
| US20160179954A1 (en) | Systems and methods for culling search results in electronic discovery |
| US11899698B2 (en) | Wordbreak algorithm with offset mapping |
| JP2024521833A (ja) | Wordbreak algorithm with offset mapping |
| US9483535B1 (en) | Systems and methods for expanding search results |
| US20240378915A1 (en) | Sequence labeling task extraction from inked content |
| CN105354506B (zh) | Method and device for hiding files |
| US9286348B2 (en) | Dynamic search system |
| JP6194180B2 (ja) | Text masking device and text masking program |
| US11132400B2 (en) | Data classification using probabilistic data structures |
| CN117396878A (zh) | Word segmentation algorithm with offset mapping |
| WO2019019711A1 (zh) | Method, apparatus, terminal device and medium for publishing behavior pattern data |
| US9436743B1 (en) | Systems and methods for expanding search results |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 2025-05-01 | A521 | Request for written amendment filed | Japanese intermediate code: A523 |
| 2025-05-01 | A621 | Written request for application examination | Japanese intermediate code: A621 |
| 2025-05-27 | RD03 | Notification of appointment of power of attorney | Japanese intermediate code: A7423 |
| 2025-06-02 | RD04 | Notification of resignation of power of attorney | Japanese intermediate code: A7424 |