WO2022239174A1 - Similarity degree derivation system and similarity degree derivation method - Google Patents
Similarity degree derivation system and similarity degree derivation method
- Publication number
- WO2022239174A1 (application PCT/JP2021/018169)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- hash
- hash value
- sets
- hash function
- function
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/32—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials
- H04L9/3236—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials using cryptographic hash functions
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
Definitions
- The minimum hash value may be called MinHash.
- The minimum of the hash values obtained for the elements of a set using the same hash function is referred to as the minimum hash value.
- The hash value calculation unit 2, the minimum hash value identification unit 3, and the similarity derivation unit 4 are realized, for example, by a CPU (Central Processing Unit) of a computer that operates according to a similarity derivation program.
- The CPU reads the similarity derivation program from a program recording medium such as a program storage device of the computer and, according to that program, operates as the hash value calculation unit 2, the minimum hash value identification unit 3, and the similarity derivation unit 4.
- FIG. 2 is a flowchart showing an example of the progress of processing in the first embodiment.
- The description is omitted.
- Steps S2 and S3 are the same as steps S2 and S3 in the first embodiment.
- The universal set generation unit 52 extracts only one element from any plurality of elements, among the individual elements of each set, whose first hash values match and which are themselves identical. It also extracts each element that does not belong to such a plurality, and generates one universal set containing every extracted element. The second hash value calculation unit 54 then calculates, for each element belonging to the universal set, the hash values corresponding to each hash function h2, . . . , hm other than the predetermined hash function h1. Therefore, for a plurality of elements whose first hash values match and which are themselves identical, the second hash value calculation unit 54 calculates the hash values corresponding to hash functions h2, . . . , hm only once.
- The set selection unit 61 sequentially selects one set from a plurality of sets.
- The second hash value calculator 65 determines that the hash value of the selected element corresponding to each hash function h2, . . . , hm is identical to the hash value corresponding to each hash function h2, . . . , hm of the matching element.
- In step S77, the element selection unit 62 determines whether there is an unselected element in the selected set. In this example, it is determined that there is an unselected element in the selected set A (Yes in step S77). In this case, the processing from step S72 onward is repeated.
- The first hash value calculation unit 63 applies the numerical hash function h0 to the element selected in step S72 and converts the element to a numeric element.
- The determining unit 64 may determine whether a matching element has already been obtained, treating an element that matches the element converted by the numerical hash function h0 as a matching element. If no matching element has been obtained, in step S76 the second hash value calculator 65 applies a plurality of hash functions h1, h2, . . . , hm and calculates the hash value corresponding to each of them.
- In the similarity derivation method, the hash value calculation process includes: a set selection process of sequentially selecting one set from the plurality of sets; an element selection process of sequentially selecting one element from the selected set; a first hash value calculation process of calculating a first hash value of the selected element by applying the predetermined hash function to the selected element; a determination process of determining whether a matching element, which is an element whose first hash value matches that of the selected element and which itself matches the selected element, has already been selected; and a second hash value calculation process of, when the matching element has already been selected, determining that the hash value of the selected element corresponding to each hash function other than the predetermined hash function is the same as the corresponding hash value of the matching element, and, when the matching element has not been selected, calculating the hash value of the selected element corresponding to each hash function other than the predetermined hash function.
- When a plurality of hash values is obtained by applying a plurality of hash functions to the individual elements of each set contained in a plurality of sets, duplicate calculation of each hash function other than a predetermined hash function is eliminated for a plurality of elements that have the same hash value under the predetermined hash function and that are themselves identical, and the plurality of hash values is obtained for each element of each set.
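
The excerpts above can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function names (`make_hash`, `minhash_signatures`, `estimate_similarity`), the use of salted BLAKE2b as the hash family h1, . . . , hm, and the dictionary-based lookup of matching elements are all assumptions made for illustration. The sketch computes h1 for every element, reuses the stored values of h2, . . . , hm whenever an element with the same first hash value and the same content has already been seen, and then estimates similarity as the fraction of positions at which two sets' minimum hash values agree.

```python
import hashlib
from typing import Dict, Iterable, List, Optional, Tuple


def make_hash(index: int):
    """Build hash function h_index (assumption: BLAKE2b salted with the index)."""
    salt = index.to_bytes(8, "little")

    def h(element: str) -> int:
        digest = hashlib.blake2b(
            element.encode("utf-8"), digest_size=8, salt=salt
        ).digest()
        return int.from_bytes(digest, "little")

    return h


def minhash_signatures(
    sets: Iterable[Iterable[str]], m: int
) -> Tuple[List[Tuple[int, ...]], int]:
    """Return one MinHash signature per set and the number of h2..hm
    evaluations actually performed. Raises ValueError for an empty set."""
    h1, *rest = [make_hash(i) for i in range(1, m + 1)]
    # first hash value -> {element: (h1(e), h2(e), ..., hm(e))}
    seen: Dict[int, Dict[str, Tuple[int, ...]]] = {}
    evaluations = 0
    signatures: List[Tuple[int, ...]] = []

    for current_set in sets:                  # set selection
        minima: Optional[Tuple[int, ...]] = None
        for element in current_set:           # element selection
            first = h1(element)               # first hash value
            bucket = seen.setdefault(first, {})
            values = bucket.get(element)      # same first hash AND same element?
            if values is None:
                # No matching element yet: compute h2..hm once and store them.
                values = (first, *(h(element) for h in rest))
                evaluations += len(rest)
                bucket[element] = values
            # Otherwise the stored hash values are reused as-is.
            minima = values if minima is None else tuple(map(min, minima, values))
        if minima is None:
            raise ValueError("an empty set has no minimum hash value")
        signatures.append(minima)
    return signatures, evaluations


def estimate_similarity(sig_a: Tuple[int, ...], sig_b: Tuple[int, ...]) -> float:
    """Fraction of hash functions whose minimum hash values agree."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)
```

With this sketch, calling `minhash_signatures` on several sets that share elements evaluates h2, . . . , hm once per distinct element rather than once per occurrence, which is the saving the excerpts describe. The similarity returned by `estimate_similarity` is an estimate whose accuracy depends on the number of hash functions m.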
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2021/018169 WO2022239174A1 (ja) | 2021-05-13 | 2021-05-13 | Similarity degree derivation system and similarity degree derivation method |
| US18/288,586 US12413413B2 (en) | 2021-05-13 | 2021-05-13 | Similarity degree derivation system and similarity degree derivation method |
| JP2023520672A JP7464193B2 (ja) | 2021-05-13 | 2021-05-13 | Similarity degree derivation system and similarity degree derivation method |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2021/018169 WO2022239174A1 (ja) | 2021-05-13 | 2021-05-13 | Similarity degree derivation system and similarity degree derivation method |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2022239174A1 (ja) | 2022-11-17 |
Family
ID=84028068
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2021/018169 (Ceased) WO2022239174A1 (ja) | 2021-05-13 | 2021-05-13 | Similarity degree derivation system and similarity degree derivation method |
Country Status (3)
| Country | Link |
|---|---|
| US (1) | US12413413B2 |
| JP (1) | JP7464193B2 |
| WO (1) | WO2022239174A1 |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20170078286A1 (en) * | 2015-09-16 | 2017-03-16 | RiskIQ, Inc. | Using hash signatures of dom objects to identify website similarity |
| US20170161375A1 (en) * | 2015-12-07 | 2017-06-08 | Adlib Publishing Systems Inc. | Clustering documents based on textual content |
| US20170322930A1 (en) * | 2016-05-07 | 2017-11-09 | Jacob Michael Drew | Document based query and information retrieval systems and methods |
| US20180095941A1 (en) * | 2016-09-30 | 2018-04-05 | Quantum Metric, LLC | Techniques for view capture and storage for mobile applications |
| US20180181609A1 (en) * | 2016-12-28 | 2018-06-28 | Google Inc. | System for De-Duplicating Job Postings |
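| WO2021038887A1 (ja) * | 2019-08-30 | 2021-03-04 | Fujitsu Limited | Similar document search method, similar document search program, similar document search device, index information creation method, index information creation program, and index information creation device |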
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10579661B2 (en) * | 2013-05-20 | 2020-03-03 | Southern Methodist University | System and method for machine learning and classifying data |
| JP7032650B2 (ja) | 2018-06-28 | 2022-03-09 | Fujitsu Limited | Similar text search method, similar text search device, and similar text search program |
| US11120052B1 (en) * | 2018-06-28 | 2021-09-14 | Amazon Technologies, Inc. | Dynamic distributed data clustering using multi-level hash trees |
2021
- 2021-05-13 US application US18/288,586, published as US12413413B2 (en): Active
- 2021-05-13 WO application PCT/JP2021/018169, published as WO2022239174A1 (ja): Ceased
- 2021-05-13 JP application JP2023520672A, published as JP7464193B2 (ja): Active
Also Published As
| Publication number | Publication date |
|---|---|
| JP7464193B2 (ja) | 2024-04-09 |
| US20240214211A1 (en) | 2024-06-27 |
| US12413413B2 (en) | 2025-09-09 |
| JPWO2022239174A1 | 2022-11-17 |
Similar Documents
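| Publication | Title | Publication Date |
|---|---|---|
| WO2020086115A1 (en) | Multi-task training architecture and strategy for attention-based speech recognition system | |
| CN114023342A (zh) | Voice conversion method and apparatus, storage medium, and electronic device | |
| WO2017124930A1 (zh) | Feature data processing method and device | |
| JP6331756B2 (ja) | Test case generation program, test case generation method, and test case generation device | |
| CN119493559A (zh) | Code generation method and apparatus, electronic device, and storage medium | |
| AU2024200306A1 (en) | Automated indexing and extraction of information in digital documents | |
| JPWO2020152804A1 (ja) | Information providing system, method, and program | |
| CN117762806A (zh) | Code detection processing method and apparatus | |
| JP6778811B2 (ja) | Speech recognition method and apparatus | |
| WO2018180971A1 (ja) | Information processing system, feature description method, and feature description program | |
| JP7464193B2 (ja) | Similarity degree derivation system and similarity degree derivation method | |
| CN114884772B (zh) | Bare-metal VXLAN deployment method, system, and electronic device | |
| JP7041603B2 (ja) | Computer system and method for generating business flow patterns | |
| CN106648891A (zh) | Task execution method and apparatus based on the MapReduce model | |
| WO2022070422A1 (ja) | Computer system and character recognition method | |
| WO2022049681A1 (ja) | Correlation index construction device, correlation table search device, method, and program | |
| US12056147B2 (en) | Analysis device, analysis method, and analysis program | |
| CN117709302A (zh) | Document conversion method and apparatus | |
| WO2019171537A1 (ja) | Meaning estimation system, method, and program | |
| CN114880242A (zh) | Test case extraction method, apparatus, device, and medium | |
| JP7184176B2 (ja) | Allocation device, method, and program | |
| JP2016184273A (ja) | Arithmetic control device, arithmetic control method, and arithmetic control program | |
| CN107194014B (zh) | Data source calling method and apparatus | |
| US20250217226A1 (en) | Root Cause Locating Method and Apparatus, and Storage Medium | |
| US20250298887A1 (en) | Program identification method and program identification device | |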
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 21941908; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | WIPO information: entry into national phase | Ref document number: 2023520672; Country of ref document: JP |
| | WWE | WIPO information: entry into national phase | Ref document number: 18288586; Country of ref document: US |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: PCT application non-entry in European phase | Ref document number: 21941908; Country of ref document: EP; Kind code of ref document: A1 |
| | WWG | WIPO information: grant in national office | Ref document number: 18288586; Country of ref document: US |