SG10201904554TA - Methods and devices for quantifying text similarity - Google Patents
Info
- Publication number
- SG10201904554TA
- Authority
- SG
- Singapore
- Prior art keywords
- edit
- similarity
- text
- text string
- quantifying
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/19—Recognition using electronic means
- G06V30/196—Recognition using electronic means using sequential comparisons of the image signals with a plurality of references
- G06V30/1983—Syntactic or structural pattern recognition, e.g. symbolic string recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24553—Query execution of query operations
- G06F16/24558—Binary matching operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/761—Proximity, similarity or dissimilarity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/12—Detection or correction of errors, e.g. by rescanning the pattern
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Multimedia (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Medical Informatics (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Character Discrimination (AREA)
Abstract
METHODS AND DEVICES FOR QUANTIFYING TEXT SIMILARITY
The present disclosure provides methods and devices for quantifying text similarity. In an embodiment, there is provided a device for quantifying text similarity that comprises: a processor; and a memory including computer program code. The memory and the computer program code are configured to, with the processor, cause the device to: obtain a plurality of shortest operation paths for correcting an optical character recognition (OCR) text string with an edit text string, wherein each of the plurality of shortest operation paths includes one or more edit pairs, each of the one or more edit pairs denoting an operation performable to a character of the OCR text string during correction by the edit text string; determine a plurality of similarity scores, each of the plurality of similarity scores corresponding to one of the plurality of shortest operation paths, wherein each of the plurality of similarity scores is determined by summing historical similarity scores of the one or more edit pairs of each of the plurality of shortest operation paths; and select a minimum one of the plurality of similarity scores to quantify text similarity between the OCR text string and the edit text string. (Figure 2)
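The three steps in the abstract (enumerate shortest operation paths, sum historical scores per path, take the minimum) can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes that a "shortest operation path" is a minimum-length Levenshtein edit script, that an edit pair is a tuple `(ocr_char, edit_char)` with `""` standing for an inserted or deleted character, that matching characters contribute no edit pair, and that historical similarity scores live in a plain dictionary with a fallback value. The names `shortest_edit_paths`, `quantify_similarity`, `HISTORICAL`, and `default` are hypothetical, and the score values are made up for illustration.

```python
def shortest_edit_paths(ocr: str, edit: str) -> set:
    """Enumerate every minimum-length edit path turning `ocr` into `edit`.

    Assumption: each path is a tuple of edit pairs (ocr_char, edit_char);
    "" marks a deletion (ocr_char, "") or an insertion ("", edit_char).
    """
    n, m = len(ocr), len(edit)
    # dist[i][j] = Levenshtein distance between ocr[i:] and edit[j:]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n:
                dist[i][j] = m - j          # only insertions remain
            elif j == m:
                dist[i][j] = n - i          # only deletions remain
            else:
                cost = 0 if ocr[i] == edit[j] else 1
                dist[i][j] = min(cost + dist[i + 1][j + 1],  # match/substitute
                                 1 + dist[i + 1][j],         # delete ocr[i]
                                 1 + dist[i][j + 1])         # insert edit[j]

    def walk(i, j):
        # Follow every transition that keeps the path on a shortest route.
        if i == n and j == m:
            yield ()
            return
        if i < n and j < m:
            cost = 0 if ocr[i] == edit[j] else 1
            if dist[i][j] == cost + dist[i + 1][j + 1]:
                head = () if cost == 0 else ((ocr[i], edit[j]),)
                for rest in walk(i + 1, j + 1):
                    yield head + rest
        if i < n and dist[i][j] == 1 + dist[i + 1][j]:
            for rest in walk(i + 1, j):
                yield ((ocr[i], ""),) + rest
        if j < m and dist[i][j] == 1 + dist[i][j + 1]:
            for rest in walk(i, j + 1):
                yield (("", edit[j]),) + rest

    return set(walk(0, 0))


def quantify_similarity(ocr, edit, historical, default=1.0):
    """Sum the historical score of each edit pair per path; return the minimum."""
    paths = shortest_edit_paths(ocr, edit)
    return min(sum(historical.get(pair, default) for pair in path)
               for path in paths)


# Hypothetical historical scores (illustrative values only).
HISTORICAL = {("r", "m"): 0.25, ("n", ""): 0.5, ("r", ""): 1.0, ("n", "m"): 0.5}

# "rn" -> "m" has two shortest paths: substitute r->m and delete n (0.75),
# or delete r and substitute n->m (1.5). The minimum is selected.
print(quantify_similarity("rn", "m", HISTORICAL))  # prints 0.75
```

The number of shortest paths can grow quickly with string length, so this exhaustive enumeration is only practical for short strings such as individual OCR fields.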
Priority Applications (7)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
SG10201904554TA SG10201904554TA (en) | 2019-05-21 | 2019-05-21 | Methods and devices for quantifying text similarity |
MYPI2019007088A MY189246A (en) | 2019-05-21 | 2019-11-29 | Methods and devices for quantifying text similarity |
PH12019000463A PH12019000463B1 (en) | 2019-05-21 | 2019-12-05 | Methods and devices for quantifying text similarity |
US16/791,858 US10929710B2 (en) | 2019-05-21 | 2020-02-14 | Methods and devices for quantifying text similarity |
CN202010313564.6A CN111985519B (en) | 2019-05-21 | 2020-04-17 | Text similarity quantification method, equipment and system |
CN202110891281.4A CN113723466B (en) | 2019-05-21 | 2020-04-17 | Text similarity quantification method, device and system |
US17/181,839 US11210553B2 (en) | 2019-05-21 | 2021-02-22 | Methods and devices for quantifying text similarity |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
SG10201904554TA SG10201904554TA (en) | 2019-05-21 | 2019-05-21 | Methods and devices for quantifying text similarity |
Publications (1)
Publication Number | Publication Date |
---|---|
SG10201904554TA (en) | 2019-09-27 |
Family
ID=68062733
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
SG10201904554TA SG10201904554TA (en) | 2019-05-21 | 2019-05-21 | Methods and devices for quantifying text similarity |
Country Status (5)
Country | Link |
---|---|
US (2) | US10929710B2 (en) |
CN (2) | CN111985519B (en) |
MY (1) | MY189246A (en) |
PH (1) | PH12019000463B1 (en) |
SG (1) | SG10201904554TA (en) |
Family Cites Families (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040181527A1 (en) * | 2003-03-11 | 2004-09-16 | Lockheed Martin Corporation | Robust system for interactively learning a string similarity measurement |
US9727804B1 (en) * | 2005-04-15 | 2017-08-08 | Matrox Electronic Systems, Ltd. | Method of correcting strings |
US8150161B2 (en) | 2008-09-22 | 2012-04-03 | Intuit Inc. | Technique for correcting character-recognition errors |
CN102193993B (en) * | 2011-04-20 | 2013-09-04 | 北京百度网讯科技有限公司 | Method, device and facility for determining similarity information between character string information |
CN102722556B (en) * | 2012-05-29 | 2014-10-22 | 清华大学 | Model comparison method based on similarity measurement |
US8655075B2 (en) | 2012-07-05 | 2014-02-18 | Sureprep, Llc | Optical character recognition verification and correction system |
US9384423B2 (en) * | 2013-05-28 | 2016-07-05 | Xerox Corporation | System and method for OCR output verification |
CN103699233B (en) * | 2013-12-20 | 2019-04-09 | 百度在线网络技术(北京)有限公司 | Character string input method and input unit |
CN105183732A (en) * | 2014-06-04 | 2015-12-23 | 广州市动景计算机科技有限公司 | Method and device for processing webpage |
US9747273B2 (en) * | 2014-08-19 | 2017-08-29 | International Business Machines Corporation | String comparison results for character strings using frequency data |
CN106033416B (en) | 2015-03-09 | 2019-12-24 | 阿里巴巴集团控股有限公司 | Character string processing method and device |
US20170046668A1 (en) | 2015-08-16 | 2017-02-16 | Google Inc. | Comparing An Extracted User Name with Stored User Data |
CN105446957B (en) * | 2015-12-03 | 2018-07-20 | 小米科技有限责任公司 | Similitude determines method, apparatus and terminal |
CN105678244B (en) * | 2015-12-31 | 2018-12-18 | 北京理工大学 | A kind of near video search method based on improved edit-distance |
CN106997335B (en) * | 2016-01-26 | 2020-05-19 | 阿里巴巴集团控股有限公司 | Identical character string determination method and device |
CN107203567A (en) * | 2016-03-18 | 2017-09-26 | 伊姆西公司 | Method and apparatus for searching for word string |
CN106250364A (en) | 2016-07-20 | 2016-12-21 | 科大讯飞股份有限公司 | A kind of text modification method and device |
CN109863483A (en) * | 2016-08-09 | 2019-06-07 | 瑞普科德公司 | System and method for electronical record label |
US10643183B2 (en) * | 2016-10-18 | 2020-05-05 | Google Llc | Search engine |
US10757218B2 (en) * | 2017-03-29 | 2020-08-25 | Alibaba Group Holding Limited | Method and apparatus for generating push notifications |
CN107220639A (en) * | 2017-04-14 | 2017-09-29 | 北京捷通华声科技股份有限公司 | The correcting method and device of OCR recognition results |
WO2019009916A1 (en) * | 2017-07-07 | 2019-01-10 | Hewlett-Packard Development Company, L.P. | Image alignments via optical character recognition |
CN109388252B (en) * | 2017-08-14 | 2022-10-04 | 北京搜狗科技发展有限公司 | Input method and device |
CN107909054B (en) * | 2017-11-30 | 2021-05-04 | 任艳 | Similarity evaluation method and device for picture texts |
CN108256587A (en) * | 2018-02-05 | 2018-07-06 | 武汉斗鱼网络科技有限公司 | Determining method, apparatus, computer and the storage medium of a kind of similarity of character string |
CN109710929A (en) * | 2018-12-18 | 2019-05-03 | 金蝶软件(中国)有限公司 | A kind of bearing calibration, device, computer equipment and the storage medium of speech recognition text |
2019
- 2019-05-21: SG application SG10201904554TA (publication SG10201904554TA), status unknown
- 2019-11-29: MY application MYPI2019007088A (publication MY189246A), status unknown
- 2019-12-05: PH application PH12019000463A (publication PH12019000463B1), status unknown
2020
- 2020-02-14: US application US16/791,858 (publication US10929710B2), status active
- 2020-04-17: CN application CN202010313564.6A (publication CN111985519B), status active
- 2020-04-17: CN application CN202110891281.4A (publication CN113723466B), status active
2021
- 2021-02-22: US application US17/181,839 (publication US11210553B2), status active
Also Published As
Publication number | Publication date |
---|---|
US11210553B2 (en) | 2021-12-28 |
CN113723466B (en) | 2024-03-08 |
CN111985519B (en) | 2021-07-27 |
US20200372293A1 (en) | 2020-11-26 |
US10929710B2 (en) | 2021-02-23 |
US20210174136A1 (en) | 2021-06-10 |
MY189246A (en) | 2022-01-31 |
CN113723466A (en) | 2021-11-30 |
PH12019000463A1 (en) | 2020-12-02 |
CN111985519A (en) | 2020-11-24 |
PH12019000463B1 (en) | 2020-12-02 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
AU2017250105A1 (en) | Performance model adverse impact correction | |
PH12019501153A1 (en) | System and method for implementing native contract on blockchain | |
MY190598A (en) | Blockchain data processing method and apparatus | |
CO2017007032A2 (en) | Updating language understanding classifier models for a personal digital assistant based on mass outsourcing | |
CN109766538B (en) | Text error correction method and device, electronic equipment and storage medium | |
KR20190020119A (en) | Error correction methods and devices for search terms | |
US20220198137A1 (en) | Text error-correcting method, apparatus, electronic device and readable storage medium | |
MX2020008381A (en) | Financial regulatory compliance platform. | |
CN111310443A (en) | Text error correction method and system | |
MX2021011617A (en) | Adaptive error correction in quantum computing. | |
US20140172774A1 (en) | Method and device for named-entity recognition | |
EP2206058A4 (en) | Systems and methods for character correction in communication devices | |
CN106708799A (en) | Text error correction method and device, and terminal | |
KR20070060862A (en) | Apparatus and method for learning data construction | |
EP3629182A3 (en) | Generating a test script execution order | |
KR102033151B1 (en) | Data merging device and method for bia datda analysis | |
CN106484132B (en) | Input error correction method and input method device | |
CN103389915A (en) | Input error correcting method, input error correcting device, input error correcting server and input error correcting server system | |
WO2019101234A3 (en) | Methods and devices for performing off-chain testing on smart contract | |
RU2014101663A (en) | METHOD FOR IDENTIFYING THE NEED TO TRAIN THE STANDARD IN VERIFICATION OF A RECOGNIZED TEXT | |
CN109271630A (en) | A kind of intelligent dimension method and device based on natural language processing | |
EP3923177A1 (en) | Method and apparatus for correcting character errors, electronic device and stroage medium | |
CN107038441A (en) | Clipboard is detected and corrected | |
SG10201904554TA (en) | Methods and devices for quantifying text similarity | |
RU2016147118A (en) | INTEGRATED ENVIRONMENT FOR EVALUATING THE SENSITIVITY OF INDICATORS OF EFFICIENCY TO EXTERNAL FACTORS AND OPERATIONAL DECISIONS FOR THE OFFER GENERATED BY THE COMPUTER, OPTIMUM PLANS OF ACTIVITY |