SG10201904554TA - Methods and devices for quantifying text similarity - Google Patents

Methods and devices for quantifying text similarity

Info

Publication number
SG10201904554TA
Authority
SG
Singapore
Prior art keywords
edit
similarity
text
text string
quantifying
Prior art date
Application number
SG10201904554TA
Inventor
Ruoyu Li
Original Assignee
Alibaba Group Holding Ltd
Priority date
2019-05-21
Filing date
2019-05-21
Publication date
2019-09-27
Application filed by Alibaba Group Holding Ltd
Priority to SG10201904554TA
Publication of SG10201904554TA
Priority to MYPI2019007088A (MY189246A)
Priority to PH12019000463A (PH12019000463B1)
Priority to US16/791,858 (US10929710B2)
Priority to CN202010313564.6A (CN111985519B)
Priority to CN202110891281.4A (CN113723466B)
Priority to US17/181,839 (US11210553B2)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/196 Recognition using electronic means using sequential comparisons of the image signals with a plurality of references
    • G06V30/1983 Syntactic or structural pattern recognition, e.g. symbolic string recognition

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations
    • G06F16/24558 Binary matching operations

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/12 Detection or correction of errors, e.g. by rescanning the pattern

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Character Discrimination (AREA)

Abstract

METHODS AND DEVICES FOR QUANTIFYING TEXT SIMILARITY

The present disclosure provides methods and devices for quantifying text similarity. In an embodiment, there is provided a device for quantifying text similarity that comprises: a processor; and a memory including computer program code. The memory and the computer program code are configured to, with the processor, cause the device to: obtain a plurality of shortest operation paths for correcting an optical character recognition (OCR) text string with an edit text string, wherein each of the plurality of shortest operation paths includes one or more edit pairs, each of the one or more edit pairs denoting an operation performable on a character of the OCR text string during correction by the edit text string; determine a plurality of similarity scores, each corresponding to one of the plurality of shortest operation paths, wherein each similarity score is determined by summing the historical similarity scores of the one or more edit pairs of that shortest operation path; and select the minimum of the plurality of similarity scores to quantify the text similarity between the OCR text string and the edit text string. (Figure 2)
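The abstract leaves several details open: what makes an operation path "shortest", how an edit pair is encoded, and where the historical similarity scores come from. The Python sketch below fills those gaps with its own assumptions, not the patent's. Shortest paths are taken to be the minimum-cost alignments under unit-cost Levenshtein edit distance. An edit pair is a hypothetical `(ocr_char, edit_char)` tuple, with `""` marking an insertion or deletion. `historical` is a hypothetical dictionary of per-pair scores in which lower means more similar, which is consistent with selecting the minimum. Pairs with no history fall back to 0 for a match and to `default_mismatch` otherwise. All function and parameter names are illustrative.

```python
from functools import lru_cache
from typing import Dict, List, Tuple

# Hypothetical encoding of an edit pair: (ocr_char, edit_char), with "" marking
# the missing side. (a, a) = match, (a, b) = substitution,
# (a, "") = delete a from the OCR text, ("", b) = insert b from the edit text.
EditPair = Tuple[str, str]


def levenshtein_table(ocr: str, edit: str) -> List[List[int]]:
    """Unit-cost edit-distance table: d[i][j] = distance(ocr[:i], edit[:j])."""
    m, n = len(ocr), len(edit)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if ocr[i - 1] == edit[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # delete ocr[i-1]
                          d[i][j - 1] + 1,        # insert edit[j-1]
                          d[i - 1][j - 1] + sub)  # match or substitute
    return d


def shortest_operation_paths(ocr: str, edit: str) -> List[List[EditPair]]:
    """Enumerate every minimum-edit alignment (the count can grow exponentially)."""
    d = levenshtein_table(ocr, edit)

    @lru_cache(maxsize=None)
    def walk(i: int, j: int) -> Tuple[Tuple[EditPair, ...], ...]:
        # All optimal edit-pair sequences that turn ocr[:i] into edit[:j].
        if i == 0 and j == 0:
            return ((),)
        paths = []
        if i > 0 and j > 0:
            sub = 0 if ocr[i - 1] == edit[j - 1] else 1
            if d[i][j] == d[i - 1][j - 1] + sub:
                paths += [p + ((ocr[i - 1], edit[j - 1]),) for p in walk(i - 1, j - 1)]
        if i > 0 and d[i][j] == d[i - 1][j] + 1:
            paths += [p + ((ocr[i - 1], ""),) for p in walk(i - 1, j)]
        if j > 0 and d[i][j] == d[i][j - 1] + 1:
            paths += [p + (("", edit[j - 1]),) for p in walk(i, j - 1)]
        return tuple(paths)

    return [list(p) for p in walk(len(ocr), len(edit))]


def quantify_similarity(ocr: str, edit: str,
                        historical: Dict[EditPair, float],
                        default_mismatch: float = 1.0) -> float:
    """Sum per-pair historical scores along each shortest path; return the minimum."""

    def pair_score(pair: EditPair) -> float:
        if pair in historical:
            return historical[pair]
        # Assumed fallback for pairs with no history: matches cost nothing.
        return 0.0 if pair[0] == pair[1] else default_mismatch

    scores = [sum(pair_score(p) for p in path)
              for path in shortest_operation_paths(ocr, edit)]
    return min(scores)


if __name__ == "__main__":
    # Made-up history for illustration; reading "m" as "rn" is a common OCR confusion.
    history = {("r", "m"): 0.3, ("n", ""): 0.2, ("n", "m"): 0.8, ("r", ""): 0.9}
    for path in shortest_operation_paths("rn", "m"):
        print(path)
    print(quantify_similarity("rn", "m", history))  # prints 0.5
```

Here "rn" versus "m" has two shortest paths, `[("r", ""), ("n", "m")]` and `[("r", "m"), ("n", "")]`. With the made-up history they sum to about 1.7 and 0.5, so 0.5 is selected. Enumerating every path mirrors the abstract's wording, but the number of paths can grow exponentially with string length. A dynamic program over the same optimal-edit subgraph could return the minimum summed score without listing the paths.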

Priority Applications (7)

Application Number | Publication | Priority Date | Filing Date | Title
SG10201904554TA | SG10201904554TA (en) | 2019-05-21 | 2019-05-21 | Methods and devices for quantifying text similarity
MYPI2019007088A | MY189246A (en) | 2019-05-21 | 2019-11-29 | Methods and devices for quantifying text similarity
PH12019000463A | PH12019000463B1 (en) | 2019-05-21 | 2019-12-05 | Methods and devices for quantifying text similarity
US16/791,858 | US10929710B2 (en) | 2019-05-21 | 2020-02-14 | Methods and devices for quantifying text similarity
CN202010313564.6A | CN111985519B (en) | 2019-05-21 | 2020-04-17 | Text similarity quantification method, equipment and system
CN202110891281.4A | CN113723466B (en) | 2019-05-21 | 2020-04-17 | Text similarity quantification method, device and system
US17/181,839 | US11210553B2 (en) | 2019-05-21 | 2021-02-22 | Methods and devices for quantifying text similarity

Applications Claiming Priority (1)

Application Number | Publication | Priority Date | Filing Date | Title
SG10201904554TA | SG10201904554TA (en) | 2019-05-21 | 2019-05-21 | Methods and devices for quantifying text similarity

Publications (1)

Publication Number | Publication Date
SG10201904554TA (en) | 2019-09-27

Family

ID=68062733

Family Applications (1)

Application Number | Publication | Priority Date | Filing Date | Title
SG10201904554TA | SG10201904554TA (en) | 2019-05-21 | 2019-05-21 | Methods and devices for quantifying text similarity

Country Status (5)

Country | Link
US (2) | US10929710B2 (en)
CN (2) | CN111985519B (en)
MY (1) | MY189246A (en)
PH (1) | PH12019000463B1 (en)
SG (1) | SG10201904554TA (en)

Family Cites Families (26)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20040181527A1 (en) * | 2003-03-11 | 2004-09-16 | Lockheed Martin Corporation | Robust system for interactively learning a string similarity measurement
US9727804B1 (en) * | 2005-04-15 | 2017-08-08 | Matrox Electronic Systems, Ltd. | Method of correcting strings
US8150161B2 (en) | 2008-09-22 | 2012-04-03 | Intuit Inc. | Technique for correcting character-recognition errors
CN102193993B (en) * | 2011-04-20 | 2013-09-04 | 北京百度网讯科技有限公司 | Method, device and facility for determining similarity information between character string information
CN102722556B (en) * | 2012-05-29 | 2014-10-22 | 清华大学 | Model comparison method based on similarity measurement
US8655075B2 (en) | 2012-07-05 | 2014-02-18 | Sureprep, Llc | Optical character recognition verification and correction system
US9384423B2 (en) * | 2013-05-28 | 2016-07-05 | Xerox Corporation | System and method for OCR output verification
CN103699233B (en) * | 2013-12-20 | 2019-04-09 | 百度在线网络技术(北京)有限公司 | Character string input method and input unit
CN105183732A (en) * | 2014-06-04 | 2015-12-23 | 广州市动景计算机科技有限公司 | Method and device for processing webpage
US9747273B2 (en) * | 2014-08-19 | 2017-08-29 | International Business Machines Corporation | String comparison results for character strings using frequency data
CN106033416B (en) | 2015-03-09 | 2019-12-24 | 阿里巴巴集团控股有限公司 | Character string processing method and device
US20170046668A1 (en) | 2015-08-16 | 2017-02-16 | Google Inc. | Comparing An Extracted User Name with Stored User Data
CN105446957B (en) * | 2015-12-03 | 2018-07-20 | 小米科技有限责任公司 | Similarity determination method, apparatus and terminal
CN105678244B (en) * | 2015-12-31 | 2018-12-18 | 北京理工大学 | Near-duplicate video search method based on improved edit distance
CN106997335B (en) * | 2016-01-26 | 2020-05-19 | 阿里巴巴集团控股有限公司 | Identical character string determination method and device
CN107203567A (en) * | 2016-03-18 | 2017-09-26 | 伊姆西公司 | Method and apparatus for searching for word string
CN106250364A (en) | 2016-07-20 | 2016-12-21 | 科大讯飞股份有限公司 | Text modification method and device
CN109863483A (en) * | 2016-08-09 | 2019-06-07 | 瑞普科德公司 | System and method for electronic record labeling
US10643183B2 (en) * | 2016-10-18 | 2020-05-05 | Google Llc | Search engine
US10757218B2 (en) * | 2017-03-29 | 2020-08-25 | Alibaba Group Holding Limited | Method and apparatus for generating push notifications
CN107220639A (en) * | 2017-04-14 | 2017-09-29 | 北京捷通华声科技股份有限公司 | Method and device for correcting OCR recognition results
WO2019009916A1 (en) * | 2017-07-07 | 2019-01-10 | Hewlett-Packard Development Company, L.P. | Image alignments via optical character recognition
CN109388252B (en) * | 2017-08-14 | 2022-10-04 | 北京搜狗科技发展有限公司 | Input method and device
CN107909054B (en) * | 2017-11-30 | 2021-05-04 | 任艳 | Similarity evaluation method and device for picture texts
CN108256587A (en) * | 2018-02-05 | 2018-07-06 | 武汉斗鱼网络科技有限公司 | Method, apparatus, computer and storage medium for determining character string similarity
CN109710929A (en) * | 2018-12-18 | 2019-05-03 | 金蝶软件(中国)有限公司 | Correction method, device, computer equipment and storage medium for speech recognition text

Also Published As

Publication number | Publication date
US11210553B2 (en) | 2021-12-28
CN113723466B (en) | 2024-03-08
CN111985519B (en) | 2021-07-27
US20200372293A1 (en) | 2020-11-26
US10929710B2 (en) | 2021-02-23
US20210174136A1 (en) | 2021-06-10
MY189246A (en) | 2022-01-31
CN113723466A (en) | 2021-11-30
PH12019000463A1 (en) | 2020-12-02
CN111985519A (en) | 2020-11-24
PH12019000463B1 (en) | 2020-12-02

Similar Documents

Publication | Title
AU2017250105A1 (en) | Performance model adverse impact correction
PH12019501153A1 (en) | System and method for implementing native contract on blockchain
MY190598A (en) | Blockchain data processing method and apparatus
CO2017007032A2 (en) | Updating language understanding classifier models for a personal digital assistant based on mass outsourcing
CN109766538B (en) | Text error correction method and device, electronic equipment and storage medium
KR20190020119A (en) | Error correction methods and devices for search terms
US20220198137A1 (en) | Text error-correcting method, apparatus, electronic device and readable storage medium
MX2020008381A (en) | Financial regulatory compliance platform.
CN111310443A (en) | Text error correction method and system
MX2021011617A (en) | Adaptive error correction in quantum computing.
US20140172774A1 (en) | Method and device for named-entity recognition
EP2206058A4 (en) | Systems and methods for character correction in communication devices
CN106708799A (en) | Text error correction method and device, and terminal
KR20070060862A (en) | Apparatus and method for learning data construction
EP3629182A3 (en) | Generating a test script execution order
KR102033151B1 (en) | Data merging device and method for big data analysis
CN106484132B (en) | Input error correction method and input method device
CN103389915A (en) | Input error correcting method, input error correcting device, input error correcting server and input error correcting server system
WO2019101234A3 (en) | Methods and devices for performing off-chain testing on smart contract
RU2014101663A (en) | METHOD FOR IDENTIFYING THE NEED TO TRAIN THE STANDARD IN VERIFICATION OF A RECOGNIZED TEXT
CN109271630A (en) | Intelligent annotation method and device based on natural language processing
EP3923177A1 (en) | Method and apparatus for correcting character errors, electronic device and storage medium
CN107038441A (en) | Clipboard is detected and corrected
SG10201904554TA (en) | Methods and devices for quantifying text similarity
RU2016147118A (en) | INTEGRATED ENVIRONMENT FOR EVALUATING THE SENSITIVITY OF INDICATORS OF EFFICIENCY TO EXTERNAL FACTORS AND OPERATIONAL DECISIONS FOR THE OFFER GENERATED BY THE COMPUTER, OPTIMUM PLANS OF ACTIVITY