WO2021212614A1 - Text error correction method and apparatus, computer-readable storage medium and system - Google Patents

Text error correction method and apparatus, computer-readable storage medium and system

Info

Publication number
WO2021212614A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
image
error correction
standard
character
Prior art date
Application number
PCT/CN2020/093561
Other languages
English (en)
Chinese (zh)
Inventor
谢静文
阮晓雯
徐亮
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-04-23
Filing date
2020-05-30
Publication date
2021-10-28
Application filed by 平安科技(深圳)有限公司
Publication of WO2021212614A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to a text error correction method, device, computer-readable storage medium and system.
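  • Currently, text recognition mostly uses OCR (optical character recognition) technology to read the text in an image and convert it into a character format that computers can process and people can understand.
  • Because OCR technology places high demands on the quality of the input image, many recognition errors tend to occur when image quality is low, so the recognized characters need error correction.
  • The inventor realized that traditional methods perform error correction based only on the characters in the image information, so the error-corrected results output directly by OCR cannot meet practical application requirements and have low accuracy. Achieving low-cost, high-precision text error correction is therefore receiving increasing attention.
  • This application provides a text error correction method, device, computer-readable storage medium, and system, whose main purpose is to solve the problems of low accuracy and high cost in text error correction.
  • A text error correction method provided by this application includes: acquiring an original text image and performing a preprocessing operation on it to obtain a standard image; performing text recognition on the standard image using a pre-trained text recognition model to obtain character/word vectors, encoding the character/word vectors to generate key values and corresponding result values, and converting the standard image into output text according to the key values and the corresponding result values; calculating the edit distance between the output text and a preset standard error correction table using the key values, and obtaining, according to the edit distance, the error text in the output text and the correct text corresponding to the error text; and replacing the error text with the correct text to obtain the standard output text.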
  • A text error correction device, which includes:
  • a modulation conversion module, configured to obtain an original text image and perform a preprocessing operation on the original text image to obtain a standard image;
  • a text segmentation module, configured to perform text recognition on the standard image using a pre-trained text recognition model to obtain character/word vectors, encode the character/word vectors to generate key values and corresponding result values, and convert the standard image into output text according to the key values and the corresponding result values;
  • a distance calculation module, configured to calculate the edit distance between the output text and a preset standard error correction table using the key values, and obtain, according to the edit distance, the error text in the output text and the correct text corresponding to the error text; and
  • an error correction output module, configured to replace the error text with the correct text to obtain the standard output text.
  • This application also provides a computer-readable storage medium storing a text error correction program that can be executed by one or more processors to implement the steps of the text error correction method described above.
  • This application also provides a text error correction system, including:
  • a modulation conversion module, configured to obtain an original text image and perform a preprocessing operation on the original text image to obtain a standard image;
  • a text segmentation module, configured to perform text recognition on the standard image using a pre-trained text recognition model to obtain character/word vectors, encode the character/word vectors to generate key values and corresponding result values, and convert the standard image into output text according to the key values and the corresponding result values;
  • a distance calculation module, configured to calculate the edit distance between the output text and a preset standard error correction table using the key values, and obtain, according to the edit distance, the error text in the output text and the correct text corresponding to the error text; and
  • an error correction output module, configured to replace the error text with the correct text to obtain the standard output text.
  • The embodiment of this application preprocesses the original text image, which removes interference factors from the original image and lays the groundwork for subsequent error correction of the text in the image. Further, whereas the prior art performs error correction based only on the characters themselves in the image information, the embodiment of this application calculates the key values of characters and the result values corresponding to those key values, and compares the key values and result values with a preset standard error correction table to correct the output text obtained through image recognition technology, which makes error correction more accurate. The text error correction method, device, and computer-readable storage medium proposed in this application can therefore provide a low-cost, high-precision text error correction solution.
  • FIG. 1 is a schematic flowchart of a text error correction method provided by an embodiment of this application;
  • FIG. 2 is a schematic diagram of the modules of a text error correction device provided by an embodiment of this application;
  • FIG. 3 is a schematic diagram of the internal structure of an electronic device implementing the text error correction method provided by an embodiment of this application.
  • This application provides a text error correction method.
  • FIG. 1 is a schematic flowchart of the text error correction method provided by an embodiment of this application.
  • The method can be executed by a device, which can be implemented in software and/or hardware.
  • The text error correction method includes:
  • The original text image is obtained by two-dimensional scanning of paper documents, such as paper medical invoices or books.
  • The embodiment of this application first performs the following preprocessing on the original text image:
  • The embodiment of this application uses an existing amplifying circuit to amplify the image signal of the original text image.
  • An amplifying circuit is a circuit that amplifies electrical signals using a transistor as the control element; other suitable amplifying circuits can be selected according to different amplification requirements, and the selected circuit amplifies the original text image without distortion to obtain the amplified image signal.
  • The embodiment of this application uses an existing sampling circuit to sample the amplified image signal.
  • The sampling circuit periodically samples the amplified image signal at a preset sampling frequency.
  • By applying the above amplification, sampling, and filtering to the original text image, the embodiment of this application removes interference factors such as noise from the original text image, obtains the standard image, and ensures the accuracy of subsequent text error correction.
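  • The preprocessing above is described in terms of analog circuits, and the filtering stage is named but not detailed. As a rough software analogue only, the sketch below treats a grayscale scan as rows of 0-255 integers and approximates each stage digitally: a gain with clipping for amplification, fixed-stride subsampling for periodic sampling, and a 3x3 median filter as an assumed choice of noise filter. The function names, the default gain, and the median filter are illustrative assumptions, not the circuits the embodiment uses.

```python
"""Software analogue of amplify -> sample -> filter preprocessing (hypothetical sketch)."""
from statistics import median
from typing import List

Image = List[List[int]]  # assumption: grayscale image as rows of 0-255 ints


def amplify(img: Image, gain: float) -> Image:
    """Scale every pixel by `gain`, clipping to 0..255 so values cannot overflow."""
    return [[min(255, max(0, round(p * gain))) for p in row] for row in img]


def sample(img: Image, step: int) -> Image:
    """Keep every `step`-th pixel in both directions (a fixed sampling period)."""
    return [row[::step] for row in img[::step]]


def median_filter(img: Image) -> Image:
    """3x3 median filter; border pixels use only the neighbours that exist."""
    h, w = len(img), len(img[0])
    out: Image = []
    for y in range(h):
        out_row = []
        for x in range(w):
            window = [img[yy][xx]
                      for yy in range(max(0, y - 1), min(h, y + 2))
                      for xx in range(max(0, x - 1), min(w, x + 2))]
            out_row.append(int(median(window)))
        out.append(out_row)
    return out


def preprocess(img: Image, gain: float = 1.2, step: int = 1) -> Image:
    """Hypothetical pipeline producing the 'standard image'."""
    return median_filter(sample(amplify(img, gain), step))
```

  • For example, with `gain=1.0` a single bright noise pixel in a flat 3x3 patch (`[[10, 10, 10], [10, 255, 10], [10, 10, 10]]`) is replaced by its neighbours' value, so every pixel comes out as 10. A median filter is a common choice for this kind of isolated speckle noise because it discards outliers instead of averaging them in.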
  • The text recognition model in the embodiment of this application may be a pre-trained NER (Named Entity Recognition) model.
  • The NER model adopts a Bi-LSTM-CRF structure, including:
  • Character/word vector layer: converts the characters and words in the text contained in the standard image into character/word vectors;
  • Bi-LSTM layer: segments the character/word vectors, encodes the segmented vectors to obtain their encoded representations, and uses the encoded representations to label the segmented character/word vectors, yielding key values and result values;
  • CRF layer: splices key values and result values of the same type, and decodes the spliced text by reversing the encoding process to generate the output text.
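  • In a standard Bi-LSTM-CRF tagger, the CRF layer chooses the highest-scoring label sequence using Viterbi decoding, combining per-token emission scores from the Bi-LSTM with learned label-to-label transition scores. The text does not describe its decoding procedure, so the sketch below is a generic Viterbi decoder under that standard assumption; the score values in the usage example are made up purely for illustration.

```python
"""Viterbi decoding as used by a standard CRF output layer (generic sketch)."""
from typing import List, Sequence


def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]],
            labels: Sequence[str]) -> List[str]:
    """Return the label sequence with the highest total (log-space) score.

    emissions[t][j]: score of label j at token t (e.g. from the Bi-LSTM).
    transitions[i][j]: score of moving from label i to label j (learned by the CRF).
    """
    n = len(labels)
    score = list(emissions[0])      # best score of any path ending in label j at t = 0
    backptrs: List[List[int]] = []  # backptrs[t-1][j]: best previous label for j at t
    for t in range(1, len(emissions)):
        new_score, ptrs = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i, j=j: score[i] + transitions[i][j])
            ptrs.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
        score = new_score
        backptrs.append(ptrs)
    best = max(range(n), key=lambda j: score[j])
    path = [best]
    for ptrs in reversed(backptrs):  # walk the back-pointers from the last token
        best = ptrs[best]
        path.append(best)
    return [labels[j] for j in reversed(path)]
```

  • With hypothetical labels `["Key-B", "Key-I", "Value-B"]`, emissions `[[2, 0, 1], [0, 1, 1.5]]` and all-zero transitions, the decoder returns `["Key-B", "Value-B"]`, following the emission maxima. Adding a transition bonus of 1 for `Key-B -> Key-I` changes the result to `["Key-B", "Key-I"]`, which is how the CRF layer can enforce label sequences that are consistent with each other.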
  • The character/word vector layer uses trained word vectors as initialization parameters to convert the characters and words in the text contained in the standard image into character/word vectors.
  • The trained word vectors are a set of standard conversion rules summarized from previous character/word vector conversions.
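  • A common way to use trained word vectors as initialization parameters is to build the embedding table from them, copying the trained vector for each known token and initializing unknown tokens randomly. The sketch below assumes the trained vectors are available as a plain token-to-vector dictionary; the function names, the seed, and the random range of ±0.1 for unknown tokens are hypothetical choices, not details taken from the embodiment.

```python
"""Initialising an embedding table from pre-trained vectors (hypothetical sketch)."""
import random
from typing import Dict, List, Sequence


def build_embedding_table(vocab: Sequence[str],
                          pretrained: Dict[str, Sequence[float]],
                          dim: int, seed: int = 0) -> Dict[str, List[float]]:
    """Copy the pre-trained vector when one exists; otherwise draw a small random one."""
    rng = random.Random(seed)  # seeded so the table is reproducible
    table: Dict[str, List[float]] = {}
    for token in vocab:
        if token in pretrained:
            vec = [float(x) for x in pretrained[token]]
            if len(vec) != dim:
                raise ValueError(f"vector for {token!r} has size {len(vec)}, expected {dim}")
        else:
            vec = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
        table[token] = vec
    return table


def embed(tokens: Sequence[str], table: Dict[str, List[float]]) -> List[List[float]]:
    """Look up the vector of each token (the vector layer's forward pass)."""
    return [table[t] for t in tokens]
```

  • During training, the copied vectors are typically the starting point and are then fine-tuned along with the rest of the network.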
  • The Bi-LSTM layer can segment the character/word vectors.
  • The Bi-LSTM layer can use the Java language to segment the character/word vectors and encode the segmented vectors. The encoded representation uses six label types: Key-B, Value-B, Key-I, Value-I, Other-B, and Other-I, where Key denotes a key value, Value denotes a result value, and Other denotes any other value.
  • The CRF layer splices key values and result values of the same type, such as Key-B with Key-I, or Value-B with Value-I.
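  • The splicing step can be illustrated by merging tagged tokens into spans. The sketch below assumes the usual BIO reading of the labels (B marks the first token of a span, I a continuation) and pairs each Key span with the next Value span. The tokenization and tags for "Pay 2.00 yuan (cash payment)" in the usage example are hypothetical, not output from the model in the embodiment.

```python
"""Splicing Key/Value/Other B-I tags into spans and key-value pairs (hypothetical sketch)."""
from typing import List, Sequence, Tuple

Span = Tuple[str, str]  # (label type, spliced text)


def splice_spans(tokens: Sequence[str], tags: Sequence[str], sep: str = " ") -> List[Span]:
    """Merge consecutive tokens whose tags share a type, e.g. Value-B + Value-I."""
    spans: List[Span] = []
    for token, tag in zip(tokens, tags):
        kind, _, pos = tag.partition("-")  # "Value-I" -> ("Value", "I")
        if pos == "I" and spans and spans[-1][0] == kind:
            spans[-1] = (kind, spans[-1][1] + sep + token)
        else:  # a B tag, or an I tag without a matching open span, starts a new span
            spans.append((kind, token))
    return spans


def pair_keys_values(spans: Sequence[Span]) -> List[Tuple[str, str]]:
    """Pair each Key span with the Value span that follows it; Other spans are ignored."""
    pairs, pending_key = [], None
    for kind, text in spans:
        if kind == "Key":
            pending_key = text
        elif kind == "Value" and pending_key is not None:
            pairs.append((pending_key, text))
            pending_key = None
    return pairs


tokens = ["Pay", "2.00", "yuan", "(", "cash", "payment", ")"]
tags = ["Key-B", "Value-B", "Value-I", "Other-B", "Other-I", "Other-I", "Other-I"]
spans = splice_spans(tokens, tags)
print(pair_keys_values(spans))  # [('Pay', '2.00 yuan')]
```

  • For character-level Chinese text, `sep=""` would splice characters without spaces.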
  • The embodiment of this application converts the standard image into output text according to the key values and the corresponding result values.
  • For example, the standard image contains the text "Pay 2.00 yuan (cash payment)".
  • The classified self-payment is 0.00 yuan.
  • This embodiment of the application uses a standard error correction table to correct the above output text.
  • The standard error correction table consists of error-free character strings together with the key values and result values corresponding to those strings.
  • The edit distance between two character strings is the minimum number of editing operations required to convert one string into the other.
  • The embodiment of this application calculates the edit distance $Sim_{topic}$ with the following algorithm:
      $$Sim_{topic} = \mathrm{Pearson}(R, S)$$
  • where R is the key value of the output text,
  • S is the key value of the standard error correction table, and
  • Pearson denotes the edit distance calculation.
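  • The text names the calculation "Pearson" but does not give its steps. The sketch below instead implements the standard Levenshtein distance, which matches the definition given above (the minimum number of insertions, deletions, and substitutions that turn one string into the other); treat it as an illustration of that definition, not as the embodiment's exact calculation.

```python
"""Levenshtein edit distance (standard dynamic programming, two-row version)."""


def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]
```

  • For example, an OCR output key "Cash paymnet" is at distance 2 from the table entry "Cash payment" (two substitutions), while identical strings are at distance 0.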
  • Obtaining, according to the edit distance, the error text in the output text and the correct text corresponding to the error text includes:
  • if the edit distance is less than a distance threshold, the key value of the corresponding output text is determined to be an error character, and the key value of the corresponding standard error correction table is determined to be the corresponding correct character;
  • if the edit distance is greater than or equal to the distance threshold, the output text does not match the standard error correction table, and the standard error correction table cannot be used to correct the output text.
  • The correct text can then be used directly to replace the erroneous text, correcting its erroneous content and yielding the standard output text.
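  • The threshold rule and the replacement step can be sketched together. The version below makes several assumptions not stated in the text: it compares only key strings, uses Levenshtein distance as the edit distance, picks the nearest table key for each output key, and treats a distance of 0 as "already correct". The function name `correct_keys` and the sample strings are hypothetical.

```python
"""Threshold-based correction against a standard table (hypothetical sketch)."""
from typing import List, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def correct_keys(output_keys: Sequence[str], standard_keys: Sequence[str],
                 threshold: int) -> List[str]:
    """Replace each output key by its nearest standard key when 0 < distance < threshold."""
    if not standard_keys:
        return list(output_keys)  # nothing to correct against
    corrected = []
    for key in output_keys:
        best = min(standard_keys, key=lambda s, k=key: edit_distance(k, s))
        d = edit_distance(key, best)
        if 0 < d < threshold:
            corrected.append(best)  # key is the error text, best is the correct text
        else:
            corrected.append(key)   # exact match, or too far from the table to correct
    return corrected
```

  • With the table `["Cash payment", "Self-payment", "Total"]` and a threshold of 3, the output keys `["Cash paymnet", "Total", "Unrelated field"]` become `["Cash payment", "Total", "Unrelated field"]`. The threshold controls the trade-off between fixing more OCR errors and wrongly "correcting" text that is simply not in the table.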
  • The original text image can also be stored in a node of a blockchain.
  • The blockchain referred to in this application is a new application model of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms.
  • A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of that information (anti-counterfeiting) and to generate the next block.
  • The blockchain may include an underlying blockchain platform, a platform product service layer, and an application service layer.
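  • The idea of data blocks linked by cryptographic methods can be illustrated with a minimal hash chain in which each block stores the SHA-256 hash of its predecessor, so changing any earlier block breaks verification. This single-process sketch has no networking, consensus, or signatures, and it stores a SHA-256 digest of each scanned image rather than the image itself; the field names and that design choice are assumptions for illustration only.

```python
"""Minimal hash-linked chain recording image digests (hypothetical sketch)."""
import hashlib
import json
from typing import Dict, List

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def block_hash(prev_hash: str, payload: Dict[str, str]) -> str:
    """Hash the block contents deterministically (JSON with sorted keys)."""
    body = json.dumps({"prev_hash": prev_hash, "payload": payload}, sort_keys=True)
    return sha256_hex(body.encode("utf-8"))


def append_block(chain: List[dict], payload: Dict[str, str]) -> None:
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV
    chain.append({"prev_hash": prev_hash, "payload": payload,
                  "hash": block_hash(prev_hash, payload)})


def verify_chain(chain: List[dict]) -> bool:
    """Check every link and every stored hash; any edit to a block is detected."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i else GENESIS_PREV
        if block["prev_hash"] != expected_prev:
            return False
        if block["hash"] != block_hash(block["prev_hash"], block["payload"]):
            return False
    return True
```

  • For example, after appending `{"image_sha256": sha256_hex(scan_bytes)}` for two scans, `verify_chain` returns `True`. Overwriting the digest in the first block makes it return `False`.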
  • This solution can be applied in the field of smart healthcare in smart cities, thereby promoting the construction of smart cities.
  • FIG. 2 is a functional block diagram of the text error correction device of this application.
  • The text error correction device 100 described in this application can be installed in an electronic device.
  • The text error correction device may include an image acquisition module 101, an image segmentation module 102, a matching module 103, and an error correction module 104.
  • A module described in this application may also be called a unit: a series of computer program segments that are stored in the memory of an electronic device, can be executed by its processor, and perform fixed functions.
  • The functions of the modules/units are as follows:
  • the modulation conversion module 101 is configured to obtain an original text image and perform a preprocessing operation on the original text image to obtain a standard image;
  • the text segmentation module 102 is configured to perform text recognition on the standard image using a pre-trained text recognition model to obtain character/word vectors, encode the character/word vectors to generate key values and corresponding result values, and convert the standard image into output text according to the key values and the corresponding result values;
  • the distance calculation module 103 is configured to calculate the edit distance between the output text and a preset standard error correction table using the key values, and obtain, according to the edit distance, the error text in the output text and the correct text corresponding to the error text;
  • the error correction output module 104 is configured to replace the error text with the correct text to obtain the standard output text.
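  • Because each module is a set of program segments with a fixed function, the device can be pictured as a pipeline of four callables. The sketch below is a hypothetical wiring, not the device's actual implementation. The class and field names are invented. It also simplifies module 102 so that it returns plain output text instead of key and result values, and it has module 103 return an `{error text: correct text}` mapping.

```python
"""Hypothetical wiring of modules 101-104 as a pipeline of callables."""
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class TextErrorCorrectionDevice:
    preprocess: Callable[[Any], Any]              # 101: original image -> standard image
    recognize: Callable[[Any], str]               # 102: standard image -> output text
    find_errors: Callable[[str], Dict[str, str]]  # 103: output text -> {error: correct}

    @staticmethod
    def correct(text: str, fixes: Dict[str, str]) -> str:
        """104: replace each error text with its correct text."""
        for wrong, right in fixes.items():
            text = text.replace(wrong, right)
        return text

    def run(self, original_image: Any) -> str:
        output_text = self.recognize(self.preprocess(original_image))
        return self.correct(output_text, self.find_errors(output_text))
```

  • Keeping the modules as separate callables mirrors the note that modules may be combined into one unit or deployed separately: any stage can be swapped out (for example, a different recognizer) without touching the others.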
  • In detail, the modules of the text error correction device 100 operate as follows:
  • The image acquisition module 101 acquires an original text image and performs a preprocessing operation on it to obtain a standard image.
  • The original text image and its preprocessing are the same as described above for the text error correction method: the image is obtained by two-dimensional scanning of paper documents, and amplification, sampling, and filtering remove interference factors such as noise to obtain the standard image and ensure the accuracy of subsequent text error correction.
  • The image segmentation module 102 uses a pre-trained text recognition model to perform text recognition on the standard image to obtain character/word vectors, encodes the character/word vectors to generate key values and corresponding result values, and converts the standard image into output text according to the key values and the corresponding result values.
  • The text recognition model performs text recognition and segmentation on the standard image. As described above for the method, it may be a pre-trained NER model with a Bi-LSTM-CRF structure: the character/word vector layer, initialized with trained word vectors, converts the characters and words in the standard image into character/word vectors; the Bi-LSTM layer encodes them and labels them with the six label types Key-B, Value-B, Key-I, Value-I, Other-B, and Other-I; and the CRF layer splices key values and result values of the same type and decodes the spliced text into the output text.
  • The matching module 103 uses the key values to calculate the edit distance between the output text and the preset standard error correction table, and obtains, according to the edit distance, the error text in the output text and the correct text corresponding to the error text.
  • The standard error correction table, the edit distance calculation, and the distance threshold are the same as described above for the method: if the edit distance is less than the distance threshold, the key value of the output text is determined to be an error character and the corresponding key value in the standard error correction table is determined to be the correct character; otherwise, the standard error correction table cannot be used to correct the output text.
  • The error correction module 104 replaces the error text with the correct text, directly correcting the erroneous content to obtain the standard output text.
  • FIG. 3 is a schematic diagram of the structure of an electronic device implementing the text error correction method of this application.
  • The electronic device 1 may include a processor 10, a memory 11, and a bus, and may also include a computer program stored in the memory 11 and executable on the processor 10, such as a text error correction program 12.
  • The memory 11 includes at least one type of readable storage medium, such as flash memory, a mobile hard disk, a multimedia card, card-type memory (for example, SD or DX memory), magnetic memory, a magnetic disk, or an optical disc.
  • In some embodiments, the memory 11 may be an internal storage unit of the electronic device 1, for example, a mobile hard disk of the electronic device 1.
  • The memory 11 may also be an external storage device of the electronic device 1, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the electronic device 1.
  • The memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1.
  • The memory 11 can be used not only to store application software installed in the electronic device 1 and various data, such as the code of the text error correction program 12, but also to temporarily store data that has been or will be output.
  • The computer-readable storage medium may be non-volatile or volatile.
  • In some embodiments, the processor 10 may be composed of integrated circuits, for example, a single packaged integrated circuit or multiple integrated circuits with the same or different functions, including combinations of one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and various control chips.
  • The processor 10 is the control unit of the electronic device: it connects the components of the entire electronic device through various interfaces and lines, runs or executes programs or modules stored in the memory 11 (for example, the text error correction program), and calls data stored in the memory 11 to perform the functions of the electronic device 1 and process data.
  • The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like.
  • The bus can be divided into an address bus, a data bus, a control bus, and so on.
  • The bus is configured to implement connection and communication between the memory 11, the at least one processor 10, and other components.
  • FIG. 3 shows only an electronic device with some components. Those skilled in the art will understand that the structure shown in FIG. 3 does not limit the electronic device 1, which may include fewer or more components than shown, a combination of certain components, or a different arrangement of components.
  • The electronic device 1 may also include a power source (such as a battery) that supplies power to the components.
  • The power source may be logically connected to the at least one processor 10 through a power management device, so that functions such as charge management, discharge management, and power consumption management are implemented through the power management device.
  • The power source may also include any components such as one or more DC or AC power supplies, recharging devices, power failure detection circuits, power converters or inverters, and power status indicators.
  • The electronic device 1 may also include various sensors, a Bluetooth module, a Wi-Fi module, and the like, which are not described further here.
  • The electronic device 1 may also include a network interface.
  • The network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi interface or a Bluetooth interface), and is usually used to establish a communication connection between the electronic device 1 and other electronic devices.
  • The electronic device 1 may also include a user interface.
  • The user interface may be a display and an input unit (such as a keyboard).
  • The user interface may also be a standard wired interface or a wireless interface.
  • The display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (organic light-emitting diode) touch device, or the like.
  • The display may also be called a display screen or a display unit, and is used to display information processed in the electronic device 1 and a visualized user interface.
  • The text error correction program 12 stored in the memory 11 of the electronic device 1 is a combination of multiple instructions which, when run on the processor 10, can implement:
  • using the pre-trained text recognition model to perform text recognition and segmentation on the standard image, generating key values and corresponding result values for the segmented standard image, and converting the standard image into output text according to the key values and the corresponding result values;
  • If the integrated modules/units of the electronic device 1 are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a mobile hard disk, a magnetic disk, an optical disc, computer memory, or read-only memory (ROM).
  • Modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • The functional modules in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
  • The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional modules.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Character Discrimination (AREA)

Abstract

A text error correction method and apparatus, system, and computer-readable storage medium, relating to artificial intelligence technology. The text error correction method comprises: acquiring an original text image and preprocessing the original text image to obtain a standard image (S1); performing text recognition on the standard image using a pre-trained text recognition model to obtain character/word vectors, encoding the character/word vectors to generate key values and corresponding result values, and converting the standard image into output text according to the key values and the corresponding result values (S2); calculating an edit distance between the output text and a preset standard error correction table using the key values, and obtaining, according to the edit distance, error text in the output text and correct text corresponding to the error text (S3); and replacing the error text with the correct text to obtain standard output text (S4). The method remedies the low accuracy and high cost of text error correction. The present invention further relates to blockchain technology and is also applicable to the field of smart cities.
PCT/CN2020/093561 2020-04-23 2020-05-30 Procédé et appareil de correction d'erreur de texte, support de stockage lisible par ordinateur et système WO2021212614A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010326324.X 2020-04-23
CN202010326324.XA CN111626118B (zh) 2020-04-23 2020-04-23 Text error correction method, apparatus, electronic device and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2021212614A1 (fr) 2021-10-28

Family

ID=72258113

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/093561 WO2021212614A1 (fr) 2020-04-23 2020-05-30 Text error correction method and apparatus, computer-readable storage medium and system

Country Status (2)

Country Link
CN (1) CN111626118B (fr)
WO (1) WO2021212614A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113111660A (zh) * 2021-04-22 2021-07-13 脉景(杭州)健康管理有限公司 Data processing method, apparatus, device and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120177291A1 (en) * 2011-01-07 2012-07-12 Yuval Gronau Document comparison and analysis
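CN107491730A (zh) * 2017-07-14 2017-12-19 浙江大学 Laboratory test report recognition method based on image processing
CN107633250A (zh) * 2017-09-11 2018-01-26 畅捷通信息技术股份有限公司 Character recognition error correction method, error correction system and computer device
CN107844481A (zh) * 2017-11-21 2018-03-27 新疆科大讯飞信息科技有限责任公司 Method and apparatus for detecting errors in recognized text
CN110046350A (zh) * 2019-04-12 2019-07-23 百度在线网络技术(北京)有限公司 Grammatical error recognition method, apparatus, computer device and storage medium
CN110782885A (zh) * 2019-09-29 2020-02-11 深圳和而泰家居在线网络科技有限公司 Speech text correction method and apparatus, computer device and computer storage medium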

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8150161B2 (en) * 2008-09-22 2012-04-03 Intuit Inc. Technique for correcting character-recognition errors
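CN110162767A (zh) * 2018-02-12 2019-08-23 北京京东尚科信息技术有限公司 Text error correction method and apparatus
CN109711412A (zh) * 2018-12-27 2019-05-03 信雅达系统工程股份有限公司 Dictionary-based optical character recognition error correction method
CN110164435B (zh) * 2019-04-26 2024-06-25 平安科技(深圳)有限公司 Speech recognition method, apparatus, device and computer-readable storage medium
CN110619119B (zh) * 2019-07-23 2022-06-10 平安科技(深圳)有限公司 Intelligent text editing method, apparatus and computer-readable storage medium
CN110610180B (zh) * 2019-09-16 2024-08-20 腾讯科技(深圳)有限公司 Method, apparatus, device and storage medium for generating a misspelled-word recognition set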

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
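CN114387603A (zh) * 2021-12-01 2022-04-22 科大讯飞股份有限公司 Method, system and computing device for detecting and correcting errors in Chinese characters
CN114550185A (zh) * 2022-04-19 2022-05-27 腾讯科技(深圳)有限公司 Document generation method, related apparatus, device and storage medium
CN114550185B (zh) * 2022-04-19 2022-07-19 腾讯科技(深圳)有限公司 Document generation method, related apparatus, device and storage medium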

Also Published As

Publication number Publication date
CN111626118B (zh) 2024-06-28
CN111626118A (zh) 2020-09-04

Similar Documents

Publication Publication Date Title
WO2021212614A1 (fr) Text error correction method and apparatus, computer-readable storage medium and system
WO2022142593A1 (fr) Text classification method and apparatus, electronic device and readable storage medium
WO2021135910A1 (fr) Information extraction method based on machine reading comprehension, and related device
WO2021189826A1 (fr) Message generation method and apparatus, electronic device and computer-readable storage medium
CN109522552B (zh) Medical information normalization method, apparatus, medium and electronic device
WO2021189829A1 (fr) Search method and apparatus based on descriptive data, electronic device and storage medium
CN111144210B (zh) Image structuring method and apparatus, storage medium and electronic device
WO2021159762A1 (fr) Data relation extraction method and apparatus, electronic device and recording medium
CN108921552B (zh) Method and apparatus for verifying evidence
WO2022178994A1 (fr) Table structure recognition method and apparatus, electronic device and recording medium
CN113205814B (zh) Speech data annotation method, apparatus, electronic device and storage medium
WO2022194062A1 (fr) Disease marker detection method and apparatus, electronic device and recording medium
CN109784339A (zh) Image recognition testing method, apparatus, computer device and storage medium
CN112464927B (zh) Information extraction method, apparatus and system
WO2021189903A1 (fr) Audio-based user state identification method and apparatus, electronic device and storage medium
CN113360768A (zh) Product recommendation method based on user profiles, apparatus, device and storage medium
CN116912847A (zh) Medical text recognition method, apparatus, computer device and storage medium
CN111985491A (zh) Deep-learning-based method, apparatus, device and medium for merging similar information
CN113254814A (zh) Online course video tagging method, apparatus, electronic device and medium
CN112633988A (zh) User product recommendation method, apparatus, electronic device and readable storage medium
WO2022141867A1 (fr) Speech recognition method and apparatus, electronic device and readable storage medium
WO2021151303A1 (fr) Named entity alignment device and apparatus, electronic device and readable recording medium
CN115759040A (zh) Electronic medical record parsing method, apparatus, device and storage medium
CN113434650B (zh) Question-answer pair expansion method, apparatus, electronic device and readable storage medium
WO2022156088A1 (fr) Fingerprint signature generation method and apparatus, electronic device and computer storage medium

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 20931922

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry into the European phase

Ref document number: 20931922

Country of ref document: EP

Kind code of ref document: A1