WO2021212614A1 - Text error correction method and apparatus, computer-readable storage medium and system - Google Patents

Text error correction method and apparatus, computer-readable storage medium and system

Info

Publication number
WO2021212614A1
WO2021212614A1 PCT/CN2020/093561 CN2020093561W WO2021212614A1 WO 2021212614 A1 WO2021212614 A1 WO 2021212614A1 CN 2020093561 W CN2020093561 W CN 2020093561W WO 2021212614 A1 WO2021212614 A1 WO 2021212614A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
image
error correction
standard
character
Prior art date
Application number
PCT/CN2020/093561
Other languages
French (fr)
Chinese (zh)
Inventor
谢静文
阮晓雯
徐亮
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN202010326324.XA (CN111626118B)
Application filed by 平安科技(深圳)有限公司
Publication of WO2021212614A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to a text error correction method, device, computer-readable storage medium and system.
  • Current text recognition methods mostly use OCR technology to read the text in an image and convert it into a character format that computers can accept and people can understand.
  • However, because OCR technology places high requirements on the quality of the input image, a large number of recognition errors tend to occur when image accuracy is low, so the recognized characters need error correction.
  • The inventor realized that the traditional method performs error correction based only on the characters in the image information, so the error correction result output directly by OCR cannot meet practical application requirements and its accuracy is low. Therefore, how to achieve low-cost, high-precision text error correction is receiving increasing attention.
  • This application provides a text error correction method, device, computer readable storage medium and system, the main purpose of which is to solve the problem of low text error correction accuracy and high cost.
  • A text error correction method provided by this application includes:
  • This application also provides a text error correction device, which includes:
  • a modulation conversion module, used to obtain an original text image and perform a preprocessing operation on the original text image to obtain a standard image;
  • a text segmentation module, used to perform text recognition on the standard image using a pre-trained text recognition model to obtain character/word vectors, encode the character/word vectors to generate key values and corresponding result values, and convert the standard image into output text according to the key values and the corresponding result values;
  • a distance calculation module, used to calculate the edit distance between the output text and a preset standard error correction table by using the key values, and to obtain, according to the edit distance, the error text in the output text and the correct text corresponding to the error text;
  • an error correction output module, used to replace the error text with the correct text to obtain the standard output text.
  • This application also provides a computer-readable storage medium on which a text error correction program is stored, and the text error correction program can be executed by one or more processors to implement the following steps:
  • This application also provides a text error correction system, including:
  • a modulation conversion module, used to obtain an original text image and perform a preprocessing operation on the original text image to obtain a standard image;
  • a text segmentation module, used to perform text recognition on the standard image using a pre-trained text recognition model to obtain character/word vectors, encode the character/word vectors to generate key values and corresponding result values, and convert the standard image into output text according to the key values and the corresponding result values;
  • a distance calculation module, used to calculate the edit distance between the output text and a preset standard error correction table by using the key values, and to obtain, according to the edit distance, the error text in the output text and the correct text corresponding to the error text;
  • an error correction output module, used to replace the error text with the correct text to obtain the standard output text.
  • The embodiment of the present application performs a preprocessing operation on the original text image, which removes interference factors in the original image and lays the groundwork for subsequent error correction of the text in the image. Further, whereas the prior art performs error correction based only on the characters themselves in the image information, the embodiment of the present application calculates the key value of a character and the result value corresponding to that key value, and compares the key value and the result value with a preset standard error correction table to correct the output text obtained through image recognition, which makes the error correction more accurate. Therefore, the text error correction method, device, and computer-readable storage medium proposed in this application can achieve low-cost, high-precision text error correction.
  • FIG. 1 is a schematic flowchart of a text error correction method provided by an embodiment of this application;
  • FIG. 2 is a schematic diagram of the modules of a text error correction method provided by an embodiment of this application;
  • FIG. 3 is a schematic diagram of the internal structure of an electronic device of a text error correction method provided by an embodiment of this application.
  • This application provides a method for text error correction.
  • Referring to FIG. 1, a schematic flowchart of a text error correction method provided by an embodiment of this application is shown.
  • The method can be executed by a device, and the device can be implemented by software and/or hardware.
  • The text error correction method includes:
  • The original text image is obtained by two-dimensional scanning of paper documents, such as medical invoices, books, etc.
  • The embodiment of the present application first performs the following preprocessing on the original text image:
  • The embodiment of the present application uses an existing amplifying circuit to amplify the image signal of the original text image.
  • The amplifying circuit is a circuit that amplifies electrical signals and uses a transistor as its control element. Other suitable amplifying circuits can be selected according to different amplification requirements, and the selected circuit amplifies the original text image without distortion to obtain the amplified image signal.
  • The embodiment of the present application uses an existing sampling circuit to sample the amplified image signal.
  • The sampling circuit is a circuit that periodically samples the amplified image signal at a preset sampling frequency.
  • By applying the above amplification, sampling, and filtering to the original text image, the embodiment of the present application removes interference factors such as noise from the original text image, obtains the standard image, and ensures the accuracy of subsequent text error correction.
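  • The amplification, sampling, and filtering steps above are described in terms of circuits. As a rough software analogue, the following minimal sketch assumes a grayscale image given as a list of pixel rows; the function name `preprocess`, the `gain` and `step` parameters, and the choice of a 3x3 mean filter are illustrative assumptions, not details from this application.

```python
def preprocess(image, gain=2.0, step=2):
    """Amplify, sample, and mean-filter a grayscale image given as a list of rows.

    All parameters are hypothetical; the application does not specify them.
    """
    # Amplification: scale every pixel by a constant gain
    amplified = [[pixel * gain for pixel in row] for row in image]
    # Sampling: keep every `step`-th pixel along both axes
    sampled = [row[::step] for row in amplified[::step]]
    # Filtering: 3x3 mean filter; border pixels average over the neighbours that exist
    height, width = len(sampled), len(sampled[0])
    filtered = []
    for y in range(height):
        out_row = []
        for x in range(width):
            window = [
                sampled[j][i]
                for j in range(max(0, y - 1), min(height, y + 2))
                for i in range(max(0, x - 1), min(width, x + 2))
            ]
            out_row.append(sum(window) / len(window))
        filtered.append(out_row)
    return filtered
```

  • A uniform 4x4 input with `step=2` yields a 2x2 output whose pixels are the input value multiplied by `gain`; isolated noise spikes are spread across their neighbourhood by the mean filter.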
  • The text recognition model in the embodiment of the present application may be a pre-trained NER (Named Entity Recognition) model.
  • The NER model adopts the Bi-LSTM-CRF structure, including:
  • Character/word vector layer: used to convert the words and characters in the text contained in the standard image into character/word vectors;
  • Bi-LSTM layer: used to segment the character/word vectors, encode the segmented character/word vectors to obtain their encoding representation, and use the encoding representation to label the segmented character/word vectors to obtain key values and result values;
  • CRF layer: used to splice key values and result values of the same type, and decode the spliced text according to the reverse process of encoding to generate the output text.
  • The character/word vector layer uses trained word vectors as initialization parameters to convert the words and characters in the text contained in the standard image into character/word vectors. The trained word vectors are a set of standard conversion rules summarized from past conversions of character/word vectors.
  • The Bi-LSTM layer can segment the character/word vectors.
  • The Bi-LSTM layer can use the Java language to segment the character/word vectors and encode the segmented character/word vectors. The encoding representation includes six label types: Key-B, Value-B, Key-I, Value-I, Other-B, and Other-I, where Key is the key value, Value is the result value, and Other is any other value.
  • The CRF layer splices key values and result values of the same type, such as Key-B with Key-I, or Value-B with Value-I.
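  • The splicing of same-type labels described above can be sketched as follows. This is a minimal illustration that assumes the six label names listed above and treats "B" as the beginning of a span and "I" as its continuation; the function names `splice_labels` and `pair_keys_with_values`, and the rule that a key is paired with the next value that follows it, are assumptions for illustration only.

```python
def splice_labels(tokens, labels):
    """Merge tokens tagged Key-B/Key-I, Value-B/Value-I, Other-B/Other-I into spans."""
    spans = []
    for token, label in zip(tokens, labels):
        kind, _, position = label.partition("-")
        # An "I" tag continues the previous span when the type matches
        if position == "I" and spans and spans[-1][0] == kind:
            spans[-1] = (kind, spans[-1][1] + token)
        else:
            spans.append((kind, token))
    return spans


def pair_keys_with_values(spans):
    """Pair each Key span with the next Value span that follows it (assumed rule)."""
    pairs, pending_key = [], None
    for kind, text in spans:
        if kind == "Key":
            pending_key = text
        elif kind == "Value" and pending_key is not None:
            pairs.append((pending_key, text))
            pending_key = None
    return pairs


tokens = ["P", "a", "y", "2", ".", "0", "0"]
labels = ["Key-B", "Key-I", "Key-I", "Value-B", "Value-I", "Value-I", "Value-I"]
spans = splice_labels(tokens, labels)
pairs = pair_keys_with_values(spans)
```

  • In this usage example the seven labeled characters collapse into one Key span and one Value span, which are then paired.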
  • The embodiment of the present application converts the standard image into output text according to the key values and the corresponding result values.
  • For example, the standard image contains the text "Pay 2.00 yuan (cash payment)" and "classified self-pay 0.00 yuan".
  • The embodiment of the present application uses a standard error correction table to correct the above output text.
  • The standard error correction table consists of error-free character strings and the key values and result values corresponding to those strings.
  • The edit distance between two character strings is the minimum number of editing operations required to convert one string into the other.
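  • As an illustration of this definition, the sketch below computes the Levenshtein distance, a common form of edit distance. It assumes that the permitted editing operations are single-character insertion, deletion, and substitution; this application does not enumerate the operations, and the function name `edit_distance` is hypothetical.

```python
def edit_distance(source, target):
    """Levenshtein distance using insertion, deletion, and substitution (assumed operations)."""
    # previous[j] holds the distance between the processed prefix of source and target[:j]
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,         # delete s_char
                current[j - 1] + 1,      # insert t_char
                previous[j - 1] + cost,  # substitute, or keep when equal
            ))
        previous = current
    return previous[-1]
```

  • The function keeps only two rows of the dynamic programming table, so memory use grows with the length of the target string rather than with the product of both lengths.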
  • The embodiment of the present application uses the following edit distance algorithm to calculate the edit distance $Sim_{topic}$:
  • $Sim_{topic} = \mathrm{Pearson}(R, S)$
  • where R is the key value of the output text, S is the key value of the standard error correction table, and Pearson denotes the edit distance calculation.
  • In the embodiment of the present application, obtaining the error text in the output text and the correct text corresponding to the error text according to the edit distance includes:
  • if the edit distance is less than a preset distance threshold, the key value of the corresponding output text is determined to be an error character, and the key value of the corresponding standard error correction table is determined to be the corresponding correct character;
  • if the edit distance is greater than or equal to the distance threshold, the output text does not match the standard error correction table, and the standard error correction table cannot be used to correct the output text.
  • The correct text can be used directly to replace the error text, so that the errors in the error text are corrected and the standard output text is obtained.
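  • The threshold comparison and replacement can be sketched as follows. This is a minimal illustration under several assumptions: the output text is represented as (key, value) pairs, the standard error correction table is reduced to a non-empty list of error-free keys, the edit distance is the Levenshtein distance, and the names `correct_text`, `standard_keys`, and `threshold` are hypothetical.

```python
def edit_distance(source, target):
    """Levenshtein distance (assumed form of the edit distance)."""
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost))
        previous = current
    return previous[-1]


def correct_text(pairs, standard_keys, threshold=2):
    """Replace each recognized key with its closest standard key when below the threshold."""
    corrected = []
    for key, value in pairs:
        # Find the standard key with the smallest edit distance to the recognized key
        best = min(standard_keys, key=lambda candidate: edit_distance(key, candidate))
        if edit_distance(key, best) < threshold:
            key = best  # below the threshold: treat the table entry as the correct text
        # At or above the threshold the table does not match, so the key is left as-is
        corrected.append((key, value))
    return corrected


result = correct_text([("Pey", "2.00"), ("Total", "5.00")], ["Pay", "Refund"])
```

  • In this usage example "Pey" is one edit away from "Pay" and is replaced, while "Total" is at least two edits from every table entry and is left unchanged.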
  • The original text image can also be stored in a node of a blockchain.
  • The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms.
  • A blockchain is essentially a decentralized database: a series of data blocks linked to one another by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
  • The blockchain can include an underlying blockchain platform, a platform product service layer, and an application service layer.
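  • To illustrate how data blocks can be linked by cryptographic methods, the following minimal sketch chains blocks through SHA-256 hashes. It is a toy model, not the storage scheme of this application: the block layout, the function names `make_block` and `is_valid`, and the use of an image identifier as block data are all assumptions.

```python
import hashlib
import json


def make_block(data, previous_hash):
    """Build a block whose hash covers its data and the previous block's hash."""
    payload = json.dumps({"data": data, "previous_hash": previous_hash}, sort_keys=True)
    block_hash = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return {"data": data, "previous_hash": previous_hash, "hash": block_hash}


def is_valid(chain):
    """Check that every block's hash is intact and links to its predecessor."""
    for index, block in enumerate(chain):
        expected = make_block(block["data"], block["previous_hash"])["hash"]
        if block["hash"] != expected:
            return False  # block contents were altered after hashing
        if index > 0 and block["previous_hash"] != chain[index - 1]["hash"]:
            return False  # link to the previous block is broken
    return True


genesis = make_block("original-text-image-0", "0" * 64)
second = make_block("original-text-image-1", genesis["hash"])
```

  • Changing the data of any stored block changes its recomputed hash, so `is_valid` reports the tampering.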
  • This solution can be applied in smart city fields such as smart medical care, so as to promote the construction of smart cities.
  • The embodiment of the present application performs a preprocessing operation on the original text image, which removes interference factors in the original image and lays the groundwork for subsequent error correction of the text in the image. Further, whereas the prior art performs character error correction based only on the image information itself, the embodiment of the present application calculates the key value of a character and the result value corresponding to that key value, and compares the key value and the result value with a preset standard error correction table, so that the table is used to correct the output text obtained through image recognition and the error correction is more accurate. Therefore, the text error correction method, device, and computer-readable storage medium proposed in this application can achieve low-cost, high-precision text error correction.
  • Referring to FIG. 2, a functional block diagram of the text error correction device of the present application is shown.
  • The text error correction device 100 described in this application can be installed in an electronic device.
  • The text error correction device may include an image acquisition module 101, an image segmentation module 102, a matching module 103, and an error correction module 104.
  • A module described in the present invention can also be called a unit; it refers to a series of computer program segments that can be executed by the processor of an electronic device, can complete a fixed function, and are stored in the memory of the electronic device.
  • The modules/units are as follows:
  • The modulation conversion module 101 is configured to obtain an original text image and perform a preprocessing operation on the original text image to obtain a standard image;
  • The text segmentation module 102 is configured to perform text recognition on the standard image using a pre-trained text recognition model to obtain character/word vectors, encode the character/word vectors to generate key values and corresponding result values, and convert the standard image into output text according to the key values and the corresponding result values;
  • The distance calculation module 103 is configured to calculate the edit distance between the output text and a preset standard error correction table by using the key values, and to obtain, according to the edit distance, the error text in the output text and the correct text corresponding to the error text;
  • The error correction output module 104 is configured to replace the error text with the correct text to obtain the standard output text.
  • The modules of the text error correction device 100 are as follows:
  • The image acquisition module 101 acquires an original text image and performs a preprocessing operation on the original text image to obtain a standard image.
  • The original text image is obtained by two-dimensional scanning of paper documents, such as medical invoices, books, etc.
  • The embodiment of the present application first performs the following preprocessing on the original text image:
  • The embodiment of the present application uses an existing amplifying circuit to amplify the image signal of the original text image.
  • The amplifying circuit is a circuit that amplifies electrical signals and uses a transistor as its control element. Other suitable amplifying circuits can be selected according to different amplification requirements, and the selected circuit amplifies the original text image without distortion to obtain the amplified image signal.
  • The embodiment of the present application uses an existing sampling circuit to sample the amplified image signal.
  • The sampling circuit is a circuit that periodically samples the amplified image signal at a preset sampling frequency.
  • By applying the above amplification, sampling, and filtering to the original text image, the embodiment of the present application removes interference factors such as noise from the original text image, obtains the standard image, and ensures the accuracy of subsequent text error correction.
  • The image segmentation module 102 uses a pre-trained text recognition model to perform text recognition on the standard image to obtain character/word vectors, encodes the character/word vectors to generate key values and corresponding result values, and converts the standard image into output text according to the key values and the corresponding result values.
  • The text recognition model performs text recognition and segmentation processing on the standard image.
  • The text recognition model in the embodiment of the present application may be a pre-trained NER (Named Entity Recognition) model.
  • The NER model adopts the Bi-LSTM-CRF structure, including:
  • Character/word vector layer: used to convert the words and characters in the text contained in the standard image into character/word vectors;
  • Bi-LSTM layer: used to segment the character/word vectors, encode the segmented character/word vectors to obtain their encoding representation, and use the encoding representation to label the segmented character/word vectors to obtain key values and result values;
  • CRF layer: used to splice key values and result values of the same type, and decode the spliced text according to the reverse process of encoding to generate the output text.
  • The character/word vector layer uses trained word vectors as initialization parameters to convert the words and characters in the text contained in the standard image into character/word vectors. The trained word vectors are a set of standard conversion rules summarized from past conversions of character/word vectors.
  • The Bi-LSTM layer may use the Java language to encode the character/word vectors. The encoding representation includes six label types: Key-B, Value-B, Key-I, Value-I, Other-B, and Other-I, where Key is the key value, Value is the result value, and Other is any other value.
  • The CRF layer splices key values and result values of the same type, such as Key-B with Key-I, or Value-B with Value-I.
  • The embodiment of the present application converts the standard image into output text according to the key values and the corresponding result values.
  • For example, the standard image contains the text "Pay 2.00 yuan (cash payment)" and "classified self-pay 0.00 yuan".
  • The matching module 103 uses the key values to calculate the edit distance between the output text and the preset standard error correction table, and obtains, according to the edit distance, the error text in the output text and the correct text corresponding to the error text.
  • The embodiment of the present application uses a standard error correction table to correct the above output text.
  • The standard error correction table consists of error-free character strings and the key values and result values corresponding to those strings.
  • The edit distance between two character strings is the minimum number of editing operations required to convert one string into the other.
  • The embodiment of the present application uses the following edit distance algorithm to calculate the edit distance $Sim_{topic}$:
  • $Sim_{topic} = \mathrm{Pearson}(R, S)$
  • where R is the key value of the output text, S is the key value of the standard error correction table, and Pearson denotes the edit distance calculation.
  • In the embodiment of the present application, obtaining the error text in the output text and the correct text corresponding to the error text according to the edit distance includes:
  • if the edit distance is less than a preset distance threshold, the key value of the corresponding output text is determined to be an error character, and the key value of the corresponding standard error correction table is determined to be the corresponding correct character;
  • if the edit distance is greater than or equal to the distance threshold, the output text does not match the standard error correction table, and the standard error correction table cannot be used to correct the output text.
  • The error correction module 104 replaces the error text with the correct text to obtain the standard output text.
  • The correct text can be used directly to replace the error text, so that the errors in the error text are corrected and the standard output text is obtained.
  • Referring to FIG. 3, a schematic diagram of the structure of an electronic device implementing the text error correction method of the present application is shown.
  • The electronic device 1 may include a processor 10, a memory 11, and a bus, and may also include a computer program stored in the memory 11 and runnable on the processor 10, such as a text error correction program 12.
  • The memory 11 includes at least one type of readable storage medium, including flash memory, mobile hard disk, multimedia card, card-type memory (for example, SD or DX memory), magnetic memory, magnetic disk, optical disc, etc.
  • In some embodiments, the memory 11 may be an internal storage unit of the electronic device 1, for example, a mobile hard disk of the electronic device 1.
  • The memory 11 may also be an external storage device of the electronic device 1, such as a plug-in mobile hard disk, a smart media card (SMC), a Secure Digital (SD) card, or a flash card equipped on the electronic device 1.
  • The memory 11 may also include both an internal storage unit of the electronic device 1 and an external storage device.
  • The memory 11 can be used not only to store application software installed in the electronic device 1 and various data, such as the code of the text error correction program 12, but also to temporarily store data that has been output or will be output.
  • The computer-readable storage medium may be non-volatile or volatile.
  • In some embodiments, the processor 10 may be composed of integrated circuits, for example a single packaged integrated circuit, or multiple integrated circuits with the same or different functions, including one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips.
  • The processor 10 is the control unit of the electronic device. It uses various interfaces and lines to connect the components of the entire electronic device, and executes the various functions of the electronic device 1 and processes data by running or executing programs or modules stored in the memory 11 (for example, the text error correction program) and calling data stored in the memory 11.
  • The bus may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, etc.
  • The bus can be divided into an address bus, a data bus, a control bus, and so on.
  • The bus is configured to implement connection and communication between the memory 11 and the at least one processor 10, among other components.
  • FIG. 3 only shows an electronic device with components. Those skilled in the art can understand that the structure shown in FIG. 3 does not constitute a limitation on the electronic device 1, which may include fewer or more components than shown, a combination of certain components, or a different arrangement of components.
  • The electronic device 1 may also include a power source (such as a battery) for supplying power to the various components.
  • The power source may be logically connected to the at least one processor 10 through a power management device, so that functions such as charge management, discharge management, and power consumption management are implemented through the power management device.
  • The power supply may also include components such as one or more DC or AC power supplies, recharging devices, power failure detection circuits, power converters or inverters, and power status indicators.
  • The electronic device 1 may also include a variety of sensors, Bluetooth modules, Wi-Fi modules, etc., which will not be described further here.
  • The electronic device 1 may also include a network interface.
  • The network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi interface or a Bluetooth interface), and is usually used to establish a communication connection between the electronic device 1 and other electronic devices.
  • The electronic device 1 may also include a user interface.
  • The user interface may be a display and an input unit (such as a keyboard).
  • The user interface may also be a standard wired interface or a wireless interface.
  • The display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (organic light-emitting diode) touch device, etc.
  • The display can also be called a display screen or a display unit, and is used to display the information processed in the electronic device 1 and to display a visualized user interface.
  • The text error correction program 12 stored in the memory 11 of the electronic device 1 is a combination of multiple instructions. When run on the processor 10, it can implement:
  • using the pre-trained text recognition model to perform text recognition and segmentation processing on the standard image, generating key values and corresponding result values for the segmented standard image, and converting the standard image into output text according to the key values and the corresponding result values;
  • If the integrated module/unit of the electronic device 1 is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a mobile hard disk, a magnetic disk, an optical disk, a computer memory, or a read-only memory (ROM).
  • The modules described as separate components may or may not be physically separated, and the components displayed as modules may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • The functional modules in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • The above-mentioned integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional modules.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Character Discrimination (AREA)

Abstract

A text error correction method and apparatus, a system and a computer-readable storage medium, relating to the technology of artificial intelligence. The text error correction method comprises: acquiring an original text image, and preprocessing the original text image to obtain a standard image (S1); performing text recognition on the standard image by using a pre-trained text recognition model to obtain character/word vectors, encoding the character/word vectors to generate key values and corresponding result values, and converting the standard image into an output text according to the key values and the corresponding result values (S2); calculating an editing distance between the output text and a preset standard error correction table by using the key values, and obtaining, according to the editing distance, an error text in the output text and a correct text corresponding to the error text (S3); and replacing the error text with the correct text to obtain a standard output text (S4). The method can solve the problems of low precision and high cost of text error correction. The present invention further relates to a blockchain technology and is also applicable to the field of smart cities.

Description

Text error correction method, apparatus, computer-readable storage medium and system
This application claims priority to Chinese patent application No. 202010326324.X, filed with the China Patent Office on April 23, 2020 and entitled "Text error correction method, apparatus, computer-readable storage medium and system", the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the field of artificial intelligence technology, and in particular to a text error correction method, apparatus, computer-readable storage medium and system.
Background
Most current text recognition methods use OCR technology to read the text in an image and convert it into a character format that computers can process and people can understand. However, OCR places high demands on the quality of the input image, and a large number of recognition errors tend to occur when image quality is low, so the recognized characters need to be corrected. The inventors realized that traditional methods perform error correction based only on the characters in the image information, so that the results output directly by OCR cannot meet practical requirements and accuracy is low. How to achieve low-cost, high-precision text error correction is therefore receiving increasing attention.
Summary of the invention
This application provides a text error correction method, apparatus, computer-readable storage medium and system, whose main purpose is to solve the problems of low accuracy and high cost in text error correction.
To achieve the above objective, a text error correction method provided by this application includes:
acquiring an original text image, and preprocessing the original text image to obtain a standard image;
performing text recognition on the standard image using a pre-trained text recognition model to obtain character/word vectors, encoding the character/word vectors to generate key values and corresponding result values, and converting the standard image into output text according to the key values and the corresponding result values;
calculating, using the key values, an edit distance between the output text and a preset standard error correction table, and obtaining, according to the edit distance, the erroneous text in the output text and the correct text corresponding to the erroneous text;
replacing the erroneous text with the correct text to obtain standard output text.
To solve the above problems, this application also provides a text error correction apparatus, which includes:
a modulation conversion module, configured to acquire an original text image and preprocess the original text image to obtain a standard image;
a text segmentation module, configured to perform text recognition on the standard image using a pre-trained text recognition model to obtain character/word vectors, encode the character/word vectors to generate key values and corresponding result values, and convert the standard image into output text according to the key values and the corresponding result values;
a distance calculation module, configured to calculate, using the key values, an edit distance between the output text and a preset standard error correction table, and to obtain, according to the edit distance, the erroneous text in the output text and the correct text corresponding to the erroneous text;
an error correction output module, configured to replace the erroneous text with the correct text to obtain standard output text.
To solve the above problems, this application also provides a computer-readable storage medium on which a text error correction program is stored, the text error correction program being executable by one or more processors to implement the following steps:
acquiring an original text image, and preprocessing the original text image to obtain a standard image;
performing text recognition on the standard image using a pre-trained text recognition model to obtain character/word vectors, encoding the character/word vectors to generate key values and corresponding result values, and converting the standard image into output text according to the key values and the corresponding result values;
calculating, using the key values, an edit distance between the output text and a preset standard error correction table, and obtaining, according to the edit distance, the erroneous text in the output text and the correct text corresponding to the erroneous text;
replacing the erroneous text with the correct text to obtain standard output text.
To solve the above problems, this application also provides a text error correction system, which includes:
a modulation conversion module, configured to acquire an original text image and preprocess the original text image to obtain a standard image;
a text segmentation module, configured to perform text recognition on the standard image using a pre-trained text recognition model to obtain character/word vectors, encode the character/word vectors to generate key values and corresponding result values, and convert the standard image into output text according to the key values and the corresponding result values;
a distance calculation module, configured to calculate, using the key values, an edit distance between the output text and a preset standard error correction table, and to obtain, according to the edit distance, the erroneous text in the output text and the correct text corresponding to the erroneous text;
an error correction output module, configured to replace the erroneous text with the correct text to obtain standard output text.
The embodiments of this application preprocess the original text image, which removes interference factors from the original image and lays the groundwork for the subsequent correction of the text in the image. Further, whereas the prior art performs error correction based only on the characters themselves in the image information, the embodiments of this application calculate the key value of each character and the result value corresponding to that key value, and compare the key values and result values against a preset standard error correction table. The output text obtained through image recognition is corrected on that basis, which makes the correction more accurate. The text error correction method, apparatus and computer-readable storage medium proposed in this application can therefore achieve low-cost, high-precision text error correction.
Description of the drawings
FIG. 1 is a schematic flowchart of a text error correction method provided by an embodiment of this application;
FIG. 2 is a schematic diagram of the modules of a text error correction apparatus provided by an embodiment of this application;
FIG. 3 is a schematic diagram of the internal structure of an electronic device implementing the text error correction method provided by an embodiment of this application.
The realization of the objectives, the functional characteristics and the advantages of this application will be further described in conjunction with the embodiments and with reference to the accompanying drawings.
Detailed description
It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it.
This application provides a text error correction method. FIG. 1 is a schematic flowchart of a text error correction method provided by an embodiment of this application. The method may be executed by an apparatus, and the apparatus may be implemented in software and/or hardware.
In this embodiment, the text error correction method includes:
S1. Acquire an original text image, and preprocess the original text image to obtain a standard image.
In the embodiments of this application, the original text image is obtained by two-dimensional scanning of paper documents, such as paper medical invoices or books.
To remove interference factors such as noise from the original text image obtained by two-dimensional scanning, the embodiments of this application first preprocess the original text image as follows:
amplifying the image signal of the original text image to obtain an amplified image signal;
sampling the amplified image signal to obtain a sampled signal;
filtering the sampled signal to obtain the standard image.
In detail, the embodiments of this application use an existing amplification circuit to amplify the image signal of the original text image. The amplification circuit is a circuit that amplifies electrical signals and uses transistors as its control elements; other suitable amplification circuits may be selected according to different amplification requirements, and the selected circuit amplifies the original text image without distortion to obtain the amplified image signal.
Further, the embodiments of this application use an existing sampling circuit to sample the amplified image signal. The sampling circuit is a circuit that samples the amplified image signal at regular intervals according to a preset sampling frequency.
By applying the above amplification, sampling and filtering to the original text image, the embodiments of this application remove interference factors such as noise from the original text image and obtain the standard image, which helps ensure the accuracy of the subsequent text error correction.
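The amplify, sample and filter steps above are described in terms of circuits. As a rough software analogue, the following minimal sketch applies the same three stages to one row of pixel intensities. The function names, the gain, the sampling step and the choice of a median filter are all assumptions made for illustration; the application does not specify them.

```python
from statistics import median


def amplify(signal, gain=2.0, max_level=255.0):
    """Scale every sample by a constant gain, clipping to the valid range."""
    return [min(max(s * gain, 0.0), max_level) for s in signal]


def sample(signal, step=2):
    """Keep every `step`-th sample, mimicking a fixed sampling frequency."""
    return signal[::step]


def median_filter(signal, window=3):
    """Replace each sample by the median of its neighbourhood (assumed filter type)."""
    half = window // 2
    filtered = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        filtered.append(median(signal[lo:hi]))
    return filtered


def preprocess(signal, gain=2.0, step=2, window=3):
    """Chain the three stages: amplify, then sample, then filter."""
    return median_filter(sample(amplify(signal, gain), step), window)


# One scan line with a single noisy spike at index 4.
row = [10, 10, 10, 10, 100, 10, 10, 10, 10, 10]
print(preprocess(row))
```

A median filter is used here only because it suppresses isolated spikes well; any other low-pass filter would fit the description equally.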
S2. Perform text recognition on the standard image using a pre-trained text recognition model to obtain character/word vectors, encode the character/word vectors to generate key values and corresponding result values, and convert the standard image into output text according to the key values and the corresponding result values.
Preferably, the text recognition model in the embodiments of this application may be a pre-trained NER (Named Entity Recognition) model.
Preferably, the NER model adopts a Bi-LSTM-CRF structure, including:
Character/word vector layer: converts the words and characters in the text contained in the standard image into character/word vectors;
Bi-LSTM layer: segments the character/word vectors, encodes the segmented character/word vectors to obtain their encoded representation, and uses the encoded representation to label the segmented character/word vectors, obtaining key values and result values;
CRF layer: splices key values and result values of the same type, and decodes the spliced text by reversing the encoding process to generate the output text.
The character/word vector layer uses pre-trained word vectors as initialization parameters to convert the words and characters in the text contained in the standard image into character/word vectors. The pre-trained word vectors are a set of standard conversion rules summarized from previous character/word vector conversions.
Because the standard image may contain a large amount of text and the sentences in the text may be long, plain character conversion may cause adjacent text to run together, which hinders the subsequent error correction. The embodiments of this application therefore use the Bi-LSTM layer to segment the character/word vectors.
Preferably, the Bi-LSTM layer may be implemented in Java to segment the character/word vectors and encode the segmented vectors. The encoded representation contains six label types: Key-B, Value-B, Key-I, Value-I, Other-B and Other-I, where Key is a key value, Value is a result value, and Other is any other value.
The CRF layer splices key values and result values of the same type, for example Key-B with Key-I, or Value-B with Value-I.
Further, the embodiments of this application convert the standard image into output text according to the key values and the corresponding result values. In one example, the standard image contains the text "支付2.00元(现金支付)分类自负0.00元" ("Payment 2.00 yuan (cash payment), category self-pay 0.00 yuan"). After processing by the above NER model, the following output text is generated:
Key: {payment, category self-pay}
Value: {2.00 yuan, 0.00 yuan}
Other: (cash payment)
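The splicing of same-type labels (Key-B with Key-I, Value-B with Value-I) can be sketched as follows. This is not the CRF layer itself, only an illustration of how already-labelled tokens might be grouped into Key, Value and Other spans; the function name `merge_tagged_tokens` and the token boundaries chosen for the invoice example are assumptions.

```python
def merge_tagged_tokens(tagged):
    """Group (token, tag) pairs into spans: a '-B' tag starts a span, '-I' continues it."""
    groups = {"Key": [], "Value": [], "Other": []}
    current_type, current_text = None, ""
    for token, tag in tagged:
        kind, position = tag.rsplit("-", 1)
        if position == "B" or kind != current_type:
            # Close the span in progress before starting a new one.
            if current_type is not None:
                groups[current_type].append(current_text)
            current_type, current_text = kind, token
        else:
            current_text += token
    if current_type is not None:
        groups[current_type].append(current_text)
    return groups


# Hypothetical labelling of the invoice text from the example.
tagged = [
    ("支付", "Key-B"),
    ("2.00", "Value-B"), ("元", "Value-I"),
    ("(现金支付)", "Other-B"),
    ("分类", "Key-B"), ("自负", "Key-I"),
    ("0.00", "Value-B"), ("元", "Value-I"),
]
print(merge_tagged_tokens(tagged))
```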
S3. Calculate, using the key values, the edit distance between the output text and a preset standard error correction table, and obtain, according to the edit distance, the erroneous text in the output text and the correct text corresponding to the erroneous text.
Because the standard image may contain content errors such as typos, the embodiments of this application use a standard error correction table to correct the above output text.
In the embodiments of this application, the standard error correction table consists of error-free character strings together with the key values and result values corresponding to those strings.
The edit distance between two character strings is the minimum number of editing operations required to convert one string into the other.
For example, consider the edit distance between the output text ABBD and the string ABCD in the standard error correction table. The two strings differ only in the third character, so the minimum number of editing operations is 1: replacing the character 'B' with the character 'C'.
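The definition above matches the classic Levenshtein distance, which counts insertions, deletions and substitutions. The following minimal sketch uses the standard dynamic-programming formulation; the function name `edit_distance` is illustrative, and the application does not state which exact variant it uses.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions to turn a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


print(edit_distance("ABBD", "ABCD"))  # the example from the text
```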
In detail, the embodiments of this application calculate the edit distance $Sim_{topic}$ using the following edit distance algorithm:
$$Sim_{topic} = \mathrm{Pearson}(R, S)$$
where $R$ is the key value of the output text, $S$ is the key value of the standard error correction table, and $\mathrm{Pearson}$ denotes the edit distance operation.
Further, to determine which character strings in the standard error correction table can be used to correct the output text, obtaining the erroneous text in the output text and the corresponding correct text according to the edit distance includes:
comparing the edit distance between the key value of the output text and the key value of the standard error correction table with a preset distance threshold;
when the edit distance is less than the distance threshold, determining the corresponding key value of the output text to be an erroneous character and the corresponding key value of the standard error correction table to be the corresponding correct character;
collecting all erroneous characters to obtain the erroneous text in the output text, and collecting the correct characters to obtain the correct text corresponding to the erroneous text.
Further, if the edit distance is greater than or equal to the distance threshold, the output text does not match the standard error correction table, and the table cannot be used to correct the output text.
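The threshold comparison above can be sketched as follows. The sketch makes two assumptions that the text does not state: each output key is compared against its nearest table key, and an exact match (distance 0) is treated as already correct rather than as an error. The names `find_corrections` and `threshold` are illustrative.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def find_corrections(output_keys, table_keys, threshold=2):
    """Map each erroneous output key to the table key that should replace it."""
    corrections = {}
    if not table_keys:
        return corrections
    for key in output_keys:
        # Assumption: use the closest table entry as the candidate.
        best = min(table_keys, key=lambda t: edit_distance(key, t))
        distance = edit_distance(key, best)
        # Below the threshold the key is treated as a misrecognised table entry;
        # at or above it, the table is considered not to match this key.
        if 0 < distance < threshold:
            corrections[key] = best
    return corrections


print(find_corrections(["ABBD", "ABCD", "WXYZ"], ["ABCD", "EFGH"]))
```

The choice of threshold trades recall against false corrections: a larger value corrects more heavily garbled keys but may rewrite keys that are legitimately different.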
S4. Replace the erroneous text with the correct text to obtain standard output text.
The embodiments of this application can directly replace the erroneous text with the correct text, thereby correcting the erroneous content and obtaining the standard output text.
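The replacement step can be sketched as a plain string substitution. The function name `apply_corrections` and the mapping format are assumptions; longer erroneous strings are replaced first so that a shorter entry cannot rewrite part of a longer one.

```python
def apply_corrections(text, corrections):
    """Replace every erroneous string in text with its corrected form."""
    # Process longer erroneous strings first to avoid partial overlaps.
    for wrong in sorted(corrections, key=len, reverse=True):
        text = text.replace(wrong, corrections[wrong])
    return text


print(apply_corrections("total ABBD 2.00", {"ABBD": "ABCD"}))
```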
It should be emphasized that, to further ensure the privacy and security of the text image, the original text image may also be stored in a node of a blockchain.
The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked to one another by cryptographic methods, where each block contains information about a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer and an application service layer.
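The idea of blocks linked by cryptographic methods can be illustrated with a minimal hash chain. This is a toy sketch, not a real blockchain: there is no network, consensus or node storage, and the block layout, the use of SHA-256 and the names `make_block` and `is_valid_chain` are assumptions made for illustration.

```python
import hashlib
import json


def _digest(data, prev_hash):
    """Hash a block's content together with the hash of its predecessor."""
    payload = json.dumps({"data": data, "prev_hash": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def make_block(data, prev_hash):
    """Create a block whose hash covers its data and its link to the previous block."""
    return {"data": data, "prev_hash": prev_hash, "hash": _digest(data, prev_hash)}


def is_valid_chain(chain):
    """Check that every block's hash is intact and points at its predecessor."""
    for i, block in enumerate(chain):
        if block["hash"] != _digest(block["data"], block["prev_hash"]):
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True


genesis = make_block("digest of text image 1", "0" * 64)
second = make_block("digest of text image 2", genesis["hash"])
print(is_valid_chain([genesis, second]))
```

Changing the data in any block changes its hash, which breaks the link held by the next block; this is the property that makes stored records tamper-evident.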
In addition, this solution is applicable to fields such as smart healthcare within smart cities, thereby promoting the construction of smart cities.
The embodiments of this application preprocess the original text image, which removes interference factors from the original image and lays the groundwork for the subsequent correction of the text in the image. Further, whereas the prior art corrects characters based only on the image information itself, the embodiments of this application calculate the key value of each character and the result value corresponding to that key value, and compare the key values and result values against a preset standard error correction table. The output text obtained through image recognition is corrected using that table, which makes the correction more accurate. The text error correction method, apparatus and computer-readable storage medium proposed in this application can therefore achieve low-cost, high-precision text error correction.
FIG. 2 is a functional module diagram of the text error correction apparatus of this application.
The text error correction apparatus 100 described in this application may be installed in an electronic device. According to the functions implemented, the text error correction apparatus may include an image acquisition module 101, an image segmentation module 102, a matching module 103 and an error correction module 104. The modules described in this application may also be called units; a module is a series of computer program segments that can be executed by the processor of the electronic device, performs a fixed function, and is stored in the memory of the electronic device.
In this embodiment, the functions of the modules/units are as follows:
The modulation conversion module 101 is configured to acquire an original text image and preprocess the original text image to obtain a standard image;
The text segmentation module 102 is configured to perform text recognition on the standard image using a pre-trained text recognition model to obtain character/word vectors, encode the character/word vectors to generate key values and corresponding result values, and convert the standard image into output text according to the key values and the corresponding result values;
The distance calculation module 103 is configured to calculate, using the key values, the edit distance between the output text and a preset standard error correction table, and to obtain, according to the edit distance, the erroneous text in the output text and the correct text corresponding to the erroneous text;
The error correction output module 104 is configured to replace the erroneous text with the correct text to obtain standard output text.
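The four modules form a linear pipeline in which each module consumes the output of the previous one. A minimal sketch of that composition follows; the class name `TextCorrectionPipeline`, its field names and the stand-in callables in the usage example are hypothetical and only illustrate the data flow between modules 101 to 104.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class TextCorrectionPipeline:
    preprocess: Callable[[Any], Any]               # module 101: raw image -> standard image
    recognize: Callable[[Any], str]                # module 102: standard image -> output text
    match: Callable[[str], Dict[str, str]]         # module 103: output text -> {wrong: right}
    correct: Callable[[str, Dict[str, str]], str]  # module 104: apply the replacements

    def run(self, raw_image: Any) -> str:
        """Run the four stages in order and return the standard output text."""
        standard_image = self.preprocess(raw_image)
        output_text = self.recognize(standard_image)
        corrections = self.match(output_text)
        return self.correct(output_text, corrections)


# Stand-in stages: the "image" is already a string, and the table is hard-coded.
pipeline = TextCorrectionPipeline(
    preprocess=lambda image: image.strip(),
    recognize=lambda image: image,
    match=lambda text: {"ABBD": "ABCD"} if "ABBD" in text else {},
    correct=lambda text, fixes: " ".join(fixes.get(w, w) for w in text.split()),
)
print(pipeline.run("  ABBD 2.00 "))
```

Passing the stages in as callables keeps each module independently replaceable, for example swapping the recognizer without touching the matching logic.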
In detail, the specific implementation steps of each module of the text error correction apparatus 100 are as follows:
The image acquisition module 101 acquires an original text image and preprocesses the original text image to obtain a standard image.
In the embodiments of this application, the original text image is obtained by two-dimensional scanning of paper documents, such as paper medical invoices or books.
To remove interference factors such as noise from the original text image obtained by two-dimensional scanning, the embodiments of this application first preprocess the original text image as follows:
amplifying the image signal of the original text image to obtain an amplified image signal;
sampling the amplified image signal to obtain a sampled signal;
filtering the sampled signal to obtain the standard image.
In detail, the embodiments of this application use an existing amplification circuit to amplify the image signal of the original text image. The amplification circuit is a circuit that amplifies electrical signals and uses transistors as its control elements; other suitable amplification circuits may be selected according to different amplification requirements, and the selected circuit amplifies the original text image without distortion to obtain the amplified image signal.
Further, the embodiments of this application use an existing sampling circuit to sample the amplified image signal. The sampling circuit is a circuit that samples the amplified image signal at regular intervals according to a preset sampling frequency.
By applying the above amplification, sampling and filtering to the original text image, the embodiments of this application remove interference factors such as noise from the original text image and obtain the standard image, which helps ensure the accuracy of the subsequent text error correction.
The image segmentation module 102 performs text recognition on the standard image using a pre-trained text recognition model to obtain character/word vectors, encodes the character/word vectors to generate key values and corresponding result values, and converts the standard image into output text according to the key values and the corresponding result values.
Because the standard image may contain a large amount of text and the sentences in the text may be long, direct character conversion may cause adjacent text to run together, which hinders the subsequent error correction. The embodiments of this application therefore use the text recognition model to perform text recognition and segmentation on the standard image.
Preferably, the text recognition model in the embodiments of this application may be a pre-trained NER (Named Entity Recognition) model.
Preferably, the NER model adopts a Bi-LSTM-CRF structure, including:
Character/word vector layer: converts the words and characters in the text contained in the standard image into character/word vectors;
Bi-LSTM layer: segments the character/word vectors, encodes the segmented character/word vectors to obtain their encoded representation, and uses the encoded representation to label the character/word vectors, obtaining key values and result values;
CRF layer: splices key values and result values of the same type, and decodes the spliced text by reversing the encoding process to generate the output text.
The character/word vector layer uses pre-trained word vectors as initialization parameters to convert the words and characters in the text contained in the standard image into character/word vectors. The pre-trained word vectors are a set of standard conversion rules summarized from previous character/word vector conversions.
The Bi-LSTM layer may be implemented in Java to encode the character/word vectors. The encoded representation contains six label types: Key-B, Value-B, Key-I, Value-I, Other-B and Other-I, where Key is a key value, Value is a result value, and Other is any other value.
The CRF layer splices key values and result values of the same type, for example Key-B with Key-I, or Value-B with Value-I.
Further, the embodiments of this application convert the standard image into output text according to the key values and the corresponding result values. In one example, the standard image contains the text "支付2.00元(现金支付)分类自负0.00元" ("Payment 2.00 yuan (cash payment), category self-pay 0.00 yuan"). After processing by the above NER model, the following output text is generated:
Key: {payment, category self-pay}
Value: {2.00 yuan, 0.00 yuan}
Other: (cash payment)
The matching module 103 calculates, using the key values, the edit distance between the output text and a preset standard error correction table, and obtains, according to the edit distance, the erroneous text in the output text and the correct text corresponding to the erroneous text.
Because the standard image may contain content errors such as typos, the embodiments of this application use a standard error correction table to correct the above output text.
In the embodiments of this application, the standard error correction table consists of error-free character strings together with the key values and result values corresponding to those strings. The edit distance between two character strings is the minimum number of editing operations required to convert one string into the other.
For example, consider the edit distance between the output text ABBD and the string ABCD in the standard error correction table. The two strings differ only in the third character, so the minimum number of editing operations is 1: replacing the character 'B' with the character 'C'.
In detail, the embodiment of the present application calculates the edit distance Sim_topic using the following edit distance algorithm:
Sim_topic = Pearson(R, S)
where R is the key value of the output text, S is the key value of the standard error correction table, and Pearson denotes the edit distance operation.
Further, in order to determine which character strings in the standard error correction table can be used to correct the output text, obtaining the erroneous text in the output text and the correct text corresponding to the erroneous text according to the edit distance includes:
comparing the edit distance between the key value of the output text and the key value of the standard error correction table with a preset distance threshold;
when the edit distance is less than the distance threshold, determining the key value of the corresponding output text to be an erroneous character, and determining the key value of the corresponding standard error correction table to be the corresponding correct character;
collecting all erroneous characters to obtain the erroneous text in the output text, and collecting the correct characters to obtain the correct text corresponding to the erroneous text.
Further, if the edit distance is greater than or equal to the distance threshold, the output text does not match the standard error correction table, and the standard error correction table cannot be used to correct the output text.
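The threshold comparison described above can be sketched as follows. Several details are assumptions made for illustration: the edit distance is taken to be the Levenshtein distance, each output key is compared with its nearest entry in the table, and a distance of 0 is treated as "already correct" rather than as an error. The names `find_corrections`, `output_keys`, `standard_keys`, and `threshold` are hypothetical.

```python
from typing import Dict, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b (assumed definition of the edit distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def find_corrections(
    output_keys: List[str], standard_keys: List[str], threshold: int
) -> Tuple[Dict[str, str], List[str]]:
    """Return (erroneous key -> correct key, keys that match no table entry)."""
    corrections: Dict[str, str] = {}
    unmatched: List[str] = []
    for key in output_keys:
        if not standard_keys:
            unmatched.append(key)
            continue
        # Assumption: compare against the nearest entry of the correction table.
        best = min(standard_keys, key=lambda s: edit_distance(key, s))
        distance = edit_distance(key, best)
        if distance >= threshold:
            unmatched.append(key)    # the table cannot correct this key
        elif distance > 0:
            corrections[key] = best  # erroneous key and its correct form
    return corrections, unmatched


corrections, unmatched = find_corrections(["ABBD", "ABCD", "WXYZ"], ["ABCD", "EFGH"], threshold=2)
print(corrections, unmatched)
```

With a threshold of 2, `ABBD` is mapped to `ABCD`, the already correct `ABCD` is left alone, and `WXYZ` is reported as unmatched because it is at distance 4 from every table entry.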
The error correction module 104 replaces the erroneous text with the correct text to obtain standard output text.
In the embodiment of the present application, the erroneous text can be replaced directly with the correct text, which corrects the erroneous content and yields the standard output text.
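A minimal sketch of this replacement step is given below, assuming the erroneous and correct texts are available as a mapping from erroneous string to correct string. The function name `apply_corrections` and the sample misrecognized key "分类自付" are hypothetical and chosen only for illustration.

```python
from typing import Dict


def apply_corrections(text: str, corrections: Dict[str, str]) -> str:
    """Replace every erroneous string in text with its correct counterpart."""
    # Replace longer erroneous strings first so that a shorter one
    # cannot break up a longer match.
    for wrong in sorted(corrections, key=len, reverse=True):
        text = text.replace(wrong, corrections[wrong])
    return text


# Hypothetical recognition error: "分类自付" should read "分类自负".
corrected = apply_corrections(
    "支付2.00元(现金支付)分类自付0.00元",
    {"分类自付": "分类自负"},
)
print(corrected)
```

Plain sequential replacement is the simplest choice; if a correct string could itself contain another erroneous string, a single-pass approach would be safer.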
FIG. 3 is a schematic structural diagram of an electronic device implementing the text error correction method of the present application.
The electronic device 1 may include a processor 10, a memory 11, and a bus, and may further include a computer program stored in the memory 11 and executable on the processor 10, such as a text error correction program 12.
The memory 11 includes at least one type of readable storage medium, including flash memory, removable hard disk, multimedia card, card-type memory (for example, SD or DX memory), magnetic memory, magnetic disk, optical disc, and the like. In some embodiments, the memory 11 may be an internal storage unit of the electronic device 1, for example a removable hard disk of the electronic device 1. In other embodiments, the memory 11 may be an external storage device of the electronic device 1, such as a plug-in removable hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the electronic device 1. Further, the memory 11 may include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 can be used not only to store application software installed on the electronic device 1 and various data, such as the code of the text error correction program 12, but also to temporarily store data that has been output or is to be output. The computer-readable storage medium may be non-volatile or volatile.
In some embodiments, the processor 10 may be composed of integrated circuits, for example a single packaged integrated circuit, or multiple packaged integrated circuits with the same or different functions, including one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips. The processor 10 is the control unit of the electronic device. It connects the components of the entire electronic device through various interfaces and lines, and performs the functions of the electronic device 1 and processes data by running or executing programs or modules stored in the memory 11 (for example, the text error correction program) and by calling data stored in the memory 11.
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. The bus is configured to provide connection and communication between the memory 11 and the at least one processor 10, among others.
FIG. 3 shows only an electronic device having certain components. Those skilled in the art will understand that the structure shown in FIG. 3 does not limit the electronic device 1, which may include fewer or more components than illustrated, combine certain components, or arrange the components differently.
For example, although not shown, the electronic device 1 may further include a power supply (such as a battery) for powering the components. Preferably, the power supply is logically connected to the at least one processor 10 through a power management device, so that charge management, discharge management, and power consumption management are implemented through the power management device. The power supply may also include one or more DC or AC power sources, a recharging device, a power failure detection circuit, a power converter or inverter, a power status indicator, and other such components. The electronic device 1 may also include various sensors, a Bluetooth module, a Wi-Fi module, and the like, which are not described further here.
Further, the electronic device 1 may include a network interface. Optionally, the network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi interface or a Bluetooth interface), which is typically used to establish a communication connection between the electronic device 1 and other electronic devices.
Optionally, the electronic device 1 may further include a user interface. The user interface may be a display or an input unit (such as a keyboard), and may optionally also be a standard wired or wireless interface. In some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display may also be referred to as a display screen or display unit, and is used to display information processed in the electronic device 1 and to present a visual user interface.
It should be understood that the described embodiments are for illustration only, and that the scope of the patent application is not limited by this structure.
The text error correction program 12 stored in the memory 11 of the electronic device 1 is a combination of multiple instructions which, when run on the processor 10, can implement:
acquiring an original text image, and performing a preprocessing operation on the original text image to obtain a standard image;
performing text recognition and segmentation on the standard image using a pre-trained text recognition model, generating key values and corresponding result values for the segmented standard image, and converting the standard image into output text according to the key values and the corresponding result values;
calculating, using the key values, the edit distance between the output text and a preset standard error correction table, and obtaining, according to the edit distance, the erroneous text in the output text and the correct text corresponding to the erroneous text;
replacing the erroneous text with the correct text to obtain standard output text.
Further, if the modules/units integrated in the electronic device 1 are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, or a read-only memory (ROM).
In the several embodiments provided in this application, it should be understood that the disclosed equipment, devices, and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative; the division into modules is only a division by logical function, and other divisions are possible in an actual implementation.
The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional modules in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware, or in hardware combined with software functional modules.
To those skilled in the art, it is evident that the present application is not limited to the details of the foregoing exemplary embodiments, and that the present application can be implemented in other specific forms without departing from its spirit or essential characteristics.
Therefore, the embodiments should in every respect be regarded as exemplary and non-limiting. The scope of the present application is defined by the appended claims rather than by the above description, and all changes falling within the meaning and range of equivalents of the claims are therefore intended to be embraced by the present application. No reference sign in the claims should be construed as limiting the claim concerned.
In addition, the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices recited in the system claims may also be implemented by a single unit or device through software or hardware. Words such as "second" are used to denote names and do not indicate any particular order.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of the present application may be modified or replaced with equivalents without departing from the spirit and scope of those technical solutions.

Claims (20)

  1. A text error correction method, wherein the method comprises:
    acquiring an original text image, and performing a preprocessing operation on the original text image to obtain a standard image;
    performing text recognition on the standard image using a pre-trained text recognition model to obtain character/word vectors, encoding the character/word vectors to generate key values and corresponding result values, and converting the standard image into output text according to the key values and the corresponding result values;
    calculating, using the key values, the edit distance between the output text and a preset standard error correction table, and obtaining, according to the edit distance, erroneous text in the output text and correct text corresponding to the erroneous text;
    replacing the erroneous text with the correct text to obtain standard output text.
  2. The text error correction method according to claim 1, wherein performing the preprocessing operation on the original text image to obtain the standard image comprises:
    amplifying the image signal of the original text image to obtain an amplified image signal;
    sampling the amplified image signal to obtain a sampled signal;
    filtering the sampled signal to obtain the standard image.
  3. The text error correction method according to claim 1, wherein the text recognition model comprises:
    a character/word vector layer, configured to convert the words and characters in the text contained in the standard image into character/word vectors;
    a Bi-LSTM layer, configured to segment the character/word vectors, encode the segmented character/word vectors to obtain encoded representations of the character/word vectors, and label the character/word vectors using the encoded representations to obtain key values and result values;
    a CRF layer, configured to concatenate key values and result values of the same type, and decode the concatenated text by the inverse of the encoding process to generate the output text.
  4. The text error correction method according to claim 3, wherein calculating the edit distance between the output text and the preset standard error correction table comprises:
    calculating the edit distance using the following edit distance algorithm:
    Sim_topic = Pearson(R, S)
    where R is the key value of the output text, S is the key value of the standard error correction table, Pearson denotes the edit distance operation, and Sim_topic is the edit distance between the key values.
  5. The text error correction method according to claim 4, wherein obtaining, according to the edit distance, the erroneous text in the output text and the correct text corresponding to the erroneous text comprises:
    comparing the edit distance between the key value of the output text and the key value of the standard error correction table with a preset distance threshold;
    when the edit distance is less than the distance threshold, determining the key value of the corresponding output text to be an erroneous character, and determining the key value of the corresponding standard error correction table to be the corresponding correct character;
    collecting all erroneous characters to obtain the erroneous text in the output text, and collecting the correct characters to obtain the correct text corresponding to the erroneous text.
  6. A text error correction device, wherein the device comprises:
    a modulation conversion module, configured to acquire an original text image and perform a preprocessing operation on the original text image to obtain a standard image;
    a text segmentation module, configured to perform text recognition on the standard image using a pre-trained text recognition model to obtain character/word vectors, encode the character/word vectors to generate key values and corresponding result values, and convert the standard image into output text according to the key values and the corresponding result values;
    a distance calculation module, configured to calculate, using the key values, the edit distance between the output text and a preset standard error correction table, and obtain, according to the edit distance, erroneous text in the output text and correct text corresponding to the erroneous text;
    an error correction output module, configured to replace the erroneous text with the correct text to obtain standard output text.
  7. The text error correction device according to claim 6, wherein performing the preprocessing operation on the original text image to obtain the standard image comprises:
    amplifying the image signal of the original text image to obtain an amplified image signal;
    sampling the amplified image signal to obtain a sampled signal;
    filtering the sampled signal to obtain the standard image.
  8. The text error correction device according to claim 6, wherein the text recognition model comprises:
    a character/word vector layer, configured to convert the words and characters in the text contained in the standard image into character/word vectors;
    a Bi-LSTM layer, configured to segment the character/word vectors, encode the segmented character/word vectors to obtain encoded representations of the character/word vectors, and label the character/word vectors using the encoded representations to obtain key values and result values;
    a CRF layer, configured to concatenate key values and result values of the same type, and decode the concatenated text by the inverse of the encoding process to generate the output text.
  9. The text error correction device according to claim 8, wherein calculating the edit distance between the output text and the preset standard error correction table comprises:
    calculating the edit distance using the following edit distance algorithm:
    Sim_topic = Pearson(R, S)
    where R is the key value of the output text, S is the key value of the standard error correction table, Pearson denotes the edit distance operation, and Sim_topic is the edit distance between the key values.
  10. The text error correction device according to claim 6, wherein, when performing the preprocessing operation on the original text image, the modulation conversion module performs:
    amplifying the image signal of the original text image to obtain an amplified image signal;
    sampling the amplified image signal to obtain a sampled signal;
    filtering the sampled signal to obtain the standard image.
  11. A computer-readable storage medium, wherein a text error correction program is stored on the computer-readable storage medium, and the text error correction program is executable by one or more processors to implement the following steps:
    acquiring an original text image, and performing a preprocessing operation on the original text image to obtain a standard image;
    performing text recognition on the standard image using a pre-trained text recognition model to obtain character/word vectors, encoding the character/word vectors to generate key values and corresponding result values, and converting the standard image into output text according to the key values and the corresponding result values;
    calculating, using the key values, the edit distance between the output text and a preset standard error correction table, and obtaining, according to the edit distance, erroneous text in the output text and correct text corresponding to the erroneous text;
    replacing the erroneous text with the correct text to obtain standard output text.
  12. The computer-readable storage medium according to claim 11, wherein performing the preprocessing operation on the original text image to obtain the standard image comprises:
    amplifying the image signal of the original text image to obtain an amplified image signal;
    sampling the amplified image signal to obtain a sampled signal;
    filtering the sampled signal to obtain the standard image.
  13. The computer-readable storage medium according to claim 11, wherein the text recognition model comprises:
    a character/word vector layer, configured to convert the words and characters in the text contained in the standard image into character/word vectors;
    a Bi-LSTM layer, configured to segment the character/word vectors, encode the segmented character/word vectors to obtain encoded representations of the character/word vectors, and label the character/word vectors using the encoded representations to obtain key values and result values;
    a CRF layer, configured to concatenate key values and result values of the same type, and decode the concatenated text by the inverse of the encoding process to generate the output text.
  14. The computer-readable storage medium according to claim 13, wherein calculating the edit distance between the output text and the preset standard error correction table comprises:
    calculating the edit distance using the following edit distance algorithm:
    Sim_topic = Pearson(R, S)
    where R is the key value of the output text, S is the key value of the standard error correction table, Pearson denotes the edit distance operation, and Sim_topic is the edit distance between the key values.
  15. The computer-readable storage medium according to claim 14, wherein obtaining, according to the edit distance, the erroneous text in the output text and the correct text corresponding to the erroneous text comprises:
    comparing the edit distance between the key value of the output text and the key value of the standard error correction table with a preset distance threshold;
    when the edit distance is less than the distance threshold, determining the key value of the corresponding output text to be an erroneous character, and determining the key value of the corresponding standard error correction table to be the corresponding correct character;
    collecting all erroneous characters to obtain the erroneous text in the output text, and collecting the correct characters to obtain the correct text corresponding to the erroneous text.
  16. A text error correction system, wherein the text error correction system comprises:
    a modulation conversion module, configured to acquire an original text image and perform a preprocessing operation on the original text image to obtain a standard image;
    a text segmentation module, configured to perform text recognition on the standard image using a pre-trained text recognition model to obtain character/word vectors, encode the character/word vectors to generate key values and corresponding result values, and convert the standard image into output text according to the key values and the corresponding result values;
    a distance calculation module, configured to calculate, using the key values, the edit distance between the output text and a preset standard error correction table, and obtain, according to the edit distance, erroneous text in the output text and correct text corresponding to the erroneous text;
    an error correction output module, configured to replace the erroneous text with the correct text to obtain standard output text.
  17. The text error correction system according to claim 16, wherein performing the preprocessing operation on the original text image to obtain the standard image comprises:
    amplifying the image signal of the original text image to obtain an amplified image signal;
    sampling the amplified image signal to obtain a sampled signal;
    filtering the sampled signal to obtain the standard image.
  18. The text error correction system according to claim 16, wherein the text recognition model comprises:
    a character/word vector layer, configured to convert the words and characters in the text contained in the standard image into character/word vectors;
    a Bi-LSTM layer, configured to segment the character/word vectors, encode the segmented character/word vectors to obtain encoded representations of the character/word vectors, and label the character/word vectors using the encoded representations to obtain key values and result values;
    a CRF layer, configured to concatenate key values and result values of the same type, and decode the concatenated text by the inverse of the encoding process to generate the output text.
  19. The text error correction system according to claim 18, wherein calculating the edit distance between the output text and the preset standard error correction table comprises:
    calculating the edit distance using the following edit distance algorithm:
    Sim_topic = Pearson(R, S)
    where R is the key value of the output text, S is the key value of the standard error correction table, Pearson denotes the edit distance operation, and Sim_topic is the edit distance between the key values.
  20. The text error correction system according to claim 16, wherein, when performing the preprocessing operation on the original text image, the modulation conversion module performs:
    amplifying the image signal of the original text image to obtain an amplified image signal;
    sampling the amplified image signal to obtain a sampled signal;
    filtering the sampled signal to obtain the standard image.
PCT/CN2020/093561 2020-04-23 2020-05-30 Text error correction method and apparatus, computer-readable storage medium and system WO2021212614A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010326324.X 2020-04-23
CN202010326324.XA CN111626118B (en) 2020-04-23 Text error correction method, apparatus, electronic device and computer readable storage medium

Publications (1)

Publication Number Publication Date
WO2021212614A1 true WO2021212614A1 (en) 2021-10-28

Family

ID=72258113

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/093561 WO2021212614A1 (en) 2020-04-23 2020-05-30 Text error correction method and apparatus, computer-readable storage medium and system

Country Status (1)

Country Link
WO (1) WO2021212614A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120177291A1 (en) * 2011-01-07 2012-07-12 Yuval Gronau Document comparison and analysis
CN107491730A (en) * 2017-07-14 2017-12-19 浙江大学 A laboratory test report recognition method based on image processing
CN107633250A (en) * 2017-09-11 2018-01-26 畅捷通信息技术股份有限公司 A kind of Text region error correction method, error correction system and computer installation
CN107844481A (en) * 2017-11-21 2018-03-27 新疆科大讯飞信息科技有限责任公司 Text recognition error detection method and device
CN110046350A (en) * 2019-04-12 2019-07-23 百度在线网络技术(北京)有限公司 Grammatical bloopers recognition methods, device, computer equipment and storage medium
CN110782885A (en) * 2019-09-29 2020-02-11 深圳和而泰家居在线网络科技有限公司 Voice text correction method and device, computer equipment and computer storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114550185A (en) * 2022-04-19 2022-05-27 腾讯科技(深圳)有限公司 Document generation method, related device, equipment and storage medium
CN114550185B (en) * 2022-04-19 2022-07-19 腾讯科技(深圳)有限公司 Document generation method, related device, equipment and storage medium

Also Published As

Publication number Publication date
CN111626118A (en) 2020-09-04

Similar Documents

Publication Publication Date Title
WO2022142593A1 (en) Text classification method and apparatus, electronic device, and readable storage medium
WO2021135910A1 (en) Machine reading comprehension-based information extraction method and related device
WO2021189826A1 (en) Message generation method and apparatus, electronic device, and computer-readable storage medium
CN109522552B (en) Normalization method and device of medical information, medium and electronic equipment
WO2021212612A1 (en) Intelligent text error correction method and apparatus, electronic device and readable storage medium
WO2021208696A1 (en) User intention analysis method, apparatus, electronic device, and computer storage medium
WO2021189829A1 (en) Data query method and apparatus, electronic device, and storage medium
WO2021159762A1 (en) Data relationship extraction method and apparatus, electronic device, and storage medium
CN111144210B (en) Image structuring processing method and device, storage medium and electronic equipment
CN108921552B (en) Evidence verification method and device
CN101295332A (en) DICOM file patient information anonymization processing method
WO2022194062A1 (en) Disease label detection method and apparatus, electronic device, and storage medium
CN109784339A (en) Picture recognition test method, device, computer equipment and storage medium
CN107133323A (en) Data model construction method, the implementation method of government affairs service business and device
WO2021189903A1 (en) Audio-based user state identification method and apparatus, and electronic device and storage medium
CN113205814A (en) Voice data labeling method and device, electronic equipment and storage medium
WO2021212614A1 (en) Text error correction method and apparatus, computer-readable storage medium and system
CN116912847A (en) Medical text recognition method and device, computer equipment and storage medium
CN113360768A (en) Product recommendation method, device and equipment based on user portrait and storage medium
CN111858942A (en) Text extraction method and device, storage medium and electronic equipment
CN111985491A (en) Similar information merging method, device, equipment and medium based on deep learning
CN112464927B (en) Information extraction method, device and system
CN113254814A (en) Network course video labeling method and device, electronic equipment and medium
CN112633988A (en) User product recommendation method and device, electronic equipment and readable storage medium
WO2022141867A1 (en) Speech recognition method and apparatus, and electronic device and readable storage medium

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20931922; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 20931922; Country of ref document: EP; Kind code of ref document: A1)