WO2019101066A1 - An image-based text entry method - Google Patents

An image-based text entry method

Info

Publication number
WO2019101066A1
WO2019101066A1 (PCT/CN2018/116414; CN2018116414W)
Authority
WO
WIPO (PCT)
Prior art keywords
entry
image
text
automatically
text content
Prior art date
Application number
PCT/CN2018/116414
Other languages
English (en)
French (fr)
Inventor
徐海燕
冯博
袁皓
孙谷飞
Original Assignee
众安信息技术服务有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 众安信息技术服务有限公司
Priority to US16/288,459 (published as US20190197309A1)
Publication of WO2019101066A1

Classifications

    • G06V30/1444 Selective acquisition, locating or processing of specific regions, e.g. highlighted text, fiducial marks or predetermined fields
    • G06F18/2163 Partitioning the feature space
    • G06F18/217 Validation; Performance evaluation; Active pattern learning techniques
    • G06F3/0481 Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment
    • G06F40/106 Display of layout of documents; Previewing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/174 Form filling; Merging
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V30/412 Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
    • G06V30/416 Extracting the logical structure, e.g. chapters, sections or page numbers; Identifying elements of the document, e.g. authors
    • G06Q30/04 Billing or invoicing
    • G06T2207/30176 Document (indexing scheme for image analysis)
    • G06V30/10 Character recognition

Definitions

  • the present invention relates to text entry technology, and in particular to an image-based text entry method.
  • OCR recognition technology converts the text of various tickets, newspapers, books, manuscripts and other printed materials into image information through optical input such as scanning, and then uses text recognition to turn that image information into usable computer input; as one of the main ways of converting paper documents into usable computer input, it can be applied to bank notes, archival files, and the entry and processing of large volumes of text.
  • the current processing speed can reach 60-80 tickets per minute
  • the passbook recognition rate has reached more than 85%
  • the deposit slip and receipt identification rate has reached more than 90%
  • a recognition rate above 85% can cut data-entry staffing by more than 80%, reducing operator workload and duplicated effort.
  • because 100% accurate recognition cannot be achieved, entry personnel still need to manually enter part of the content against the source text, and the recognized portion still needs manual review.
  • the present invention proposes an image-based text entry method.
  • An aspect of the present invention provides an image-based text entry method, comprising: acquiring an identification parameter corresponding to at least one region in the image, wherein the identification parameter includes text content recognized from the at least one region and location information associated with the at least one region; selecting an entry location in an entry page and obtaining location information corresponding to the selected entry location; and determining the text content to be entered based on the location information corresponding to the selected entry location and the identification parameter.
  • the step of acquiring the identification parameter corresponding to the at least one region in the image comprises: automatically dividing the image into regions, and identifying the text content in the automatically divided regions.
  • the acquiring of location information corresponding to the selected entry location includes: acquiring parameter values shared by a plurality of tab pages; and the display page automatically positioning, according to the acquired shared parameter values, to the region corresponding to the selected entry location; wherein the parameter values shared by the plurality of tab pages include the location information corresponding to the selected entry location.
  • the step of identifying the text content in the automatically divided area comprises: identifying the text content in the automatically divided area by using an OCR method.
  • the step of identifying the text content in the automatically divided regions comprises: scoring the recognized text content to mark its recognition accuracy.
  • the step of the display page automatically positioning, according to the acquired shared parameter values, to the region corresponding to the selected entry location comprises: scaling the region corresponding to the selected entry location.
  • an image-based text entry apparatus comprising: an acquisition identification parameter unit configured to acquire an identification parameter corresponding to at least one region in the image, wherein the identification parameter includes text content recognized from the at least one region and location information associated with the at least one region; an entry and display linkage unit configured to select an entry location in the entry page and obtain the location information corresponding to the selected entry location; and an entry text determining unit configured to determine the text content to be entered based on the location information corresponding to the selected entry location and the identification parameter.
  • the acquisition identification parameter unit further includes an image division and recognition unit configured to automatically divide the image into regions and to identify the text content in the automatically divided regions.
  • the entry and display linkage unit is further configured to: acquire parameter values shared by a plurality of tab pages; and have the display page automatically position, according to the acquired shared parameter values, to the region corresponding to the selected entry location; wherein the shared parameter values include location information corresponding to the selected entry location.
  • the image segmentation and recognition unit is further configured to identify textual content in the automatically segmented region using an OCR approach.
  • the image segmentation and recognition unit is further configured to score the recognized text content to mark its recognition accuracy.
  • the entry and display linkage unit further includes an image scaling unit configured to scale the region corresponding to the selected entry location.
  • Another aspect of the present invention provides a computer-readable storage medium having stored thereon processor-executable instructions which, when executed by a processor, perform any of the image-based text entry methods described above.
  • the image-based text entry method provided by the present invention enables efficient, interactive, fast entry of forms, tickets, documents and the like. When the entry person types into a selected input box, the uploaded image is automatically switched to the corresponding position and its content is enlarged, so the entry person no longer needs to drag the image manually to complete the entry, greatly saving the time spent comparing the image against the entry and improving entry efficiency. At the same time, text recognized by OCR is marked with its recognition accuracy, so that during review the user can check items directly according to that accuracy, effectively shortening review time and greatly improving entry efficiency.
  • FIG. 1 is a flow chart of an image-based text entry method in accordance with an embodiment of the present invention
  • FIG. 2 is a flow chart of a method for implementing ticket text entry in accordance with an embodiment of the present invention
  • FIG. 3 is an example of a ticket image displayed in a display page in accordance with an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of an entry page in accordance with an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of an image-based text entry device in accordance with an embodiment of the present invention.
  • FIG. 1 is a flow chart of an image-based text entry method in accordance with an embodiment of the present invention.
  • the invention provides an image-based text entry method, the method comprising the following steps:
  • Step S101 Acquire an identification parameter corresponding to at least one region in the image, where the identification parameter includes text content recognized from the at least one region and location information associated with the at least one region;
  • Step S102 In response to an entry location being selected in the entry page, perform the following operations: acquire parameter values shared by a plurality of tab pages, and have the display page automatically position, according to the acquired shared parameter values, to the region corresponding to the selected entry location, wherein the shared parameter values include location information corresponding to the selected entry location;
  • Step S103 Determine the text content to be entered based on the position information and the identification parameter corresponding to the selected entry position.
  • the image targeted by the method includes a plurality of paper documents such as a ticket, a form, a document, and the like, and is not limited to a specific one of the paper documents.
  • the image-based text entry method provided by the present invention is further elaborated below by taking a ticket as an example.
  • FIG. 2 is a flow chart of a method of implementing ticket text entry in accordance with an embodiment of the present invention.
  • Step S201 Upload the ticket image to the entry system.
  • in this step, the user uploads the required ticket file to the system by any suitable means, such as a scanner. If the upload fails, the system prompts the user to re-upload the image according to the type of error.
  • Step S202 determining whether there is an automatic image segmentation model in the system, if yes, proceeding to step S203, otherwise proceeding to step S204.
  • Step S203 Automatically dividing the ticket image by the image automatic division model to obtain position information of the automatically divided region.
  • the image automatic division model in this embodiment is a model based on a machine learning algorithm, and the image is automatically divided into regions by determining the position of the keyword in the image. It should be understood that the region may also be automatically partitioned based on any suitable model and in any suitable manner.
  • Step S204 Enter the pure manual entry mode.
  • Step S205 Automatically identify the text content in the automatically divided area by the OCR method.
  • textual content in the automatically partitioned area may also be automatically identified using any suitable other means.
  • Step S206 Score the recognized text content to mark its recognition accuracy, where a high score denotes an item the system considers recognized with high accuracy and a low score an item recognized with low accuracy. For example, an item scoring 85 or above is considered a high-accuracy item, and a small rectangle is added beside the drop-down option of its entry position (an input box in this embodiment), as shown in Fig. 4; otherwise it is considered a low-accuracy item and a small triangle is added beside the drop-down option (as shown in Fig. 4). In other embodiments, different colors in the corresponding drop-down options are used to distinguish the recognition accuracy.
  • the recognition-accuracy marks make it convenient for entry personnel to check quickly: items marked highly accurate can be confirmed at once to complete entry, while attention can be focused on the low-accuracy items so that recognition errors are corrected promptly, shortening review time.
  • the scoring system is only one of the ways to identify the recognition accuracy, and the setting of the score is not unique. Those skilled in the art can identify the recognition accuracy by other suitable methods.
  • Step S207 When the entry person selects an input box in the entry page for text entry, the system responds by automatically positioning the display page to the region corresponding to the keyword of the selected input box. Specifically, as shown in FIG. 4, when the entry person places the mouse on "XX City First People's Hospital" 401 in the entry page, the content of "XX City First People's Hospital" in region 301 of FIG. 3 is centered on the display page and can be automatically enlarged to a suitable size; if necessary, it can also be adjusted manually with the zoom tool. Similarly, when the mouse is placed on "total amount" 402 shown in FIG. 4, the content "total amount" and its corresponding value "1000¥" in region 302 of FIG. 3 are centered on the display page and likewise automatically enlarged, with manual zoom available if needed. The same function is achieved when the mouse is placed in any other input box.
  • in this embodiment, browser cross-tab communication is adopted. Specifically, the browser window listens for changes to the local storage facility localStorage, whose values can be shared among different tabs; the linkage between the entry page and the display page is implemented via the storage event, as follows:
  • first, the position information of a region automatically divided from the ticket image in step S203 is represented by a coordinate point point(x, y, w, h), as shown in FIG. 3, where x is the horizontal coordinate of the region in the image, y its vertical coordinate, w its width along the x axis, and h its height along the y axis.
  • then, during initialization, the position-information coordinate point of each automatically divided region and the text content recognized for it in step S205 are saved in localStorage;
  • next, mouse-over events are monitored: when the user moves the mouse to the input box to be filled, the keyword corresponding to that input box is obtained, and the corresponding value in localStorage is updated with the keyword's new position-information coordinate point and the text content at that point;
  • then, the display page listens for localStorage changes; on a storage event it reads the updated value, pans the image to the corresponding region in the display page, and enlarges that region.
  • it should be understood that cross-tab communication can also be implemented with other schemes such as BroadcastChannel, Cookie or WebSocket. However, localStorage has better compatibility and a longer life cycle than BroadcastChannel. Compared with cookies, cookie changes raise no event, so business logic would require polling with dirty checks; cookies work only within the same domain, polluting them inflates the AJAX request headers, and their storage is limited to 4K. WebSocket suits small projects but requires the backend server to maintain connections and handle subsequent message pushes, consuming more server resources. Therefore, this embodiment uses localStorage for cross-tab communication.
  • Step S208 If there is the recognized text content in the input box placed by the mouse at the entry page as shown in FIG. 4, step S209 is performed; otherwise, step S210 is performed;
  • Step S209 determining whether the content of the recognized text is accurate, if it is accurate, executing step S212; otherwise, performing step S211;
  • Step S210 In the input box, manually input text content according to the content displayed on the display page, and then perform step S212;
  • Step S211 Manually correct the recognized text content in the input box
  • Step S212 Click confirm to complete the entry
  • FIG. 5 shows a schematic diagram of an image-based text entry device according to an embodiment of the present invention.
  • the present invention also provides an image-based text entry device as shown in FIG. 5, which includes an acquisition identification parameter unit 501, an entry and display linkage unit 502, and an entry text determination unit 503.
  • the acquisition identification parameter unit 501 is configured to acquire an identification parameter corresponding to one or more regions in the image, wherein the identification parameter includes text content recognized from one or more regions and one or more regions Associated location information.
  • the entry and display linkage unit 502 is configured to perform the following operations in response to selecting the entry location in the entry page: acquiring parameter values shared by the plurality of tab pages, and displaying the page automatically positioned according to the acquired parameter values shared by the plurality of tab pages To an area corresponding to the selected entry position, wherein the parameter values shared by the plurality of tab pages include position information corresponding to the selected entry position.
  • the entry text determining unit 503 is configured to determine the text content to be entered based on the position information and the identification parameter corresponding to the selected entry position.
  • the acquisition identification parameter unit 501 further includes an image division and recognition unit 501a.
  • the image division and recognition unit 501a is configured to automatically divide the image into regions and identify the text content in the automatically divided region.
  • the image segmentation and recognition unit 501a is further configured to identify textual content in the automatically segmented region in an OCR manner.
  • the image segmentation and recognition unit 501a is further configured to score the identified text content to identify the recognition accuracy.
  • the entry and display linkage unit 502 further includes an image scaling unit 502a configured to scale an area corresponding to the selected entry position.
  • the flow of the text entry method of Figures 1, 2 also represents machine readable instructions comprising a program executed by a processor.
  • the program can be embodied in software stored on a tangible computer readable medium such as a CD-ROM, floppy disk, hard disk, digital versatile disk (DVD), Blu-ray disk or other form of memory.
  • some or all of the example methods in FIG. 1 may be implemented using any combination of an application specific integrated circuit (ASIC), a programmable logic device (PLD), a field programmable logic device (EPLD), discrete logic, hardware, firmware, etc.
  • the example process of FIG. 1 can be implemented using coded instructions, such as computer readable instructions, stored on a tangible computer readable medium, such as a hard disk, a flash memory, a read only memory (ROM), a compact disk (CD). ), a digital versatile disc (DVD), a cache, a random access memory (RAM), and/or any other storage medium on which information can be stored for any time (eg, for a long time, permanently, transiently, Temporary buffering, and/or caching of information).
  • as used herein, the term tangible computer readable medium is expressly defined to include any type of computer-readable storage. Additionally or alternatively, the example process of FIG. 1 may be implemented with coded instructions (such as computer-readable instructions) stored on a non-transitory computer-readable medium such as a hard disk, flash memory, read-only memory, optical disk, digital versatile disc, cache, random access memory, and/or any other storage medium in which information can be stored for any duration (e.g., long-term, permanently, briefly, temporarily buffered, and/or cached).

Abstract

The present invention provides an image-based text entry method. The method includes: acquiring an identification parameter corresponding to at least one region in an image, where the identification parameter includes text content recognized from the at least one region and location information associated with the at least one region; in response to an entry location being selected in an entry page, performing the following operations: acquiring parameter values shared by a plurality of tab pages, and the display page automatically positioning, according to the acquired shared parameter values, to the region corresponding to the selected entry location, where the shared parameter values include location information corresponding to the selected entry location; and determining the text content to be entered based on the location information corresponding to the selected entry location and the identification parameter.

Description

An image-based text entry method
This application claims priority to Chinese application No. 201711166037.1 filed on November 21, 2017, the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to text entry technology, and in particular to an image-based text entry method.
Background of the Invention
The entry of tickets, forms, documents and the like is currently a key step in the digital management of paper-based information. OCR recognition technology converts the text of various tickets, newspapers, books, manuscripts and other printed materials into image information through optical input such as scanning, and then uses text recognition to turn that image information into usable computer input. As one of the main ways of converting paper documents into usable computer input, it can be applied to bank notes, archival files, the entry and processing of large volumes of text, and other fields. Current processing speed can reach 60-80 tickets per minute; the passbook recognition rate has exceeded 85%, and the deposit-slip and receipt recognition rate has exceeded 90%. A recognition rate above 85% can cut data-entry staffing by more than 80%, reducing operator workload and duplicated effort. However, since 100% accurate recognition cannot be achieved, entry personnel still need to manually enter part of the content against the source text, and the recognized portion still needs manual review.
Therefore, there is an urgent need for an image-based text entry method that allows entry personnel to enter text quickly.
Summary of the Invention
In view of the above problems, the present invention proposes an image-based text entry method.
One aspect of the present invention provides an image-based text entry method, including: acquiring an identification parameter corresponding to at least one region in the image, where the identification parameter includes text content recognized from the at least one region and location information associated with the at least one region; selecting an entry location in an entry page and acquiring location information corresponding to the selected entry location; and determining the text content to be entered based on the location information corresponding to the selected entry location and the identification parameter. In one embodiment, the step of acquiring the identification parameter corresponding to at least one region in the image includes: automatically dividing the image into regions, and recognizing the text content in the automatically divided regions.
In one embodiment, acquiring the location information corresponding to the selected entry location includes: acquiring parameter values shared by a plurality of tab pages; and the display page automatically positioning, according to the acquired shared parameter values, to the region corresponding to the selected entry location; where the parameter values shared by the plurality of tab pages include the location information corresponding to the selected entry location.
In one embodiment, the step of recognizing the text content in the automatically divided regions includes: recognizing the text content in the automatically divided regions by OCR.
In one embodiment, the step of recognizing the text content in the automatically divided regions includes: scoring the recognized text content to mark its recognition accuracy.
In one embodiment, the step of the display page automatically positioning, according to the acquired shared parameter values, to the region corresponding to the selected entry location includes: scaling the region corresponding to the selected entry location.
Another aspect of the present invention provides an image-based text entry apparatus, including: an identification-parameter acquisition unit configured to acquire an identification parameter corresponding to at least one region in the image, where the identification parameter includes text content recognized from the at least one region and location information associated with the at least one region; an entry-and-display linkage unit configured to select an entry location in an entry page and acquire location information corresponding to the selected entry location; and an entry-text determination unit configured to determine the text content to be entered based on the location information corresponding to the selected entry location and the identification parameter.
In one embodiment, the identification-parameter acquisition unit further includes an image division and recognition unit configured to automatically divide the image into regions and recognize the text content in the automatically divided regions.
In one embodiment, the entry-and-display linkage unit is further configured to: acquire parameter values shared by a plurality of tab pages; and have the display page automatically position, according to the acquired shared parameter values, to the region corresponding to the selected entry location; where the parameter values shared by the plurality of tab pages include the location information corresponding to the selected entry location.
In one embodiment, the image division and recognition unit is further configured to recognize the text content in the automatically divided regions by OCR.
In one embodiment, the image division and recognition unit is further configured to score the recognized text content to mark its recognition accuracy.
In one embodiment, the entry-and-display linkage unit further includes an image scaling unit configured to scale the region corresponding to the selected entry location.
Another aspect of the present invention provides a computer-readable storage medium storing processor-executable instructions which, when executed by a processor, perform any of the image-based text entry methods described above.
Beneficial technical effects of the invention:
The image-based text entry method provided by the present invention enables efficient, interactive, fast entry of forms, tickets, documents and the like. When the entry person types into a selected input box, the uploaded image is automatically switched to the corresponding position and its content is enlarged, so the entry person no longer needs to drag the image purely by hand to complete the entry, greatly saving the time spent comparing the image against the entry and improving entry efficiency. At the same time, the text recognized by OCR is marked with its recognition accuracy, so that during review the user can check items quickly according to that accuracy, effectively shortening review time and greatly improving entry efficiency.
Brief Description of the Drawings
FIG. 1 is a flow chart of an image-based text entry method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method for implementing ticket text entry according to an embodiment of the present invention;
FIG. 3 is an example of a ticket image displayed in a display page according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an entry page according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an image-based text entry apparatus according to an embodiment of the present invention.
Modes for Carrying Out the Invention
In the following detailed description of preferred embodiments, reference is made to the accompanying drawings, which form a part of the invention. The drawings show, by way of example, specific embodiments in which the invention can be implemented. The exemplary embodiments are not intended to exhaust all embodiments according to the invention. It will be appreciated that other embodiments may be used, and structural or logical modifications may be made, without departing from the scope of the invention. The following detailed description is therefore not limiting, and the scope of the invention is defined by the appended claims.
The invention is described in detail below with reference to the drawings.
FIG. 1 is a flow chart of an image-based text entry method according to an embodiment of the present invention.
The present invention provides an image-based text entry method, including the following steps:
Step S101: acquire an identification parameter corresponding to at least one region in the image, where the identification parameter includes text content recognized from the at least one region and location information associated with the at least one region;
Step S102: in response to an entry location being selected in the entry page, perform the following operations: acquire parameter values shared by a plurality of tab pages, and have the display page automatically position, according to the acquired shared parameter values, to the region corresponding to the selected entry location, where the shared parameter values include location information corresponding to the selected entry location;
Step S103: determine the text content to be entered based on the location information corresponding to the selected entry location and the identification parameter.
It should be understood that the images addressed by this method include many kinds of paper documents, such as tickets, forms and documents, and are not limited to any one specific kind. The image-based text entry method provided by the present invention is elaborated below, taking a ticket as an example.
FIG. 2 is a flow chart of a method for implementing ticket text entry according to an embodiment of the present invention.
The implementation of ticket text entry is described in detail below with reference to FIGS. 2, 3 and 4.
Step S201: upload the ticket image to the entry system.
In this step, the user uploads the required ticket file to the system by any suitable means, such as a scanner. If the upload fails, the system prompts the user to re-upload the image according to the type of error.
Step S202: determine whether an automatic image division model exists in the system; if so, proceed to step S203, otherwise proceed to step S204.
Step S203: automatically divide the ticket image with the automatic image division model to obtain the position information of the automatically divided regions.
The automatic image division model in this embodiment is a model based on a machine-learning algorithm that automatically divides the image into regions by determining the positions of keywords in the image. It should be understood that the regions may also be divided automatically based on any suitable model and in any suitable manner.
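The patent does not specify the division algorithm beyond "determining the positions of keywords". As a hedged illustration only, one plausible heuristic is to grow each detected keyword box rightward until the next keyword on the same line (or the image edge), so that a region covers both the label and the value printed beside it; all names and the tolerance value here are assumptions, not from the patent:

```typescript
// A detected keyword anchor: the keyword text plus its bounding box,
// following the patent's point(x, y, w, h) coordinate convention.
interface KeywordBox {
  keyword: string;
  x: number; y: number; w: number; h: number;
}

// Hypothetical heuristic: each field's region starts at its keyword box and
// extends right until the next keyword on the same line, or the image edge.
function divideRegions(
  keywords: KeywordBox[],
  imageWidth: number,
  lineTolerance = 10 // boxes whose y differs by less than this share a line
): KeywordBox[] {
  return keywords.map((k) => {
    const nextOnLine = keywords
      .filter((o) => o !== k && Math.abs(o.y - k.y) < lineTolerance && o.x > k.x)
      .sort((a, b) => a.x - b.x)[0];
    const rightEdge = nextOnLine ? nextOnLine.x : imageWidth;
    return { keyword: k.keyword, x: k.x, y: k.y, w: rightEdge - k.x, h: k.h };
  });
}
```

A real system would also need to merge multi-line fields and handle overlapping boxes, which this sketch ignores.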
Step S204: enter the purely manual entry mode.
Step S205: automatically recognize the text content in the automatically divided regions by OCR.
It should be understood that any other suitable method may also be used to automatically recognize the text content in the automatically divided regions.
Step S206: score the recognized text content to mark its recognition accuracy, where a high score denotes an item the system considers recognized with high accuracy and a low score an item recognized with low accuracy. For example, in this embodiment, an item scoring 85 or above is considered a high-accuracy item, and a small rectangle is added beside the drop-down option of its entry position (an input box in this embodiment), as shown in Fig. 4; otherwise it is considered a low-accuracy item and a small triangle is added beside the drop-down option (as shown in Fig. 4). In other embodiments, text recognized with different scores is distinguished by marking the corresponding drop-down options in different colors.
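The scoring rule described in this step reduces to a simple threshold check; a minimal sketch follows, where the 85-point cutoff comes from the embodiment but the function and marker names are illustrative:

```typescript
type AccuracyMarker = "rectangle" | "triangle";

// Per the embodiment: items scoring 85 or above are treated as high-accuracy
// (rectangle mark beside the drop-down option); all others get a triangle.
const HIGH_ACCURACY_THRESHOLD = 85;

function accuracyMarker(score: number): AccuracyMarker {
  return score >= HIGH_ACCURACY_THRESHOLD ? "rectangle" : "triangle";
}
```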
It should also be understood that the recognition-accuracy marks make it convenient for entry personnel to check quickly: items marked highly accurate can be confirmed at once to complete entry, while attention can be focused on the low-accuracy items so that recognition errors are corrected promptly, shortening review time. The scoring system is only one way to mark recognition accuracy, and the score thresholds are not unique; those skilled in the art may mark recognition accuracy in other suitable ways.
Step S207: when the entry person selects an input box in the entry page for text entry, the system responds by automatically positioning the display page to the region corresponding to the keyword of the selected input box. Specifically, as shown in FIG. 4, when the entry person places the mouse on "XX City First People's Hospital" 401 in the entry page, the content of "XX City First People's Hospital" in region 301 of FIG. 3 is centered on the display page and can be automatically enlarged to a suitable size; if necessary, it can also be adjusted manually with the zoom tool. Similarly, when the mouse is placed on "total amount" 402 shown in FIG. 4, the content "total amount" and its corresponding value "1000¥" in region 302 of FIG. 3 are centered on the display page and likewise automatically enlarged, with manual zoom available if needed. The same function is achieved when the mouse is placed in any other input box.
In the implementation of this embodiment, browser cross-tab communication is adopted. Specifically, the browser window listens for changes to the local storage facility localStorage, whose values can be shared among different tabs; the linkage between the entry page and the display page is implemented via the storage event, as follows:
First, the position information of a region automatically divided from the ticket image in step S203 is represented by a coordinate point point(x, y, w, h), as shown in FIG. 3, where x is the horizontal coordinate of the region in the image, y its vertical coordinate, w its width along the x axis, and h its height along the y axis.
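Given a region's point(x, y, w, h) and the display viewport size, centering and enlarging the region reduces to choosing a uniform scale and a translation. A minimal sketch, assuming the display applies a CSS-style scale-then-translate transform (the margin factor and all names are illustrative, not from the patent):

```typescript
interface Region { x: number; y: number; w: number; h: number; }
interface Viewport { width: number; height: number; }

// Scale so the region fills the viewport (with a small margin), then
// translate so the region's center lands on the viewport's center.
function centerAndZoom(r: Region, vp: Viewport, margin = 0.9) {
  const scale = margin * Math.min(vp.width / r.w, vp.height / r.h);
  const cx = r.x + r.w / 2; // region center in image coordinates
  const cy = r.y + r.h / 2;
  return {
    scale,
    offsetX: vp.width / 2 - cx * scale, // translation of the scaled image
    offsetY: vp.height / 2 - cy * scale,
  };
}
```

The manual zoom tool mentioned in step S207 would simply override `scale` while keeping the same centering rule.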
Then, an initialization process is performed: the position-information coordinate point of each automatically divided region, together with the text content recognized for that region in step S205, is saved in localStorage.
Next, mouse-over events are monitored: when the user moves the mouse from the current input box to the input box to be filled, the keyword corresponding to that input box is obtained, and the corresponding value in localStorage is updated with the new position-information coordinate point for that keyword and the text content at that point.
Then, the display page listens for changes to localStorage; on a storage event it reads the updated value, pans the image to the corresponding region in the display page, and enlarges that region.
It should be understood that cross-tab communication can also be implemented with other schemes such as BroadcastChannel, Cookie or WebSocket. However, localStorage has better compatibility and a longer life cycle than BroadcastChannel. Compared with cookies, cookie changes raise no event, so business logic would require polling with dirty checks; cookies work only within the same domain, polluting them inflates the AJAX request headers, and their storage is limited to 4K. WebSocket suits small projects but requires the backend server to maintain connections and handle subsequent message pushes, consuming more server resources. Therefore, this embodiment uses localStorage for cross-tab communication.
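The entry-page/display-page linkage described above can be sketched as follows. A tiny in-memory stand-in replaces `window.localStorage` and the cross-tab `storage` event so the protocol runs in isolation; in a real page the browser provides both, and the key name `selectedRegion` is an assumption, not from the patent:

```typescript
// Minimal in-memory stand-in for window.localStorage plus the cross-tab
// "storage" event, so the entry-page / display-page protocol is visible.
type StorageListener = (key: string, newValue: string) => void;

class TabStore {
  private data = new Map<string, string>();
  private listeners: StorageListener[] = [];

  setItem(key: string, value: string): void {
    this.data.set(key, value);
    // In a browser, other same-origin tabs receive a "storage" event here.
    this.listeners.forEach((l) => l(key, value));
  }
  getItem(key: string): string | null { return this.data.get(key) ?? null; }
  onStorage(l: StorageListener): void { this.listeners.push(l); }
}

// Entry page: on mouse-over of an input box, publish the keyword's region
// coordinates and recognized text under an illustrative key.
function publishSelection(
  store: TabStore, keyword: string,
  point: { x: number; y: number; w: number; h: number }, text: string
): void {
  store.setItem("selectedRegion", JSON.stringify({ keyword, point, text }));
}

// Display page: listen for updates and pan/zoom to the published region.
function watchSelection(
  store: TabStore,
  panZoom: (p: { x: number; y: number; w: number; h: number }) => void
): void {
  store.onStorage((key, value) => {
    if (key === "selectedRegion") panZoom(JSON.parse(value).point);
  });
}
```

In a browser the display page would instead call `window.addEventListener("storage", ...)`, which fires only in tabs other than the one that wrote the value — which is exactly the cross-tab behavior the embodiment relies on.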
Step S208: if the input box where the mouse is placed on the entry page shown in FIG. 4 contains recognized text content, perform step S209; otherwise perform step S210.
Step S209: determine whether the recognized text content is accurate; if so, perform step S212; otherwise perform step S211.
Step S210: in the input box, manually type the text content according to what is shown on the display page, then perform step S212.
Step S211: manually correct the recognized text content in the input box.
Step S212: click confirm to complete the entry.
另外,图5示出了根据本发明实施例的基于图像的文本录入装置的示意图。本发明还提供了如图5所示的一种基于图像的文本录入装置,该装置包括获取识别参数单元501、录入与显示联动单元502和录入文本确定单元503。具体地,获取识别参数单元501被配置为获取对应于图像中的一个或多个区域的识别参数,其中,识别参数包括从一个或多个区域中识别出的文本内容和与一个或多个区域相关联的位置信息。录入与显示联动单元502被配置为响应于在录入页面中选中录入位置而执行以下操作:获取多个标签页面共享的参数值,并且显示页面根据所获取的多个标签页面共享的参数值自动定位到与所选中的录入位置相对应的区域,其中,多个标签页面共享的参数值包括与所选中的录入位置相对应的位置信息。录入文本确定单元503被配置为基于与所选中的录入位置相对应的位置信息和识别参数,确定将被录入的文本内容。
Further, in one embodiment, the recognition-parameter acquisition unit 501 also includes an image segmentation and recognition unit 501a, configured to automatically segment the image into regions and to recognize the text content in the automatically segmented regions. In one embodiment, the image segmentation and recognition unit 501a is further configured to recognize the text content in the automatically segmented regions by OCR. In another embodiment, the image segmentation and recognition unit 501a is further configured to score the recognized text content to mark recognition accuracy.
In addition, in one embodiment, the entry-display linkage unit 502 further includes an image zoom unit 502a configured to zoom the region corresponding to the selected entry position.
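The unit structure of FIG. 5 could be mirrored in code roughly as below. The class names follow the reference numerals (501, 501a, 502, 503) but are otherwise invented, and the stubbed method bodies are assumptions for illustration only, not the patented implementation.

```javascript
class ImageSegmentationRecognitionUnit {      // unit 501a
  recognize(image) {
    // would segment regions and run OCR; stubbed with a fixed result here
    return [{ point: { x: 0, y: 0, w: 10, h: 5 }, text: 'demo', score: 90 }];
  }
}

class RecognitionParameterAcquisitionUnit {   // unit 501
  constructor() { this.segmenter = new ImageSegmentationRecognitionUnit(); }
  getParameters(image) { return this.segmenter.recognize(image); }
}

class EntryDisplayLinkageUnit {               // unit 502
  locate(params, index) { return params[index].point; } // region to pan/zoom to
}

class EntryTextDeterminationUnit {            // unit 503
  determine(params, index) { return params[index].text; }
}

const params = new RecognitionParameterAcquisitionUnit().getParameters(null);
const region = new EntryDisplayLinkageUnit().locate(params, 0);
const text = new EntryTextDeterminationUnit().determine(params, 0);
```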
The flows of the text entry methods in FIGS. 1 and 2 also represent machine-readable instructions comprising a program executed by a processor. The program may be embodied in software stored on a tangible computer-readable medium such as a CD-ROM, a floppy disk, a hard disk, a digital versatile disc (DVD), a Blu-ray disc, or another form of memory. Alternatively, some or all of the steps of the example method in FIG. 1 may be implemented using any combination of application-specific integrated circuits (ASICs), programmable logic devices (PLDs), field-programmable logic devices (EPLDs), discrete logic, hardware, firmware, and so on. Moreover, although the flowchart shown in FIG. 1 describes the text entry method, steps in the method may be modified, deleted, or combined.
As described above, the example process of FIG. 1 may be implemented using coded instructions (such as computer-readable instructions) stored on a tangible computer-readable medium, such as a hard disk, flash memory, read-only memory (ROM), compact disc (CD), digital versatile disc (DVD), cache, random-access memory (RAM), and/or any other storage medium on which information can be stored for any duration (e.g., for extended periods, permanently, briefly, for temporary buffering, and/or for caching). As used herein, the term tangible computer-readable medium is expressly defined to include any type of computer-readable storage and to exclude propagating signals. Additionally or alternatively, the example process of FIG. 1 may be implemented using coded instructions (such as computer-readable instructions) stored on a non-transitory computer-readable medium, such as a hard disk, flash memory, read-only memory, compact disc, digital versatile disc, cache, random-access memory, and/or any other storage medium on which information can be stored for any duration (e.g., for extended periods, permanently, briefly, for temporary buffering, and/or for caching).
Although the present invention has been described with reference to specific examples, which are intended to be illustrative only rather than limiting, it will be apparent to those of ordinary skill in the art that changes, additions, or deletions may be made to the disclosed embodiments without departing from the spirit and scope of the invention.

Claims (13)

  1. An image-based text entry method, characterized by comprising:
    acquiring recognition parameters corresponding to at least one region in the image, wherein the recognition parameters include text content recognized from the at least one region and position information associated with the at least one region;
    selecting an entry position on an entry page and acquiring position information corresponding to the selected entry position; and
    determining, based on the position information corresponding to the selected entry position and the recognition parameters, the text content to be entered.
  2. The image-based text entry method according to claim 1, wherein the step of acquiring recognition parameters corresponding to at least one region in the image comprises: automatically segmenting the image into regions, and recognizing the text content in the automatically segmented regions.
  3. The text entry method according to claim 2, wherein acquiring the position information corresponding to the selected entry position comprises:
    acquiring parameter values shared by multiple tab pages; and
    automatically locating, by a display page according to the acquired parameter values shared by the multiple tab pages, the region corresponding to the selected entry position;
    wherein the parameter values shared by the multiple tab pages include the position information corresponding to the selected entry position.
  4. The image-based text entry method according to claim 2, wherein the step of recognizing the text content in the automatically segmented regions comprises: recognizing the text content in the automatically segmented regions by OCR.
  5. The image-based text entry method according to claim 2, wherein the step of recognizing the text content in the automatically segmented regions comprises: scoring the recognized text content to mark recognition accuracy.
  6. The image-based text entry method according to claim 3, wherein the step of the display page automatically locating, according to the acquired parameter values shared by the multiple tab pages, the region corresponding to the selected entry position comprises: zooming the region corresponding to the selected entry position.
  7. An image-based text entry device, characterized by comprising:
    a recognition-parameter acquisition unit, configured to acquire recognition parameters corresponding to at least one region in the image, wherein the recognition parameters include text content recognized from the at least one region and position information associated with the at least one region;
    an entry-display linkage unit, configured to select an entry position on an entry page and acquire position information corresponding to the selected entry position; and
    an entry-text determination unit, configured to determine, based on the position information corresponding to the selected entry position and the recognition parameters, the text content to be entered.
  8. The image-based text entry device according to claim 7, wherein the entry-display linkage unit is further configured to:
    acquire parameter values shared by multiple tab pages; and
    cause a display page to automatically locate, according to the acquired parameter values shared by the multiple tab pages, the region corresponding to the selected entry position;
    wherein the parameter values shared by the multiple tab pages include the position information corresponding to the selected entry position.
  9. The image-based text entry device according to claim 7, wherein the recognition-parameter acquisition unit further includes an image segmentation and recognition unit, configured to automatically segment the image into regions and to recognize the text content in the automatically segmented regions.
  10. The image-based text entry device according to claim 9, wherein the image segmentation and recognition unit is further configured to recognize the text content in the automatically segmented regions by OCR.
  11. The image-based text entry device according to claim 9, wherein the image segmentation and recognition unit is further configured to score the recognized text content to mark recognition accuracy.
  12. The image-based text entry device according to claim 7, wherein the entry-display linkage unit further includes an image zoom unit, the image zoom unit being configured to zoom the region corresponding to the selected entry position.
  13. A computer-readable storage medium having processor-executable instructions stored thereon, wherein, when a processor executes the executable instructions, the method according to any one of claims 1 to 6 is performed.
PCT/CN2018/116414 2017-11-21 2018-11-20 Image-based text entry method WO2019101066A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/288,459 US20190197309A1 (en) 2017-11-21 2019-02-28 Method for entering text based on image

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711166037.1 2017-11-21
CN201711166037.1A CN107958249B (zh) Image-based text entry method

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/288,459 Continuation US20190197309A1 (en) 2017-11-21 2019-02-28 Method for entering text based on image

Publications (1)

Publication Number Publication Date
WO2019101066A1 true WO2019101066A1 (zh) 2019-05-31

Family

ID=61965170

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/116414 WO2019101066A1 (zh) 2017-11-21 2018-11-20 一种基于图像的文本录入方法

Country Status (3)

Country Link
US (1) US20190197309A1 (zh)
CN (1) CN107958249B (zh)
WO (1) WO2019101066A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110659607A (zh) 2019-09-23 2020-01-07 天津车之家数据信息技术有限公司 Data verification method, device, system and computing device

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107958249B (zh) 2017-11-21 2020-09-11 众安信息技术服务有限公司 Image-based text entry method
CN108334484B (zh) 2017-12-28 2022-01-11 北京科迅生物技术有限公司 Data entry method and device
CN109190629A (zh) 2018-08-28 2019-01-11 传化智联股份有限公司 Electronic waybill generation method and device
CN111291290A (zh) 2018-12-06 2020-06-16 北京京东尚科信息技术有限公司 Data processing method and device
CN109918416A (zh) 2019-02-28 2019-06-21 生活空间(沈阳)数据技术服务有限公司 Document entry method, device and equipment
CN110333813A (zh) 2019-05-30 2019-10-15 平安科技(深圳)有限公司 Invoice image display method, electronic device, and computer-readable storage medium
CN110427853B (zh) 2019-07-24 2022-11-01 北京一诺前景财税科技有限公司 Intelligent receipt information extraction and processing method
CN111079708B (zh) 2019-12-31 2020-12-29 广州市昊链信息科技股份有限公司 Information identification method, apparatus, computer device, and storage medium
CN111444908B (zh) 2020-03-25 2024-02-02 腾讯科技(深圳)有限公司 Image recognition method, apparatus, terminal, and storage medium
CN113130023B (зh) 2021-04-22 2023-04-07 嘉兴易迪希计算机技术有限公司 Image and text recognition and entry method and system in an EDC system
CN113569834A (зh) 2021-08-05 2021-10-29 五八同城信息技术有限公司 Business license recognition method, device, electronic equipment, and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050100216A1 (en) * 2003-11-11 2005-05-12 Sri International Method and apparatus for capturing paper-based information on a mobile computing device
CN101859225A (zh) 2010-05-31 2010-10-13 济南恒先科技有限公司 Method for rapid entry of text and tables through digital tracing
CN105718846A (zh) 2014-12-03 2016-06-29 航天信息股份有限公司 Method and device for entering receipt information
CN107958249A (zh) 2017-11-21 2018-04-24 众安信息技术服务有限公司 Image-based text entry method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8156427B2 (en) * 2005-08-23 2012-04-10 Ricoh Co. Ltd. User interface for mixed media reality
US9147275B1 (en) * 2012-11-19 2015-09-29 A9.Com, Inc. Approaches to text editing
US9292739B1 (en) * 2013-12-12 2016-03-22 A9.Com, Inc. Automated recognition of text utilizing multiple images



Also Published As

Publication number Publication date
CN107958249B (zh) 2020-09-11
CN107958249A (zh) 2018-04-24
US20190197309A1 (en) 2019-06-27

Similar Documents

Publication Publication Date Title
WO2019101066A1 (zh) Image-based text entry method
US9158744B2 (en) System and method for automatically extracting multi-format data from documents and converting into XML
US20140067631A1 (en) Systems and Methods for Processing Structured Data from a Document Image
US20210271872A1 (en) Machine Learned Structured Data Extraction From Document Image
US20180121825A1 (en) Providing intelligent file name suggestions
CN105631393A (zh) Information identification method and device
US10339373B1 (en) Optical character recognition utilizing hashed templates
JP2016048444A (ja) Form identification program, form identification device, form identification system, and form identification method
JP2014170543A (ja) Processing method, processing system, and computer program
WO2014086277A1 (zh) Professional notebook convenient for digitization, and method for automatically identifying page numbers thereof
JP7186107B2 (ja) Title estimator
JP5412903B2 (ja) Document image processing apparatus, document image processing method, and document image processing program
JP2019057311A (ja) Form information recognition device and form information recognition method
JP7379987B2 (ја) Information processing apparatus and program
JP2020095374A (ja) Character recognition system, character recognition device, program, and character recognition method
JP4518212B2 (ja) Image processing apparatus and program
WO2021059848A1 (ja) Information processing device, information processing method, and information processing program
JP2019057115A (ja) Form information recognition device and form information recognition method
US9170725B2 (en) Information processing apparatus, non-transitory computer readable medium, and information processing method that detect associated documents based on distance between documents
JP4517822B2 (ja) Image processing apparatus and program
JP2020047031A (ja) Document search device, document search system, and program
US11481447B2 (en) Information processing device and non-transitory computer readable medium
JP2002297638A (ja) Method for extracting a title from a document image
JP5277750B2 (ja) Image processing program, image processing apparatus, and image processing system
JP6702198B2 (ja) Information processing apparatus and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18880463

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2019545978

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 21/09/2020)

122 Ep: pct application non-entry in european phase

Ref document number: 18880463

Country of ref document: EP

Kind code of ref document: A1