CN113537200A - Information backfill method, device, equipment and medium based on image recognition - Google Patents


Info

Publication number
CN113537200A
Authority
CN
China
Prior art keywords
recognition
text
named entity
information
backfilling
Legal status
Pending
Application number
CN202111005895.4A
Other languages
Chinese (zh)
Inventor
满天龙
Current Assignee
Shenzhen Ping An Medical Health Technology Service Co Ltd
Original Assignee
Ping An Medical and Healthcare Management Co Ltd
Application filed by Ping An Medical and Healthcare Management Co Ltd
Priority to CN202111005895.4A
Publication of CN113537200A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 Classification techniques
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Character Discrimination (AREA)

Abstract

The application relates to the technical field of medical management and discloses an information backfilling method, device, equipment and medium based on image recognition. The method comprises: inputting collected case pictures into an OCR model for OCR recognition to obtain an OCR recognition result for each case; inputting the OCR recognition result of each case into a BERT model for named entity recognition to obtain a preliminary named entity recognition result, and then performing text classification processing on it to obtain a final named entity recognition result; and automatically backfilling the final named entity recognition result into the different tasks of the corresponding case to complete information backfilling. By using deep-learning models to assist information backfilling, the application recognizes medical claims data automatically and backfills it into the system automatically. This improves the efficiency and accuracy of information backfilling, reduces the labor and time spent manually paging through pictures and typing in task information, and raises the degree of automation of information backfilling in medical claims investigation.

Description

Information backfill method, device, equipment and medium based on image recognition
Technical Field
The present application relates to the field of computer and medical management technologies, and in particular, to an information backfill method, apparatus, device, and medium based on image recognition.
Background
In the insurance industry, a commercial insurance claims investigation system generally sits at the end of the industry chain: upstream it connects to the individual insurance companies, and downstream to the crowdsourcing companies responsible for claims investigation. Claims investigation is an important link in the claims process; it is the final checkpoint of claims review and an important basis for the insurance company's decision to pay. At present, a traditional claims investigation system relies on crowdsourced staff to visit each hospital, physical examination institution, public security and procuratorial institution and so on manually; after each institution photographs and uploads the relevant materials, crowdsourced staff manually fill the information into tasks, which are finally returned to the claims investigation system.
However, in researching and practicing the prior art, the inventors of the present application found that in current medical claims investigation systems, collecting materials by manually visiting each institution makes the workflow tedious and the manual workload enormous, requiring many staff to be hired and raising labor and time costs. Moreover, when the volume of medical data is large, staff must browse the images manually and enter them item by item according to the task list; input errors occur easily, which creates a large amount of extra work and hinders the scaling of claims review.
Disclosure of Invention
The main purpose of the application is to provide an information backfilling method, device, equipment and medium based on image recognition that address the following problems of prior-art medical claims investigation systems: a heavy information-backfilling workload, a low degree of automation, a tendency toward errors when the volume of medical claims data is large, and low backfilling efficiency.
In order to achieve the above object, a first aspect of an embodiment of the present application provides an information backfilling method based on image recognition, including:
inputting the offline-collected case pictures into an OCR model for OCR recognition to obtain the OCR recognition result of each case; the OCR recognition comprises text position detection and text recognition;
inputting the OCR recognition result of each case into a BERT model for named entity recognition to obtain a preliminary named entity recognition result; the preliminary named entity recognition result comprises the attribute of each recognized text;
performing text classification processing on the preliminary named entity recognition result to obtain a final named entity recognition result; the text classification processing comprises text attribute classification and text position matching;
and automatically backfilling the final named entity recognition result to different tasks of the corresponding case to finish information backfilling.
In a possible implementation manner of the first aspect, the text position detection specifically comprises obtaining the text position with the DBNet algorithm in the text detection stage, and the text recognition specifically uses a CRNN network in the text recognition stage.
In a possible implementation manner of the first aspect, before automatically backfilling the final named entity recognition result into a different task of a corresponding case, the method further includes:
automatically identifying the current backfill target task with the OCR model to obtain the backfill field types required by that task and their position information.
In a possible implementation manner of the first aspect, after completing the backfilling of the information, the method further includes:
calculating the accuracy of the intelligent information backfilling and the corresponding positive and negative feedback coefficients, and correcting the BERT model in real time according to those coefficients, where the coefficients are calculated from the accuracy, the backfill rate and the user satisfaction after intelligent backfilling.
In a possible implementation manner of the first aspect, performing text classification processing on the preliminary named entity recognition result to obtain a final named entity recognition result includes:
performing rule matching on texts that share the same text attribute in the preliminary named entity recognition result of the same case; the rules include position rules, keyword matching rules, and format rules.
In a possible implementation manner of the first aspect, before the offline-collected case pictures are input into the OCR model for OCR recognition, the method further includes:
standardizing and structuring the collected case pictures, where standardization centers the case picture data using the historical mean, and data structuring filters and cleans the case picture data and converts it into a preset data format.
In a possible implementation manner of the first aspect, before the OCR recognition results of the cases are input into the BERT model for named entity recognition, the method further includes training the BERT model, which comprises:
inputting the dictionary into the BERT model as a training data set;
defining the training and prediction method of the BERT model;
loading the training data set for online iterative optimization training;
loading a validation data set to complete validation testing of the BERT model;
correcting the BERT model according to the validation test results.
A second aspect of the embodiments of the present application further provides an information backfilling device based on image recognition, including:
an OCR recognition module, used for inputting the offline-collected case pictures into an OCR model for OCR recognition to obtain the OCR recognition result of each case; the OCR recognition comprises text position detection and text recognition;
a BERT recognition module, used for inputting the OCR recognition result of each case into a BERT model for named entity recognition to obtain a preliminary named entity recognition result; the preliminary named entity recognition result comprises the attribute of each recognized text;
a text classification module, used for performing text classification processing on the preliminary named entity recognition result to obtain a final named entity recognition result; the text classification processing comprises text attribute classification and text position matching;
and an information backfilling module, used for automatically backfilling the final named entity recognition result into the different tasks of the corresponding case to complete information backfilling.
The third aspect of the embodiments of the present application also provides a computer device, including a memory and a processor, where the memory stores a computer program, and the processor implements the steps of any one of the above methods when executing the computer program.
The fourth aspect of the embodiments of the present application also proposes a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps of the method of any one of the above.
According to the information backfilling method, device, equipment and medium based on image recognition provided by the application, the offline-collected case pictures are input into an OCR model, so the OCR recognition result of each case is obtained quickly and accurately and the later manual effort of reading the text in the pictures is reduced. Named entity recognition with the BERT model on each case's OCR result accurately yields the attribute of every text, so a large amount of text is preliminarily recognized and classified before backfilling, which reduces the data to be processed during backfilling and improves its efficiency. Further, more precise text classification of the preliminary result reduces classification and recognition errors and further improves named entity recognition accuracy. Finally, the final result is backfilled automatically into each task of the corresponding case, completing intelligent backfilling of medical case data; this greatly reduces the labor and time spent manually paging through pictures and typing in task information, improves the efficiency of backfilling medical claims information, and raises the degree of automation of information backfilling in medical claims investigation.
Drawings
Fig. 1 is a schematic flowchart of an information backfilling method based on image recognition according to an embodiment of the present application;
Fig. 2 is a block diagram schematically illustrating the structure of an information backfilling device based on image recognition according to an embodiment of the present application;
Fig. 3 is a block diagram illustrating the structure of a computer device according to an embodiment of the present application.
The realization of the objectives, the functional features and the advantages of the present application will be further explained with reference to the accompanying drawings.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
With reference to fig. 1, in order to achieve the above object, an embodiment of the present application provides an information backfill method based on image recognition, where the method includes:
S1, inputting the offline-collected case pictures into an OCR model for OCR recognition to obtain the OCR recognition result of each case; the OCR recognition comprises text position detection and text recognition;
S2, inputting the OCR recognition result of each case into a BERT model for named entity recognition to obtain a preliminary named entity recognition result; the preliminary named entity recognition result comprises the attribute of each recognized text;
S3, performing text classification processing on the preliminary named entity recognition result to obtain a final named entity recognition result; the text classification processing comprises text attribute classification and text position matching;
S4, automatically backfilling the final named entity recognition result into the different tasks of the corresponding case to complete information backfilling.
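The four steps can be sketched as a small pipeline. This is a minimal Python sketch under stated assumptions, not the application's implementation: the `OcrLine` and `Entity` types and the `ocr`, `ner` and `refine` callables are hypothetical stand-ins for the OCR model, the BERT model and the rule-based text classification, and step S4 is reduced to grouping the final entities by attribute, ready to be written into a task.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in picture pixels


@dataclass
class OcrLine:
    text: str  # recognized string (text recognition stage)
    box: Box   # position from the text detection stage


@dataclass
class Entity:
    text: str
    label: str  # attribute assigned by NER, e.g. "DATE" (hypothetical label set)
    box: Box


def backfill_case(
    pictures: List[bytes],
    ocr: Callable[[bytes], List[OcrLine]],           # S1: detection + recognition
    ner: Callable[[List[OcrLine]], List[Entity]],    # S2: preliminary entities
    refine: Callable[[List[Entity]], List[Entity]],  # S3: attribute/position rules
) -> Dict[str, List[str]]:
    """Run S1-S3 for one case and group the final entities by attribute for S4."""
    lines: List[OcrLine] = []
    for picture in pictures:
        lines.extend(ocr(picture))
    final = refine(ner(lines))
    fields: Dict[str, List[str]] = {}
    for entity in final:
        fields.setdefault(entity.label, []).append(entity.text)
    return fields
```

Passing the three stages in as callables keeps the sketch testable with stubs and lets each model be swapped or retrained independently.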
With the development of image recognition technology, materials collected from hospitals and physical examination institutions during claims investigation can be recognized intelligently and backfilled automatically. In the method, the offline-collected medical claims case pictures are input into an OCR (Optical Character Recognition) model, quickly and accurately yielding the OCR result of each case and reducing the later manual effort of reading the text in the pictures. BERT-based named entity recognition on each case's OCR result accurately obtains the attribute of every text, so large amounts of text are preliminarily recognized and classified before backfilling, reducing the later processing load and improving efficiency. Further text classification of the preliminary result reduces classification and recognition errors, further improving accuracy. Automatic backfilling of the final result into each task of the corresponding case then completes intelligent backfilling of medical case data, greatly reducing the labor and time spent paging through medical case pictures and entering task information manually, and raising the degree of automation of information backfilling in medical insurance claims investigation.
Information backfilling in a commercial medical insurance claims investigation system is an important link in the claims process and an important basis for the insurance company's decision to pay; such a system connects upstream to the individual insurance companies and downstream to the crowdsourcing companies that carry out claims investigation. During claims review, offline materials for each case are collected from the associated medical institutions, including offline medical insurance pictures, medical record front pages, admission summaries, discharge summaries and so on. The medical insurance pictures contain the case number, product information, level, place of the insured incident, course of the incident and so on; the medical record pictures contain the past history, main diagnosis, admission time, discharge time and so on.
For step S1, the offline-collected case pictures are first entered into the claims investigation system for OCR recognition, which comprises text position detection in the text detection stage and text recognition in the text recognition stage. OCR yields a preliminary recognition result from which the text contained in the medical case pictures is extracted quickly, providing an accurate data basis for subsequent intelligent backfilling, improving its efficiency, and greatly reducing the time spent on later manual task entry.
For step S2: in the NLP field, BERT (Bidirectional Encoder Representations from Transformers, introduced in "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding") is a pre-trained model. Through unsupervised pre-training on large unlabeled corpora it learns prior knowledge of language, syntax and word meaning that is then used by downstream tasks. It is a deep bidirectional language-understanding model that uses Transformers as feature extractors and is built from stacked bidirectional Transformer layers, with a 12-layer base version and a 24-layer large version. In the application, once the OCR result of each case is obtained, it is input into the BERT model, which performs named entity recognition on the text to obtain the attribute of each text. For example, "10/12/2020" is recognized as a time, a patient's name as a name, "Room 101, No. 998 Royal Garden Road, Jiading District, Shanghai" as an address, "21090219950312" as an identification number, "plantar fasciitis" as the main diagnosis, and "hypertension, anemia" as past history. Performing named entity recognition on each case's OCR result through the BERT model accurately obtains the attribute of each text, so that large amounts of claims text are preliminarily recognized and classified before backfilling, which reduces the data processing required during backfilling and improves backfilling efficiency.
For step S3, further text classification is needed after the preliminary named entity recognition result is obtained, because some results may be inaccurate: for example, both the main diagnosis and the past history describe diseases, so they are easily misclassified. Some named entity results must also be matched against corresponding rules to complete the classification, and rule matching must take the position of each text into account, because texts with different attributes occupy different positions; for example, the discharge date appears after the admission date. This further classification avoids two kinds of error: texts given the same attribute but located in different positions being misclassified, and texts in similar positions being given the wrong entity type. It therefore further improves named entity recognition accuracy.
For step S4, in the claims investigation system the tasks to be backfilled contain different backfill fields, so the backfill target task must be identified automatically before backfilling. After the task is identified, each text in the final named entity result is backfilled into the task of the corresponding case; for example, the past history is entered directly into the past-history box of the task. Staff no longer need to read the pictures and type entries by hand, which saves labor and time, greatly improves backfilling efficiency, and raises the degree of automation of the current commercial insurance claims investigation system.
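BERT-based named entity recognition is commonly framed as token classification, in which each token (usually a single character for Chinese text) receives a BIO tag. The application does not state its tagging scheme, so the BIO scheme and the label names below are assumptions; the sketch only shows how per-token tags could be merged into the (text, attribute) pairs described above.

```python
from typing import List, Optional, Tuple


def bio_to_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge per-token BIO tags into (entity_text, label) pairs.

    Assumes character-level tokens (typical for Chinese BERT), so entity text
    is obtained by concatenating tokens without separators.
    """
    entities: List[Tuple[str, str]] = []
    label: Optional[str] = None
    chars: List[str] = []
    for tok, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label)
        if starts_new:
            # A B- tag, or a stray I- tag with a different label, opens a new entity.
            if label is not None:
                entities.append(("".join(chars), label))
            label, chars = tag[2:], [tok]
        elif tag.startswith("I-"):
            chars.append(tok)  # continue the open entity
        else:
            # "O" closes any open entity.
            if label is not None:
                entities.append(("".join(chars), label))
            label, chars = None, []
    if label is not None:
        entities.append(("".join(chars), label))
    return entities
```

For example, the tags `B-DATE I-DATE I-DATE` over the tokens `2020`, `-10`, `-12` merge into a single `("2020-10-12", "DATE")` entity.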
In one embodiment, the text position detection specifically comprises obtaining the text position with the DBNet algorithm in the text detection stage, and the text recognition specifically uses a CRNN network in the text recognition stage.
OCR (Optical Character Recognition) refers to a process in which an electronic device (e.g., a scanner or a digital camera) examines characters printed on paper, determines their shapes by detecting dark and light patterns, and then translates those shapes into computer text using a character recognition method. OCR is currently applied in many areas, such as certificate recognition, document retrieval and screenshot recognition. However, although many OCR models exist, most can only recognize the characters in a picture and cannot identify the position of the text or the type of its attribute.
In a specific embodiment, an OCR model comprising text position detection and text recognition is constructed. In the text detection stage, the DBNet algorithm obtains the specific position of each text in the picture, answering where text exists and what area it covers. After the text regions are cropped at those positions, the CRNN network recognizes each located region in the recognition stage, determining what each character is and converting the text regions of the image into character strings. The result is a preliminary OCR recognition result.
The CRNN network uses TPS + ResNet-18 + BiLSTM + CTC as its architecture: TPS (thin-plate spline) rectification removes distortion by correcting the text image; ResNet-18 balances recognition accuracy and speed; the BiLSTM extracts contextual sequence information well; and CTC supports prediction of long, variable-length text without character-level alignment. A CRNN network built on this architecture recognizes text accurately, providing a reliable data basis for subsequent named entity recognition.
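DBNet predicts a probability map P and a threshold map T for the picture and combines them by differentiable binarization, B = 1 / (1 + exp(-k(P - T))), where the DBNet paper sets the amplifying factor k = 50. The sketch below shows that formula plus a simplified box-extraction step over a thresholded probability map. The 0.3 threshold is an assumed value, and the real post-processing also expands (unclips) each shrunk text region, which is omitted here.

```python
import math
from typing import List, Tuple


def approx_binarize(p: float, t: float, k: float = 50.0) -> float:
    """Differentiable binarization from DBNet: B = 1 / (1 + exp(-k (P - T)))."""
    return 1.0 / (1.0 + math.exp(-k * (p - t)))


def text_boxes(prob: List[List[float]], thresh: float = 0.3) -> List[Tuple[int, int, int, int]]:
    """Threshold a probability map and return one (x0, y0, x1, y1) box per
    4-connected region. Simplified stand-in for DBNet post-processing
    (no unclipping of the shrunk regions)."""
    h, w = len(prob), len(prob[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if prob[y][x] < thresh or seen[y][x]:
                continue
            # Flood-fill the region starting at (y, x) and track its extent.
            stack = [(y, x)]
            seen[y][x] = True
            x0 = x1 = x
            y0 = y1 = y
            while stack:
                cy, cx = stack.pop()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and prob[ny][nx] >= thresh:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            boxes.append((x0, y0, x1, y1))
    return boxes
```

The large k makes B behave almost like a hard step at P = T while remaining differentiable, which is what lets the threshold map be trained jointly with the probability map.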
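The CTC layer outputs a class distribution at each time step, including a special blank class. A minimal best-path (greedy) decoder, which takes the argmax at each step, collapses consecutive repeats and then drops blanks, illustrates how variable-length text is read off without character-level alignment. The alphabet and the choice of index 0 as the blank are assumptions for this sketch; production systems may use beam search instead.

```python
from typing import List, Sequence


def ctc_greedy_decode(step_probs: Sequence[Sequence[float]], alphabet: str) -> str:
    """Best-path CTC decoding.

    Class 0 is assumed to be the blank; class i (i >= 1) maps to alphabet[i - 1].
    """
    best = [max(range(len(p)), key=p.__getitem__) for p in step_probs]
    chars: List[str] = []
    prev = None
    for idx in best:
        # Emit a character only when it differs from the previous step and is not blank.
        if idx != prev and idx != 0:
            chars.append(alphabet[idx - 1])
        prev = idx
    return "".join(chars)
```

With argmax classes `[a, a, blank, a, b, b]` the decoder returns `"aab"`: the repeated `a` steps collapse into one, and the blank between them is what allows a genuine double letter to survive.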
In one embodiment, before automatically backfilling the final named entity recognition result into a different task of a corresponding case, the method further comprises:
automatically identifying the current backfill target task with the OCR model to obtain the backfill field types required by that task and their position information.
In a specific embodiment, before step S4 backfills the final named entity recognition result into the different tasks of the corresponding case, the backfill target task in the current window is identified automatically: the OCR model captures and recognizes the current window to determine the backfill fields required by the task. During backfilling, each text in the named entity recognition result is then filled automatically, according to the matching rules, into the backfill field box at the corresponding position in the current window; for example, the present symptoms are filled directly into the present-symptoms box of the task. The fields required by the task therefore no longer need to be entered manually, which greatly improves the efficiency and degree of automation of information backfilling.
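Matching entities to the recognized task window can be sketched as pairing each field caption found by OCR with the entity whose attribute maps to it. The caption strings, the `LABEL_TO_CAPTION` table and the assumption that each input box sits next to its caption are all hypothetical; the application states only that matching rules and position information are used.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) of a caption in the window

# Hypothetical mapping from NER attributes to captions printed beside task input boxes.
LABEL_TO_CAPTION = {
    "MAIN_DIAGNOSIS": "Main diagnosis",
    "PAST_HISTORY": "Past history",
    "ADMISSION_DATE": "Admission date",
    "DISCHARGE_DATE": "Discharge date",
}


def plan_backfill(
    window_captions: List[Tuple[str, Box]], entities: Dict[str, str]
) -> List[Tuple[str, Box, str]]:
    """Return (caption, caption_box, value) for every entity whose attribute
    maps to a caption found on screen; entities with no target field are skipped."""
    # Normalize OCR'd captions by trimming whitespace and ASCII/full-width colons.
    caption_pos = {text.strip().rstrip(":："): box for text, box in window_captions}
    plan = []
    for label, value in entities.items():
        caption = LABEL_TO_CAPTION.get(label)
        if caption in caption_pos:
            plan.append((caption, caption_pos[caption], value))
    return plan
```

The caller would then click or type into the input box located relative to each returned caption box; that UI step is outside this sketch.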
In one embodiment, after completing the backfilling of the information, the method further comprises:
calculating the accuracy of the intelligent information backfilling and the corresponding positive and negative feedback coefficients, and correcting the BERT model in real time according to those coefficients, where the coefficients are calculated from the accuracy, the backfill rate and the user satisfaction after intelligent backfilling.
In a specific embodiment, after the automatic backfilling of step S4 is complete, a computer calculates the backfilling accuracy and the corresponding positive and negative feedback coefficients. The accuracy is obtained by manually reviewing a number of backfilled information sheets, computing the accuracy of each sheet, and presenting the statistics as visual charts. The feedback coefficient is computed as follows: first, the data for each indicator are standardized to remove the influence of differing indicator units and scales; next, the entropy method is used to calculate the information entropy and information utility value of each indicator, from which the weight of each evaluation indicator is derived; finally, a composite score is computed from the indicator weights and the actual indicator values, and the feedback coefficient is matched according to the preset score interval in which the composite score falls. If the feedback is positive and the coefficient exceeds a preset value, the model's accuracy is considered to meet expectations and no correction is needed; if the feedback is positive but the coefficient is below the preset value, it must be determined whether the BERT model's named entity recognition accuracy is low or whether errors arose in the further text classification; if the feedback is negative, the BERT model must be retrained or rebuilt.
By collecting the accuracy and the feedback coefficient after backfilling, the model is corrected promptly and in a targeted way, and the causes of falling backfilling accuracy or a falling backfill rate are found in time. This improves the reliability and practicality of intelligent backfilling, avoids having to correct large numbers of backfilling errors manually later, and saves time and labor.
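The entropy method described above can be written compactly. After min-max standardization, with p_ij the share of sheet i in indicator j over n sheets, the entropy is e_j = -(1/ln n) Σ_i p_ij ln p_ij, the utility value is d_j = 1 - e_j, and the weight is w_j = d_j / Σ_k d_k. The Python sketch below assumes all three indicators (accuracy, backfill rate, user satisfaction) are "larger is better" ratios in [0, 1], computes the composite score from the actual indicator values, and uses hypothetical score intervals and coefficients, since the application does not disclose them.

```python
import math
from typing import List, Sequence, Tuple


def entropy_weights(rows: Sequence[Sequence[float]]) -> List[float]:
    """Entropy weight method; rows[i][j] is indicator j for backfill sheet i."""
    n, m = len(rows), len(rows[0])
    utilities = []
    for j in range(m):
        col = [r[j] for r in rows]
        lo, hi = min(col), max(col)
        if hi == lo:
            utilities.append(0.0)  # a constant indicator carries no information
            continue
        norm = [(v - lo) / (hi - lo) for v in col]  # min-max standardization
        total = sum(norm)  # >= 1 because the maximum maps to 1
        entropy = -sum((x / total) * math.log(x / total) for x in norm if x > 0) / math.log(n)
        utilities.append(1.0 - entropy)  # information utility value d_j
    s = sum(utilities)
    return [d / s for d in utilities] if s > 0 else [1.0 / m] * m


def composite_score(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of the actual indicator values for one sheet."""
    return sum(v * w for v, w in zip(values, weights))


def feedback_coefficient(score: float, intervals: List[Tuple[float, float]]) -> float:
    """intervals: (lower_bound, coefficient) pairs sorted by descending lower bound
    (hypothetical values); scores below every bound get the last coefficient."""
    for lower, coeff in intervals:
        if score >= lower:
            return coeff
    return intervals[-1][1]
```

An indicator that is identical across all sheets receives zero weight, which matches the intuition behind the entropy method: it cannot discriminate between good and poor backfilling.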
In one embodiment, performing text classification on the preliminary named entity recognition result to obtain a final named entity recognition result includes:
performing rule matching on texts that share the same text attribute in the preliminary named entity recognition result of the same case; the rules include position rules, keyword matching rules, and format rules.
In a specific embodiment, for step S3, the preliminary named entity recognition result of a case may not be accurate enough; for example, texts describing diseases, which may be either the main diagnosis or the past history, are easily misclassified, so further text classification is needed. The position of each text is also considered, because two texts in similar positions that receive the same attribute may represent two different pieces of information. For example, if two dates appear close together, one is the admission date and the other the discharge date, and rule matching is needed to tell them apart: a preset rule takes the later date as the discharge date, and the discharge date is generally located to the right of or below the admission date. As another example, a preset rule takes the content between the keywords "main diagnosis" and "past history" as the main diagnosis: once the keyword "main diagnosis" is found, the text that follows is taken as the main diagnosis until the next keyword, "past history", appears.
In this embodiment, besides the rules for identifying the admission and discharge dates and the main diagnosis, rule matching also includes identification rules for ID card numbers, postal codes, dates, admitting departments, discharging departments and so on, which are not described in detail here.
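The two rules in this paragraph, ordering nearby dates and cutting the main diagnosis out between two keywords, might look like the following. The English keywords stand in for the Chinese ones on a real medical record, and the layout check (discharge box entirely to the right of, or entirely below, the admission box) is an assumed reading of the rule; this is an illustration, not the application's rule set.

```python
import re
from datetime import date
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1)


def admission_discharge(dates: List[Tuple[date, Box]]) -> Tuple[date, date, bool]:
    """Order rule: the earlier of two nearby DATE entities is the admission date,
    the later one the discharge date. Also reports whether the layout agrees
    (discharge box to the right of or below the admission box)."""
    (first, first_box), (second, second_box) = sorted(dates, key=lambda d: d[0])
    layout_ok = second_box[0] >= first_box[2] or second_box[1] >= first_box[3]
    return first, second, layout_ok


def text_between(text: str, start_kw: str, end_kw: str) -> Optional[str]:
    """Keyword rule: the text after start_kw (and an optional colon) up to the
    next end_kw, or to the end of the text if end_kw never appears."""
    pattern = re.escape(start_kw) + r"[:：]?\s*(.*?)\s*(?=" + re.escape(end_kw) + r"|$)"
    m = re.search(pattern, text, re.S)
    return m.group(1) if m else None
```

A `False` layout flag does not override the date order here; it simply marks the pair for review, which is one conservative way to combine the two cues.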
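As an example of a format rule, an 18-character mainland China resident ID number ends in a check character computed with ISO 7064 MOD 11-2 (as specified in GB 11643-1999), and mainland postal codes are six digits. The sketch assumes these are the formats the rules target; the application does not list its exact format rules.

```python
import re

# GB 11643-1999 weights for the first 17 digits and the check-character table.
_WEIGHTS = [7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2]
_CHECK = "10X98765432"


def is_valid_resident_id(s: str) -> bool:
    """17 digits followed by a check character (digit or X) that matches
    the weighted sum of the first 17 digits modulo 11."""
    s = s.upper()
    if not re.fullmatch(r"\d{17}[\dX]", s):
        return False
    total = sum(int(c) * w for c, w in zip(s[:17], _WEIGHTS))
    return s[17] == _CHECK[total % 11]


def is_postal_code(s: str) -> bool:
    """Mainland China postal codes are exactly six digits."""
    return re.fullmatch(r"\d{6}", s) is not None
```

A checksum rule like this catches single-digit OCR misreads that a plain length check would let through.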
In one embodiment, before the offline-collected case pictures are input into the OCR model for OCR recognition, the method further comprises:
performing standardization and data structuring on the collected case pictures. The standardization centers the case picture data using a historical (past) mean value, and the data structuring filters and cleans the case picture data and converts it into a preset data format.
In a specific embodiment, the collected case pictures include one or more of insurance pictures, case pictures, institution pictures, and the like. Because these pictures are of different kinds, failing to preprocess them into a standardized format would add workload to the subsequent data import and OCR recognition and would reduce recognition efficiency and accuracy. Therefore, before the data of step S1 is imported into the OCR model for recognition, the offline-collected case pictures must be standardized. In addition, the OCR model in this application extracts information from structured data, so unstructured data must first be converted into structured data before OCR recognition. This ensures that the OCR model can recognize the data normally and avoids recognition failures caused by unstructured input. Because all case pictures undergo the same standardization and data structuring, every object processed by the OCR model during data entry and OCR recognition follows the same structured standard, and no conversion between different standards is needed during recognition. This greatly improves the recognition performance of the OCR model and, in turn, its recognition efficiency and accuracy.
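The text does not define "the past average value" precisely. A minimal stdlib-only sketch of this preprocessing, assuming grayscale pictures given as nested lists of pixel intensities, that centering means subtracting the running mean intensity of previously processed pictures, and that the "preset data format" is a record with the hypothetical fields `case_id`, `picture_type`, and `pixels`:

```python
from dataclasses import dataclass

# Hypothetical "preset data format": fields every structured record must carry.
REQUIRED_FIELDS = ("case_id", "picture_type", "pixels")


@dataclass
class HistoricalMeanCenterer:
    """Centers grayscale pixels with the running mean of previously processed pictures."""
    total: float = 0.0
    count: int = 0

    @property
    def mean(self) -> float:
        # With no history yet the mean is 0, so the first picture passes through unchanged.
        return self.total / self.count if self.count else 0.0

    def center(self, pixels):
        m = self.mean
        return [[px - m for px in row] for row in pixels]

    def update(self, pixels):
        for row in pixels:
            self.total += sum(row)
            self.count += len(row)


def to_structured_record(raw):
    """Data cleaning and structuring: drop incomplete uploads, normalize field values."""
    if any(raw.get(k) in (None, "", []) for k in REQUIRED_FIELDS):
        return None
    return {
        "case_id": str(raw["case_id"]).strip(),
        "picture_type": str(raw["picture_type"]).strip().lower(),
        "pixels": raw["pixels"],
    }


def preprocess(raw_pictures, centerer):
    records = []
    for raw in raw_pictures:
        record = to_structured_record(raw)
        if record is None:
            continue  # filtered out during cleaning
        original = record["pixels"]
        record["pixels"] = centerer.center(original)  # center with the past mean only
        centerer.update(original)  # then add this picture to the history
        records.append(record)
    return records
```

Centering before updating keeps the reference mean strictly historical, which matches the wording "past average value"; a per-pixel or per-channel mean would be an equally plausible reading.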
In one embodiment, before the OCR recognition results corresponding to the cases are respectively input into the BERT model for named entity recognition, the method further includes training the BERT model through the following steps:
S21, inputting the dictionary into the BERT model as a training data set;
S22, defining a training and prediction method for the BERT model;
S23, loading the training data set to perform online iterative optimization training;
S24, loading a verification data set to complete the verification test of the BERT model;
and S25, correcting the BERT model according to the verification test result.
In a specific embodiment, before the OCR recognition result of each case is input into the BERT model for named entity recognition, the BERT model is optimized through training. The specific steps are as follows. First, a relevant dictionary is input into the BERT model as the training data set. The user can then customize the training and prediction method of the model, which includes defining the model, defining a data processor, and defining a gradient update algorithm. After this customization, the local dictionary model is loaded to start online iterative optimization training; for short-text classification, for example, the classification effect improves noticeably after three rounds of iteration. After the iterative optimization training is completed, a local verification data set is loaded to test the trained BERT model, and the model is finally corrected according to the test result. This improves the prediction performance of the BERT model and therefore the accuracy of its subsequent named entity recognition on the OCR recognition results. In addition, after each round of information backfilling, the corrected backfilling results are added to the verification data set as verification test data for the next round of BERT model training, which can further improve the training effect.
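The training code itself is not disclosed. The sketch below only mirrors the control flow of steps S21 to S25. It makes two explicit substitutions: a tiny multi-class perceptron over whitespace tokens stands in for BERT (its mistake-driven update takes the place of the gradient update algorithm), and "correcting the model" is interpreted as keeping the weights that score best on the verification set. `ToyPerceptron`, `train_with_validation`, and the label names are hypothetical.

```python
import copy
from collections import defaultdict


class ToyPerceptron:
    """Stand-in for the BERT model: a multi-class perceptron over tokens."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.weights = {lab: defaultdict(float) for lab in self.labels}

    def predict(self, tokens):
        # Ties go to the first label in self.labels.
        return max(self.labels, key=lambda lab: sum(self.weights[lab][t] for t in tokens))

    def update(self, tokens, gold):
        """Perceptron update rule: reward the gold label, penalize the wrong guess."""
        pred = self.predict(tokens)
        if pred != gold:
            for t in tokens:
                self.weights[gold][t] += 1.0
                self.weights[pred][t] -= 1.0


def accuracy(model, data):
    return sum(model.predict(toks) == y for toks, y in data) / len(data) if data else 0.0


def train_with_validation(dictionary, validation, labels, epochs=3):
    # S21: labelled dictionary entries become the training set.
    train = [(entry.split(), label) for entry, label in dictionary]
    val = [(entry.split(), label) for entry, label in validation]
    # S22: define the model and its update rule.
    model = ToyPerceptron(labels)
    best_model, best_acc = copy.deepcopy(model), -1.0
    for _ in range(epochs):
        # S23: one round of iterative optimization over the training set.
        for tokens, gold in train:
            model.update(tokens, gold)
        # S24: verification test on the held-out set.
        acc = accuracy(model, val)
        # S25: correct the model by keeping the best-validated weights.
        if acc > best_acc:
            best_model, best_acc = copy.deepcopy(model), acc
    return best_model, best_acc
```

The feedback step described above would correspond to appending corrected backfilling results to `validation` before the next call.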
Referring to fig. 2, the present application further provides an information backfilling device based on image recognition, including:
the OCR recognition module 100 is used for inputting the offline collected case pictures into an OCR model for OCR recognition to obtain OCR recognition results corresponding to the cases; the OCR recognition comprises text position detection and text recognition;
the BERT recognition module 200 is used for inputting the OCR recognition result corresponding to each case into a BERT model for named entity recognition to obtain a preliminary named entity recognition result; the preliminary named entity recognition result comprises the attribute of each recognized text;
the text classification module 300 is configured to perform text classification processing on the preliminary named entity recognition result to obtain a final named entity recognition result; the text classification processing comprises text attribute classification and text position matching;
and the information backfilling module 400 is used for automatically backfilling the final named entity recognition result into different tasks of the corresponding case to complete information backfilling.
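A hedged sketch of how modules 100 to 400 could be chained for one case. The stage functions are injected, so any OCR engine (for example the DBNet and CRNN pipeline named in claim 2) and any NER model can be plugged in. `BackfillDevice`, `task_fields`, and the dict-based entity format are hypothetical, and backfilling is simplified to copying each entity into every task field that carries the same label.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Entity = Dict[str, str]  # e.g. {"label": "DISCHARGE_DATE", "text": "2021-03-05"}


@dataclass
class BackfillDevice:
    """Chains the four modules of the device for a single case (hypothetical interface)."""
    ocr: Callable[[bytes], List[dict]]                # module 100: text detection + recognition
    ner: Callable[[List[dict]], List[Entity]]         # module 200: preliminary NER
    classify: Callable[[List[Entity]], List[Entity]]  # module 300: attribute + position rules
    task_fields: Dict[str, List[str]]                 # task name -> labels the task needs

    def run(self, case_pictures: List[bytes]) -> Dict[str, Dict[str, str]]:
        # Module 100: OCR every picture of the case and pool the text blocks.
        blocks = [block for pic in case_pictures for block in self.ocr(pic)]
        # Modules 200 and 300: NER, then rule-based refinement over the whole case.
        final = self.classify(self.ner(blocks))
        by_label: Dict[str, str] = {}
        for e in final:
            by_label.setdefault(e["label"], e["text"])  # keep the first value per label
        # Module 400: fill each task with the labels it asks for.
        return {
            task: {lab: by_label[lab] for lab in labels if lab in by_label}
            for task, labels in self.task_fields.items()
        }
```

Running NER and rule matching once per case, rather than once per picture, lets the position and keyword rules compare texts that come from different pictures of the same case.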
With the above device, the offline-collected case pictures are input into the OCR model for OCR recognition, so the OCR recognition result of each case is obtained quickly and accurately and the subsequent workload of manually recognizing text in the pictures is reduced. The BERT model performs named entity recognition on the OCR recognition result of each case, so the attribute of each piece of text can be obtained accurately; a large amount of text information is thus preliminarily recognized and classified before information backfilling, which reduces the amount of data to process during backfilling and improves backfilling efficiency. Further, more accurate text classification of the preliminary named entity recognition result avoids classification and recognition errors and further improves the accuracy of named entity recognition. Finally, the final named entity recognition result is automatically backfilled into each task of the corresponding case, completing intelligent information backfilling of the case data. This greatly reduces the labor and time cost of manually paging through pictures and typing in task information, improves backfilling efficiency, and helps make information backfilling in commercial insurance claim review and investigation more intelligent.
Referring to fig. 3, an embodiment of the present application further provides a computer device, which may be a server and whose internal structure may be as shown in fig. 3. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores data used by the information backfilling method based on image recognition. The network interface of the computer device communicates with external terminals over a network connection. When executed by the processor, the computer program implements the information backfilling method based on image recognition.
The information backfilling method based on image recognition comprises the following steps: inputting the offline-collected case pictures into an OCR model for OCR recognition to obtain the OCR recognition result corresponding to each case, where the OCR recognition includes text position detection and text recognition; inputting the OCR recognition result corresponding to each case into a BERT model for named entity recognition to obtain a preliminary named entity recognition result, which includes the attribute of each recognized text; performing text classification processing on the preliminary named entity recognition result to obtain a final named entity recognition result, where the text classification processing includes text attribute classification and text position matching; and automatically backfilling the final named entity recognition result into the different tasks of the corresponding case to complete information backfilling.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements an information backfilling method based on image recognition comprising the following steps: inputting the offline-collected case pictures into an OCR model for OCR recognition to obtain the OCR recognition result corresponding to each case, where the OCR recognition includes text position detection and text recognition; inputting the OCR recognition result corresponding to each case into a BERT model for named entity recognition to obtain a preliminary named entity recognition result, which includes the attribute of each recognized text; performing text classification processing on the preliminary named entity recognition result to obtain a final named entity recognition result, where the text classification processing includes text attribute classification and text position matching; and automatically backfilling the final named entity recognition result into the different tasks of the corresponding case to complete information backfilling.
According to the information backfilling method based on image recognition, the offline-collected case pictures are input into the OCR model for OCR recognition, so the OCR recognition result of each case is obtained quickly and accurately and the subsequent workload of manually recognizing text in the pictures is reduced. The BERT model performs named entity recognition on the OCR recognition result of each case, so the attribute of each piece of text can be obtained accurately; a large amount of text information is thus preliminarily recognized and classified before information backfilling, which reduces the amount of data to process during backfilling and improves backfilling efficiency. Further, more accurate text classification of the preliminary named entity recognition result avoids classification and recognition errors and further improves the accuracy of named entity recognition. Finally, the final named entity recognition result is automatically backfilled into each task of the corresponding case, completing intelligent information backfilling of the case data. This greatly reduces the labor and time cost of manually paging through pictures and typing in task information, improves backfilling efficiency, and helps make information backfilling in commercial insurance claim review and investigation more intelligent.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by a computer program instructing relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the above method embodiments. Any reference to memory, storage, database, or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, apparatus, article, or method that includes the element.
The above description is only a preferred embodiment of the present application and is not intended to limit its scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, is likewise included in the scope of the present application.

Claims (10)

1. An information backfilling method based on image recognition is characterized by comprising the following steps:
inputting offline-collected case pictures into an OCR model for OCR recognition to obtain the OCR recognition result corresponding to each case, wherein the OCR recognition comprises text position detection and text recognition;
inputting the OCR recognition result corresponding to each case into a BERT model for named entity recognition to obtain a preliminary named entity recognition result, wherein the preliminary named entity recognition result comprises the attribute of each recognized text;
performing text classification processing on the preliminary named entity recognition result to obtain a final named entity recognition result; the text classification processing comprises text attribute classification and text position matching;
and automatically backfilling the final named entity recognition result to different tasks of the corresponding case to finish information backfilling.
2. The information backfilling method based on image recognition according to claim 1, wherein, in the text detection stage, the text position detection specifically uses the DBNet algorithm to obtain text positions, and, in the text recognition stage, the text recognition specifically uses a CRNN network to recognize the text.
3. The information backfilling method based on image recognition according to claim 1, further comprising, before automatically backfilling the final named entity recognition result into the different tasks of the corresponding case:
automatically identifying the current backfill target task using an OCR (optical character recognition) model to obtain the backfill field types required by the current backfill target task and their corresponding position information.
4. The information backfilling method based on image recognition according to claim 1, further comprising, after completing the backfilling of the information:
calculating the accuracy of the intelligently backfilled information and the corresponding positive and negative feedback coefficients, and correcting the BERT model in real time according to the positive and negative feedback coefficients, wherein the positive and negative feedback coefficients are calculated according to the accuracy, the backfill rate, and the user satisfaction of the intelligently backfilled information.
5. The image recognition-based information backfilling method according to claim 1, wherein the step of performing text classification processing on the preliminary named entity recognition result to obtain a final named entity recognition result comprises:
performing rule matching on texts classified under the same text attribute in the preliminary named entity recognition result of the same case, wherein the rules include position rules, keyword matching rules, and format rules.
6. The information backfilling method based on image recognition according to claim 1, further comprising, before inputting the offline-collected case pictures into an OCR model for OCR recognition:
performing standardization and data structuring on the collected case pictures, wherein the standardization centers the case picture data using a historical mean value, and the data structuring filters and cleans the case picture data and converts it into a preset data format.
7. The information backfilling method based on image recognition according to claim 1, further comprising, before inputting the OCR recognition result corresponding to each case into a BERT model for named entity recognition, training the BERT model by:
inputting the dictionary into a BERT model as a training data set;
defining a training prediction method of a BERT model;
loading a training data set to perform online iterative optimization training;
loading a verification data set to complete the verification test of the BERT model;
and correcting the BERT model according to the verification test result.
8. An information backfilling device based on image recognition is characterized by comprising:
the OCR recognition module is used for inputting offline-collected case pictures into an OCR model for OCR recognition to obtain the OCR recognition result corresponding to each case, wherein the OCR recognition comprises text position detection and text recognition;
the BERT recognition module is used for inputting the OCR recognition result corresponding to each case into a BERT model for named entity recognition to obtain a preliminary named entity recognition result, wherein the preliminary named entity recognition result comprises the attribute of each recognized text;
the text classification module is used for performing text classification processing on the preliminary named entity recognition result to obtain a final named entity recognition result; the text classification processing comprises text attribute classification and text position matching;
and the information backfilling module is used for automatically backfilling the final named entity recognition result into different tasks of the corresponding case to finish information backfilling.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202111005895.4A 2021-08-30 2021-08-30 Information backfill method, device, equipment and medium based on image recognition Pending CN113537200A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111005895.4A CN113537200A (en) 2021-08-30 2021-08-30 Information backfill method, device, equipment and medium based on image recognition

Publications (1)

Publication Number Publication Date
CN113537200A (en) 2021-10-22

Family

ID=78092270

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111005895.4A Pending CN113537200A (en) 2021-08-30 2021-08-30 Information backfill method, device, equipment and medium based on image recognition

Country Status (1)

Country Link
CN (1) CN113537200A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113971811A (en) * 2021-11-16 2022-01-25 北京国泰星云科技有限公司 Intelligent container feature identification method based on machine vision and deep learning
CN115049512A (en) * 2022-06-27 2022-09-13 安诚财产保险股份有限公司 Intelligent claim settlement accounting system
CN116341555A (en) * 2023-05-26 2023-06-27 华东交通大学 Named entity recognition method and system
CN116341555B (en) * 2023-05-26 2023-08-04 华东交通大学 Named entity recognition method and system

Citations (6)

Publication number Priority date Publication date Assignee Title
CN111291195A (en) * 2020-01-21 2020-06-16 腾讯科技(深圳)有限公司 Data processing method, device, terminal and readable storage medium
CN112085012A (en) * 2020-09-04 2020-12-15 泰康保险集团股份有限公司 Project name and category identification method and device
CN112560484A (en) * 2020-11-09 2021-03-26 武汉数博科技有限责任公司 Improved BERT training model and named entity recognition method and system
CN112732934A (en) * 2021-01-11 2021-04-30 国网山东省电力公司电力科学研究院 Power grid equipment word segmentation dictionary and fault case library construction method
CN113283244A (en) * 2021-07-20 2021-08-20 湖南达德曼宁信息技术有限公司 Pre-training model-based bidding data named entity identification method
CN113296613A (en) * 2021-03-12 2021-08-24 阿里巴巴新加坡控股有限公司 Customs clearance information processing method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220531

Address after: 518000 China Aviation Center 2901, No. 1018, Huafu Road, Huahang community, Huaqiang North Street, Futian District, Shenzhen, Guangdong Province

Applicant after: Shenzhen Ping An medical and Health Technology Service Co.,Ltd.

Address before: Room 12G, Block H, 666 Beijing East Road, Huangpu District, Shanghai 200000

Applicant before: PING AN MEDICAL AND HEALTHCARE MANAGEMENT Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20211022
