CN112035616A - BERT model and rule-based medical insurance data code matching method, device and equipment - Google Patents

Info

Publication number
CN112035616A
Authority
CN
China
Prior art keywords
data
code matching
matching
medical insurance
standard data
Prior art date
Legal status
Granted
Application number
CN202010898114.8A
Other languages
Chinese (zh)
Other versions
CN112035616B (en)
Inventor
满天龙
杨紫崴
Current Assignee
Shenzhen Ping An Medical Health Technology Service Co Ltd
Original Assignee
Ping An Medical and Healthcare Management Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Medical and Healthcare Management Co Ltd
Priority to CN202010898114.8A
Publication of CN112035616A
Application granted
Publication of CN112035616B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08 Insurance

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Development Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Technology Law (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Artificial Intelligence (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The application relates to the field of digital medical treatment and discloses a BERT model and rule-based medical insurance data code matching method, device and computer equipment. The method comprises: acquiring medical insurance external data to be code-matched; determining the code matching attribute of the medical insurance external data; acquiring a first descriptive statement and second descriptive statements; obtaining a first vector and second vectors using a BERT model; calculating the similarity between the first vector and each second vector and selecting N candidate standard data; selecting M candidate standard data using preset rules; obtaining the standard code matching data; and completing the data code matching according to a preset code matching rule. By using the BERT model and rules together, the method, device and computer equipment complete the code matching of medical insurance external data automatically; the code matching results are accurate, large amounts of manual labor are saved, and the efficiency is high and the cost low. The application also relates to blockchain technology, in that the BERT model and the standard data are stored in a blockchain.

Description

BERT model and rule-based medical insurance data code matching method, device and equipment
Technical Field
The application relates to the field of digital medical treatment, and in particular to a method, device and equipment for code matching of medical insurance data based on a BERT model and rules.
Background
Intelligent code matching for medical insurance is a common business scenario in medical and insurance projects and an important link in project development and delivery. In traditional code matching, operators must complete the matching manually, relying on their own extensive experience; a single code matching task generally involves thousands of fields and thousands to tens of thousands of dictionary values, so the volume is huge and the labor cost is high. Existing automatic code matching systems rely on a large number of extracted rules, do not use state-of-the-art artificial intelligence algorithms, and are costly to maintain.
Disclosure of Invention
The main objective of the application is to provide a BERT model and rule-based medical insurance data code matching method, device and computer equipment, so as to solve the technical problems of the low degree of intelligence and the high cost of existing code matching methods.
In order to achieve the above object, the present application provides a method for matching codes of medical insurance data based on a BERT model and rules, comprising:
acquiring medical insurance external data to be code-matched;
determining the code matching attribute of the medical insurance external data, wherein the code matching attribute comprises a field or a dictionary value;
acquiring a first descriptive statement corresponding to the upper-level attribute of the code matching attribute, and acquiring a plurality of second descriptive statements, each corresponding to the upper-level attribute of the code matching attribute of a standard data item in a standard database; wherein the upper-level attribute of a field is a table, and the upper-level attribute of a dictionary value is a field;
performing natural language processing on the first descriptive statement and all second descriptive statements using a preset BERT model to obtain a first vector corresponding to the first descriptive statement and a second vector corresponding to each second descriptive statement;
calculating the similarity between the first vector and each second vector, and selecting the N standard data with the highest similarity as candidate standard data, wherein N is a positive integer;
selecting, according to a preset matching rule, M standard data matching the medical insurance external data as candidate standard data, wherein M is a positive integer;
matching the code matching attributes of the medical insurance external data against those of each candidate standard data, and selecting the candidate standard data with the largest number of identical code matching attributes as standard code matching data;
and code-matching the code matching attributes of the medical insurance external data with those in the standard code matching data according to a preset code matching rule to obtain a code matching result.
Further, the step of calculating the similarity between the first vector and each of the second vectors, and selecting N standard data with the highest similarity as candidate standard data includes:
using the formula
$$\mathrm{similarity}=\cos\theta=\frac{\sum_{i=1}^{n}A_iB_i}{\sqrt{\sum_{i=1}^{n}A_i^{2}}\,\sqrt{\sum_{i=1}^{n}B_i^{2}}}$$
to calculate the similarity between the first vector and each second vector, where $A_i$ and $B_i$ are the $i$-th components of the first vector and of the respective second vector, and $n$ is the vector dimension; and sorting the similarity values in descending order and selecting the N standard data with the largest similarity values as candidate standard data.
Further, the step of selecting M standard data matched with the medical insurance external data according to a preset matching rule as candidate standard data includes:
removing punctuation from the first descriptive statement and then matching it against all second descriptive statements, and if an identical second descriptive statement is found, directly adding the corresponding standard data as candidate standard data;
if no match is found, removing the terms listed in the blacklist table from the first descriptive statement and then matching it against all second descriptive statements, and if an identical second descriptive statement is found, directly adding the corresponding standard data as candidate standard data;
if there is still no match, replacing words in the first descriptive statement with the corresponding words from the synonym table and then matching it against all second descriptive statements, and if an identical second descriptive statement is found, directly adding the corresponding standard data as candidate standard data;
if there is still no match, determining whether the first descriptive statement and a second descriptive statement have a containment relationship or whether their edit distance is smaller than K, and if so, adding the corresponding standard data as candidate standard data, until M candidate standard data are selected, wherein K is a positive integer.
Further, before the step of selecting, according to a preset matching rule, M standard data matching the medical insurance external data as candidate standard data, the method comprises:
acquiring the imported synonym table and blacklist table.
Further, after the step of selecting, according to a preset matching rule, M standard data matching the medical insurance external data as candidate standard data, the method comprises:
deduplicating the N + M candidate standard data.
Further, before the step of performing natural language processing on the first descriptive statement and all second descriptive statements using a preset BERT model to obtain a first vector corresponding to the first descriptive statement and a second vector corresponding to each second descriptive statement, the method further comprises:
acquiring short text training data related to the medical insurance field;
and fine-tuning an existing BERT model using the short text training data to obtain the trained BERT model.
Further, after code-matching the code matching attributes of the medical insurance external data with those in the standard code matching data according to a preset code matching rule to obtain a code matching result, the method further comprises the following steps:
creating a document in EXCEL format at a specified location in a storage area;
adding a sheet named field and a sheet named dit to the EXCEL document;
and storing the field code matching results from the code matching result in the field sheet, and storing the dictionary value code matching results in the dit sheet.
An embodiment of the present application further provides a BERT model and rule-based medical insurance data code matching device, comprising:
a data acquisition module, configured to acquire medical insurance external data to be code-matched;
an attribute determination module, configured to determine the code matching attribute of the medical insurance external data, wherein the code matching attribute comprises a field or a dictionary value;
a descriptive statement acquisition module, configured to acquire a first descriptive statement corresponding to the upper-level attribute of the code matching attribute, and to acquire a plurality of second descriptive statements, each corresponding to the upper-level attribute of the code matching attribute of a standard data item in a standard database; wherein the upper-level attribute of a field is a table, and the upper-level attribute of a dictionary value is a field;
a BERT module, configured to perform natural language processing on the first descriptive statement and all second descriptive statements using a preset BERT model to obtain a first vector corresponding to the first descriptive statement and a second vector corresponding to each second descriptive statement;
a calculation module, configured to calculate the similarity between the first vector and each second vector and to select the N standard data with the highest similarity as candidate standard data;
a first matching module, configured to select, according to a preset matching rule, M standard data matching the medical insurance external data as candidate standard data;
a second matching module, configured to match the code matching attributes of the medical insurance external data against those of each candidate standard data, and to select the candidate standard data with the largest number of identical code matching attributes as standard code matching data;
and a code matching module, configured to code-match the code matching attributes of the medical insurance external data with those in the standard code matching data according to a preset code matching rule to obtain a code matching result.
The present application further provides a computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the steps of any of the above methods when executing the computer program.
The present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the method of any of the above.
With the BERT model and rule-based medical insurance data code matching method, device and computer equipment of the present application, code matching of medical insurance external data is completed automatically by using the BERT model and rules together; the code matching results are accurate, large amounts of manual labor are saved, and the efficiency is high and the cost low.
Drawings
Fig. 1 is a schematic flowchart of a BERT model and rule-based medical insurance data code matching method according to an embodiment of the present application;
Fig. 2 is a schematic block diagram of the structure of a BERT model and rule-based medical insurance data code matching device according to an embodiment of the present application;
Fig. 3 is a schematic block diagram of the structure of a computer device according to an embodiment of the present application.
The realization of the objectives, the functional features and the advantages of the present application will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Referring to Fig. 1, an embodiment of the present application provides a BERT model and rule-based medical insurance data code matching method, comprising the following steps:
S1, acquiring medical insurance external data to be code-matched;
S2, determining the code matching attribute of the medical insurance external data, wherein the code matching attribute comprises a field or a dictionary value;
S3, acquiring a first descriptive statement corresponding to the upper-level attribute of the code matching attribute, and acquiring a plurality of second descriptive statements, each corresponding to the upper-level attribute of the code matching attribute of a standard data item in a standard database; wherein the upper-level attribute of a field is a table, and the upper-level attribute of a dictionary value is a field;
S4, performing natural language processing on the first descriptive statement and all second descriptive statements using a preset BERT model to obtain a first vector corresponding to the first descriptive statement and a second vector corresponding to each second descriptive statement;
S5, calculating the similarity between the first vector and each second vector, and selecting the N standard data with the highest similarity as candidate standard data, wherein N is a positive integer;
S6, selecting, according to a preset matching rule, M standard data matching the medical insurance external data as candidate standard data, wherein M is a positive integer;
S7, matching the code matching attributes of the medical insurance external data against those of each candidate standard data, and selecting the candidate standard data with the largest number of identical code matching attributes as standard code matching data;
S8, code-matching the code matching attributes of the medical insurance external data with those of the standard code matching data according to a preset code matching rule to obtain a code matching result.
As described above, data in a database is generally organized as tables and has three levels: table, field and dictionary value. A data table contains multiple fields, and a field has multiple dictionary values. The standard database holds a large amount of standard data that has already been code-matched manually, and it supports manual data import. With this code matching method, the correspondence between the fields and dictionary values in the medical insurance external data and the fields and dictionary values in the standard database can be obtained. In this embodiment, the BERT model and the rules are used together to match fields and dictionary values; owing to the strong natural language processing capability of the BERT model, the matching result is more accurate.
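To make the three-level structure and the "upper-level attribute" concrete, the following is a minimal Python sketch under stated assumptions: the classes `Table` and `Field` and the function `descriptive_statement` are hypothetical names not taken from the patent, and descriptions are plain strings. It only shows which description serves as the descriptive statement for each kind of code matching attribute.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical in-memory model of the three levels: a table contains fields,
# and a field contains dictionary values.
@dataclass
class Field:
    name: str
    description: str
    dictionary_values: List[str] = field(default_factory=list)


@dataclass
class Table:
    name: str
    description: str
    fields: List[Field] = field(default_factory=list)


def descriptive_statement(kind: str, table: Table, fld: Optional[Field] = None) -> str:
    """Return the description of the upper-level attribute.

    kind == "field": the upper level of a field is its table.
    kind == "dictionary_value": the upper level of a dictionary value is its field.
    """
    if kind == "field":
        return table.description
    if kind == "dictionary_value":
        if fld is None:
            raise ValueError("a field is required when matching dictionary values")
        return fld.description
    raise ValueError(f"unknown code matching attribute: {kind}")


# Example: an external table with one field that has two dictionary values.
gender = Field("gender", "patient gender", ["male", "female"])
visits = Table("visit_info", "outpatient visit records", [gender])
print(descriptive_statement("field", visits))                    # outpatient visit records
print(descriptive_statement("dictionary_value", visits, gender))  # patient gender
```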
In a specific embodiment, the field-to-code process is as follows:
S01, acquiring the external table description in the medical insurance data to be code-matched, and acquiring the standard table descriptions of all standard tables in the standard database;
S02, performing natural language processing on the external table description and the standard table descriptions using a preset BERT model to obtain an external table description sentence vector and standard table description sentence vectors;
S03, calculating the similarity between the external table description sentence vector and all standard table description sentence vectors, and selecting the N standard tables with the highest similarity as candidate standard tables, wherein N is a positive integer;
S04, selecting M standard tables as candidate standard tables according to a preset table matching rule, wherein M is a positive integer;
S05, matching the fields of the external table against those of each candidate standard table, and taking the table with the most identical fields as the code matching standard table;
and S06, matching the fields of the external table with the fields of the code matching standard table according to a preset first field matching rule to obtain a field code matching result.
As described above, in the field code matching process, a certain number of tables are first selected using the BERT model and the preset rules (M and N may or may not be equal); the fields of these tables are then compared, the standard table with the most identical fields is taken as the code matching standard table, and the preset field matching rule is then applied to obtain the final field code matching result.
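The candidate-merging and table-selection steps (S03 to S05) can be sketched as follows. This is an illustrative Python sketch, not the patented implementation: it assumes the N BERT candidates and the M rule candidates are already available as lists of table names, that "identical fields" means exact field-name equality, and that `pick_code_matching_table` and `standard_fields` are hypothetical names. The same logic carries over to the dictionary value process, with fields in place of tables and dictionary values in place of fields.

```python
from typing import Dict, List, Set


def pick_code_matching_table(
    external_fields: Set[str],
    bert_candidates: List[str],
    rule_candidates: List[str],
    standard_fields: Dict[str, Set[str]],
) -> str:
    """Return the candidate standard table sharing the most field names.

    bert_candidates: the N tables with the highest BERT similarity (S03).
    rule_candidates: the M tables selected by the preset table rules (S04).
    standard_fields: hypothetical lookup from standard table name to its field names.
    """
    # Union of both candidate lists; dict.fromkeys drops duplicates and keeps order.
    candidates = list(dict.fromkeys(bert_candidates + rule_candidates))
    if not candidates:
        raise ValueError("no candidate standard tables")
    # S05: count identical field names; max() keeps the first table on ties.
    return max(candidates, key=lambda t: len(external_fields & standard_fields[t]))


standard = {
    "outpatient": {"patient_id", "visit_date", "dept"},
    "inpatient": {"patient_id", "admit_date", "ward"},
    "drug": {"drug_code", "drug_name"},
}
external = {"patient_id", "visit_date", "dept", "doctor"}
print(pick_code_matching_table(external, ["inpatient", "outpatient"],
                               ["outpatient", "drug"], standard))  # outpatient
```

Taking the table with the largest overlap means a single wrong BERT or rule candidate does not decide the result on its own; the field-level evidence does.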
More specifically, the step of matching the fields of the external table with the fields of the code matching standard table according to a preset first field matching rule to obtain a field code matching result comprises:
performing field code matching in the following priority order until all fields in the medical insurance external data are code-matched:
(a) the fields are equal, or are equal after punctuation is removed;
(b) the fields match on a special keyword;
(c) after synonym replacement, the edit distance between the fields is 0;
(d) after synonym replacement, the edit distance is 1;
(e) after synonym replacement, the edit distance is 2;
(f) after synonym replacement, the proportion of common words exceeds 87%;
(g) the fields contain one another and have a length within 4;
(h) the fields contain one another after synonym replacement;
(i) the length of the longest common substring of the fields exceeds 74% of the shorter field's length;
(j) the proportion of common words in the fields exceeds 76%;
(k) after synonym replacement, the edit distance is 3;
(l) after word segmentation, the number of words contained in each field is more than 3.
As described above, after the code matching standard table is determined, the fields in the data table to be code-matched are matched against the fields in the code matching standard table. Matching starts with the highest-priority rule; fields matched by that rule obtain their code matching result, and fields left unmatched are tried against the rule with the next priority, and so on, until all fields are matched. Special keyword matching means that if the field to be matched and a standard field contain the same special keyword, they can be matched directly; the special keywords and synonyms can be preset by importing them in advance. A containment relationship means that one string contains or is contained by the other; for example, "medical facility" and "designated medical facility" have a containment relationship. The edit distance, also called the Levenshtein distance, quantifies the difference between two strings (e.g., English text) as the minimum number of single-character operations (insertion, substitution or deletion) needed to turn one string into the other. For example, the edit distance between the (Chinese) terms for "telephone number" and "mobile number" is 3.
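A minimal Python sketch of the priority cascade follows, covering only rules (a), (c), (d), (e), (i) and (k); the keyword, common-word, containment and word-segmentation rules are omitted because they depend on tables and a tokenizer not specified here. Assumptions: synonym replacement is naive substring replacement from a hypothetical `synonyms` dict, a standard field may be matched more than once, and all function names are illustrative.

```python
import string
from typing import Callable, Dict, List

# ASCII punctuation plus a few common full-width Chinese marks (assumption).
PUNCT = set(string.punctuation) | set("，。、；：（）")


def strip_punct(s: str) -> str:
    return "".join(c for c in s if c not in PUNCT)


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def longest_common_substring(a: str, b: str) -> int:
    """Length of the longest contiguous substring shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0] * (len(b) + 1)
        for j, cb in enumerate(b, 1):
            if ca == cb:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def build_rules(synonyms: Dict[str, str]) -> List[Callable[[str, str], bool]]:
    """A subset of the field priority rules, highest priority first."""
    def syn(s: str) -> str:
        # Hypothetical synonym table: replace each listed word by its canonical form.
        for word, canonical in synonyms.items():
            s = s.replace(word, canonical)
        return s

    return [
        lambda x, y: x == y or strip_punct(x) == strip_punct(y),                   # (a)
        lambda x, y: levenshtein(syn(x), syn(y)) == 0,                              # (c)
        lambda x, y: levenshtein(syn(x), syn(y)) == 1,                              # (d)
        lambda x, y: levenshtein(syn(x), syn(y)) == 2,                              # (e)
        lambda x, y: longest_common_substring(x, y) > 0.74 * min(len(x), len(y)),  # (i)
        lambda x, y: levenshtein(syn(x), syn(y)) == 3,                              # (k)
    ]


def match_fields(external: List[str], standard: List[str],
                 synonyms: Dict[str, str]) -> Dict[str, str]:
    """Apply the rules in priority order; a field matched earlier is not retried."""
    result: Dict[str, str] = {}
    for rule in build_rules(synonyms):
        for ext in external:
            if ext in result:
                continue
            for std in standard:
                if rule(ext, std):
                    result[ext] = std
                    break
    return result


external = ["patient_name", "tel-no", "visit_dt", "xyz"]
standard = ["patient_name", "tel_no", "visit_date", "drug_code"]
print(match_fields(external, standard, {"dt": "date"}))
# {'patient_name': 'patient_name', 'tel-no': 'tel_no', 'visit_dt': 'visit_date'}
```

Here "tel-no" is matched by rule (a) after punctuation removal, "visit_dt" by rule (c) after the assumed synonym "dt" to "date", and "xyz" stays unmatched because no rule fires.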
In a specific embodiment, the dictionary value code matching process is as follows:
S001, obtaining the external field description in the data to be code-matched and all standard field descriptions in the standard database;
S002, performing natural language processing on the external field description and all standard field descriptions using a preset BERT model to obtain an external field description sentence vector and standard field description sentence vectors;
S003, calculating the similarity between the external field description sentence vector and all standard field description sentence vectors, and selecting the N standard fields with the highest similarity as candidate standard fields;
S004, selecting M standard fields as candidate standard fields according to a preset second field matching rule;
S005, matching the dictionary values of the external field against those of each candidate standard field, and taking the field with the most identical dictionary values as the code matching standard field;
and S006, matching the dictionary values of the external field with the dictionary values of the code matching standard field according to a preset dictionary value matching rule to obtain a dictionary value code matching result.
More specifically, the step of matching the dictionary values of the external field with the dictionary values of the code matching standard field according to a preset dictionary value matching rule to obtain a dictionary value code matching result comprises:
performing dictionary value code matching in the following priority order until all dictionary values in the medical insurance external data are code-matched:
(a) the dictionary values are equal;
(b) the dictionary values are equal after punctuation is removed;
(c) the dictionary values contain one another;
(d) the edit distance is 1;
(e) the edit distance is 2.
As described above, dictionary value matching is performed according to the above priority rules in order until all external dictionary values are matched, and the dictionary value code matching result is output.
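The dictionary value rules (a) to (e) can be sketched in the same cascading way. This is again a hypothetical Python illustration: `match_dictionary_values` and `DICT_VALUE_RULES` are invented names, the punctuation set is an assumption, and "contain one another" is taken to mean plain substring containment in either direction.

```python
import string
from typing import Callable, Dict, List

# ASCII punctuation plus a few common full-width Chinese marks (assumption).
PUNCT = set(string.punctuation) | set("，。、；：（）")


def strip_punct(s: str) -> str:
    return "".join(c for c in s if c not in PUNCT)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


# Dictionary value rules (a)-(e), highest priority first.
DICT_VALUE_RULES: List[Callable[[str, str], bool]] = [
    lambda x, y: x == y,                            # (a) equal
    lambda x, y: strip_punct(x) == strip_punct(y),  # (b) equal after removing punctuation
    lambda x, y: x in y or y in x,                  # (c) one contains the other
    lambda x, y: edit_distance(x, y) == 1,          # (d) edit distance 1
    lambda x, y: edit_distance(x, y) == 2,          # (e) edit distance 2
]


def match_dictionary_values(external: List[str], standard: List[str]) -> Dict[str, str]:
    result: Dict[str, str] = {}
    for rule in DICT_VALUE_RULES:
        for ext in external:
            if ext in result:
                continue  # already matched by a higher-priority rule
            for std in standard:
                if rule(ext, std):
                    result[ext] = std
                    break
    return result


print(match_dictionary_values(["male", "female.", "unknown gender", "mael"],
                              ["male", "female", "unknown"]))
# {'male': 'male', 'female.': 'female', 'unknown gender': 'unknown', 'mael': 'male'}
```

Each example value is picked up by a different rule: exact equality, punctuation removal, containment, and edit distance 2 for the transposed "mael".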
In an embodiment, the step of calculating the similarity between the first vector and each of the second vectors, and selecting N standard data with the highest similarity as candidate standard data includes:
S51, using the formula
$$\mathrm{similarity}=\cos\theta=\frac{\sum_{i=1}^{n}A_iB_i}{\sqrt{\sum_{i=1}^{n}A_i^{2}}\,\sqrt{\sum_{i=1}^{n}B_i^{2}}}$$
to calculate the similarity between the first vector and each second vector, where $A_i$ and $B_i$ are the $i$-th components of the first vector and of the respective second vector, and $n$ is the vector dimension;
S52, sorting the standard data by similarity in descending order, and selecting the N standard data with the largest similarity values as candidate standard data.
As described above, in the present embodiment, the method of calculating the cosine similarity between the first vector and each of the second vectors is adopted to determine the semantic similarity between the first descriptive statement and each of the second descriptive statements.
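Steps S51 and S52 amount to cosine similarity followed by a descending sort. The Python sketch below assumes the sentence vectors have already been produced by the BERT model (toy three-dimensional vectors stand in for them); returning 0.0 for a zero vector is an added assumption, and `top_n_candidates` is a hypothetical name.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """sum(A_i * B_i) / (sqrt(sum(A_i^2)) * sqrt(sum(B_i^2)))."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_n_candidates(first: Sequence[float], seconds: Dict[str, Sequence[float]],
                     n: int) -> List[Tuple[str, float]]:
    """S51-S52: score every second vector, sort in descending order, keep the first n."""
    scored = [(key, cosine_similarity(first, vec)) for key, vec in seconds.items()]
    scored.sort(key=lambda kv: kv[1], reverse=True)
    return scored[:n]


first = [1.0, 0.0, 0.0]
seconds = {"A": [1.0, 0.0, 0.0], "B": [0.0, 1.0, 0.0], "C": [1.0, 1.0, 0.0]}
print([key for key, _ in top_n_candidates(first, seconds, 2)])  # ['A', 'C']
```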
In one embodiment, the step of selecting, according to a preset matching rule, M standard data matching the medical insurance external data as candidate standard data comprises:
S61, removing punctuation from the first descriptive statement and then matching it against all second descriptive statements, and if an identical second descriptive statement is found, directly adding the corresponding standard data as candidate standard data;
S62, if no match is found, removing the terms listed in the blacklist table from the first descriptive statement and then matching it against all second descriptive statements, and if an identical second descriptive statement is found, directly adding the corresponding standard data as candidate standard data;
S63, if there is still no match, replacing words in the first descriptive statement with the corresponding words from the synonym table and then matching it against all second descriptive statements, and if an identical second descriptive statement is found, directly adding the corresponding standard data as candidate standard data;
and S64, if there is still no match, determining whether the first descriptive statement and a second descriptive statement have a containment relationship or whether their edit distance is smaller than K, and if so, adding the corresponding standard data as candidate standard data, until M candidate standard data are selected, wherein K is a positive integer.
As described above, this embodiment provides a specific rule scheme for selecting candidate standard data; the rules are likewise applied in priority order, as follows:
(a) the descriptions are the same after punctuation is removed;
(b) the descriptions are the same after blacklist terms are removed;
(c) the descriptions are the same after synonym replacement;
(d) the descriptions have a containment relationship, or their edit distance is less than 3.
Rule (a) has the highest priority and rule (d) the lowest. The standard tables in the database are selected according to the priority order of the rules until M candidate standard tables are selected. Here the concept of a blacklist is introduced; the blacklist is also preset manually. For example, a table blacklist may contain terms such as 'info' and 'basic' that carry no useful information in table descriptions. As another example, a field blacklist contains fields that appear frequently in many tables, such as 'creation time', 'creator', 'update time', 'updater', 'timestamp' and 'deleted or not', which are likewise uninformative for matching in some cases.
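Below is a hedged Python sketch of S61 to S64. Assumptions not stated in the text: each rule is applied to both the first and the second descriptive statements, the normalizations are cumulative (blacklist removal after punctuation removal, synonym replacement after blacklist removal), blacklist and synonym handling are simple substring operations, and K defaults to 3 as in rule (d). `rule_candidates` and its parameters are illustrative names.

```python
import string
from typing import Dict, Iterable, List

# ASCII punctuation plus a few common full-width Chinese marks (assumption).
PUNCT = set(string.punctuation) | set("，。、；：（）")


def strip_punct(s: str) -> str:
    return "".join(c for c in s if c not in PUNCT)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def rule_candidates(first: str, seconds: Dict[str, str], blacklist: Iterable[str],
                    synonyms: Dict[str, str], m: int, k: int = 3) -> List[str]:
    """Select up to m standard-data keys whose second descriptive statement matches first."""
    terms = list(blacklist)

    def no_black(s: str) -> str:
        # S62 (assumed cumulative): punctuation removed, then blacklist terms removed.
        s = strip_punct(s)
        for term in terms:
            s = s.replace(term, "")
        return s.strip()

    def synonymized(s: str) -> str:
        # S63 (assumed cumulative): additionally map synonyms to a canonical word.
        s = no_black(s)
        for word, canonical in synonyms.items():
            s = s.replace(word, canonical)
        return s

    rules = [
        lambda x, y: strip_punct(x) == strip_punct(y),             # S61
        lambda x, y: no_black(x) == no_black(y),                   # S62
        lambda x, y: synonymized(x) == synonymized(y),             # S63
        lambda x, y: x in y or y in x or edit_distance(x, y) < k,  # S64
    ]
    selected: List[str] = []
    for rule in rules:
        for key, desc in seconds.items():
            if len(selected) >= m:
                return selected
            if key not in selected and rule(first, desc):
                selected.append(key)
    return selected


seconds = {"T1": "Patient basic info", "T2": "Patient", "T3": "Patient visit",
           "T4": "Drug catalog", "T5": "Patient bsic info", "T6": "Client"}
print(rule_candidates("Patient basic info.", seconds, ["basic", "info"],
                      {"Client": "Patient"}, m=10))  # ['T1', 'T2', 'T6', 'T5']
```

The output order follows rule priority: T1 via punctuation removal, T2 via the blacklist, T6 via the assumed synonym, and T5 via edit distance 2.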
In one embodiment, before the step of selecting, according to a preset matching rule, M standard data matching the medical insurance external data as candidate standard data, the method comprises:
acquiring the imported synonym table and blacklist table.
As described above, the synonym table and the blacklist table can be imported manually in advance, which improves the efficiency and accuracy of code matching. In a specific embodiment, the imported tables further include the special keyword table.
In one embodiment, after the step of selecting, according to a preset matching rule, M standard data matching the medical insurance external data as candidate standard data, the method comprises:
deduplicating the N + M candidate standard data.
As described above, during code matching, N standard data are selected by the BERT model and M standard data by the rules as candidate standard data. Because the candidates selected by the BERT model and by the rules may overlap, a deduplication operation is needed to improve the efficiency of the code matching process.
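Deduplication of the N + M candidates can be done in a few lines while keeping a stable order (BERT candidates first, then rule candidates). The ordering choice and the name `merge_candidates` are assumptions for illustration.

```python
from typing import Hashable, List, Sequence, TypeVar

T = TypeVar("T", bound=Hashable)


def merge_candidates(bert_top_n: Sequence[T], rule_top_m: Sequence[T]) -> List[T]:
    """Merge N BERT candidates and M rule candidates, dropping duplicates in first-seen order."""
    # dict preserves insertion order (Python 3.7+), so fromkeys acts as an ordered set.
    return list(dict.fromkeys([*bert_top_n, *rule_top_m]))


print(merge_candidates(["T1", "T3", "T7"], ["T3", "T2"]))  # ['T1', 'T3', 'T7', 'T2']
```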
In an embodiment, before the step of performing natural language processing on the first descriptive statement and all second descriptive statements using a preset BERT model to obtain a first vector corresponding to the first descriptive statement and a second vector corresponding to each second descriptive statement, the method further comprises:
S401, acquiring short text training data related to the medical insurance field;
S402, fine-tuning an existing BERT model using the short text training data to obtain the trained BERT model.
As described above, this fine-tuning makes the vectors produced by the BERT model more accurate for medical insurance terminology, and thus makes the final code matching result more accurate.
In one embodiment, the code matching between the code matching attribute of the medical insurance external data and the code matching attribute in the standard code matching data according to a preset code matching rule to obtain a code matching result further includes:
s81, creating an EXCEL format document at the specified position of the storage area;
s82, adding a sheet page with the name of file and the name of dit in the EXCEL document;
s83, storing the field code matching result in the code matching result in a filtered sheet page, and storing the dictionary value in the code matching result in a dit sheet page.
As described above, once field code matching and dictionary value code matching are complete, the results can be saved in a table. In this embodiment, an EXCEL workbook is created first, and its name may be "external data name + code matching result". A sheet named field and a sheet named dit are created in the workbook; the field code matching results are stored in the field sheet and the dictionary value code matching results in the dit sheet, so that staff can conveniently review the code matching results.
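A minimal sketch of steps s81 to s83, assuming each code matching result is a dictionary with the hypothetical keys `kind`, `external`, and `standard`. Writing the workbook relies on the third-party openpyxl library (`Workbook`, `remove`, `create_sheet`, `append`, `save`); it is imported inside `write_workbook` so the routing logic can be used without it. The sheets are named field (field results) and dit (dictionary value results), and the file name follows the "external data name + code matching result" pattern; all function names are illustrative.

```python
from typing import Dict, List, Sequence


def split_results(results: List[dict]) -> Dict[str, List[Sequence[str]]]:
    """Route each code matching result to the 'field' or 'dit' sheet."""
    sheets: Dict[str, List[Sequence[str]]] = {"field": [], "dit": []}
    for r in results:
        row = (r["external"], r["standard"])
        if r["kind"] == "field":
            sheets["field"].append(row)
        elif r["kind"] == "dict_value":
            sheets["dit"].append(row)
        else:
            raise ValueError(f"unknown result kind: {r['kind']!r}")
    return sheets


def workbook_name(external_data_name: str) -> str:
    """File name pattern: external data name + code matching result."""
    return f"{external_data_name} code matching result.xlsx"


def write_workbook(path: str, sheets: Dict[str, List[Sequence[str]]]) -> None:
    """Write one sheet per key using openpyxl (third-party, imported lazily)."""
    from openpyxl import Workbook

    wb = Workbook()
    wb.remove(wb.active)  # drop the default empty sheet
    for name, rows in sheets.items():
        ws = wb.create_sheet(title=name)
        ws.append(["external", "standard"])  # header row
        for row in rows:
            ws.append(list(row))
    wb.save(path)
```

A call such as `write_workbook(workbook_name("hospital_A"), split_results(results))` would produce a two-sheet workbook at the given path.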
In one embodiment, the BERT model and the standard database may be stored in nodes of a blockchain, and the BERT model and rule-based medical insurance data code matching method described above is implemented in a blockchain network.
As described above, a blockchain is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, where each block contains information about a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, and an application service layer.
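To illustrate how data blocks are "linked by cryptographic methods", here is a toy hash chain in which each block stores the SHA-256 digest of its predecessor, so altering any earlier block is detectable. It covers only hash linking, not consensus, networking, or smart contracts, and it does not represent any particular blockchain platform; the `Block` structure and the idea of recording digests of the BERT model file or a standard database snapshot are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import List

GENESIS_PREV = "0" * 64  # placeholder predecessor digest for the first block


@dataclass
class Block:
    index: int
    records: List[str]  # e.g. digests of a model file or a standard database snapshot
    prev_hash: str
    digest: str = field(init=False)

    def __post_init__(self) -> None:
        self.digest = self.compute_digest()

    def compute_digest(self) -> str:
        payload = json.dumps(
            {"index": self.index, "records": self.records, "prev_hash": self.prev_hash},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain: List[Block], records: List[str]) -> Block:
    prev = chain[-1].digest if chain else GENESIS_PREV
    block = Block(index=len(chain), records=records, prev_hash=prev)
    chain.append(block)
    return block


def verify(chain: List[Block]) -> bool:
    """Check every block's own digest and its link to the previous block."""
    for i, block in enumerate(chain):
        if block.digest != block.compute_digest():
            return False
        expected_prev = chain[i - 1].digest if i else GENESIS_PREV
        if block.prev_hash != expected_prev:
            return False
    return True
```

Modifying the records of any block after it is appended makes `verify` return `False`, which is the tamper-evidence property the paragraph refers to.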
The underlying blockchain platform may include processing modules such as user management, basic services, smart contracts, and operation monitoring. The user management module manages the identity information of all blockchain participants, including generation and maintenance of public and private keys (account management), key management, and maintenance of the correspondence between users' real identities and blockchain addresses (permission management); when authorized, it also supervises and audits the transactions of certain real identities and provides configuration of risk control rules (risk control audit). The basic service module is deployed on all blockchain node devices and verifies the validity of service requests, recording valid requests to storage once consensus on them is reached. For a new service request, the basic service first performs interface adaptation parsing and authentication (interface adaptation), then encrypts the service information through a consensus algorithm (consensus management), transmits it completely and consistently to the shared ledger (network communication), and records and stores it. The smart contract module handles contract registration and issuance, contract triggering, and contract execution: developers can define contract logic in a programming language and publish it to the blockchain (contract registration), after which the contract is triggered by keys or other events and executed according to the logic of its clauses; the module also supports upgrading and cancelling contracts. The operation monitoring module is mainly responsible for deployment, configuration modification, contract settings, and cloud adaptation during product release, and for visual output of real-time status during product operation, such as alarms, network condition monitoring, and node device health monitoring.
With the BERT model and rule-based medical insurance data code matching method described above, medical insurance external data is code-matched automatically by combining the BERT model with the matching rules. The code matching results are accurate, extensive manual effort is avoided, and the process is efficient and low-cost.
Referring to fig. 2, an embodiment of the present application further provides a medical insurance data code matching apparatus based on a BERT model and rules, including:
the data acquisition module 1 is used for acquiring medical insurance external data to be code-matched;
the attribute determining module 2 is used for determining code matching attributes of the medical insurance external data, wherein the code matching attributes comprise fields or dictionary values;
a description statement acquisition module 3, configured to acquire a first descriptive statement corresponding to the upper-level attribute of the code matching attribute, and to acquire a plurality of second descriptive statements corresponding to the upper-level attribute of the code matching attribute of each standard data in a standard database; wherein the upper-level attribute of a field is a table, and the upper-level attribute of a dictionary value is a field;
a BERT module 4, configured to perform natural language processing on the first descriptive statement and all the second descriptive statements by using a preset BERT model to obtain a first vector corresponding to the first descriptive statement and a second vector corresponding to each of the second descriptive statements;
a calculating module 5, configured to calculate similarities between the first vector and each of the second vectors, and select N standard data with the highest similarity as candidate standard data;
the first matching module 6 is used for selecting M standard data matched with the medical insurance external data according to a preset matching rule as candidate standard data;
the second matching module 7 is used for matching the code matching attributes of the medical insurance external data against those of each candidate standard data, and selecting the candidate standard data that shares the largest number of identical code matching attributes as the standard code matching data;
and the code matching module 8 is used for matching the code matching attribute of the medical insurance external data and the code matching attribute in the standard code matching data according to a preset code matching rule to obtain a code matching result.
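The upper-level attribute lookup (module 3), the selection of the standard code matching data (module 7), and the final pairing step (module 8) can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Attribute` record, the description dictionaries, and all function names are hypothetical; ties are broken by keeping the first candidate seen; and because the text does not specify the preset code matching rule, `pair_attributes` uses a placeholder rule (case-insensitive exact name match).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Attribute:
    """Hypothetical record for a code matching attribute."""
    kind: str    # "field" or "dict_value"
    name: str
    parent: str  # owning table for a field; owning field for a dictionary value


def upper_level_statement(attr: Attribute,
                          table_descriptions: Dict[str, str],
                          field_descriptions: Dict[str, str]) -> str:
    """Return the descriptive statement of the attribute's upper-level attribute."""
    if attr.kind == "field":        # the upper level of a field is its table
        return table_descriptions[attr.parent]
    if attr.kind == "dict_value":   # the upper level of a dictionary value is its field
        return field_descriptions[attr.parent]
    raise ValueError(f"unknown attribute kind: {attr.kind!r}")


def pick_standard(external_attrs: Set[str],
                  candidates: Dict[str, Set[str]]) -> Optional[str]:
    """Pick the candidate sharing the most identical code matching attributes."""
    best_id: Optional[str] = None
    best_overlap = 0
    for std_id, attrs in candidates.items():
        overlap = len(external_attrs & attrs)
        if overlap > best_overlap:  # strict '>' keeps the first candidate on ties
            best_id, best_overlap = std_id, overlap
    return best_id  # None if no candidate shares any attribute


def pair_attributes(external_attrs: List[str],
                    standard_attrs: List[str]) -> Dict[str, Optional[str]]:
    """Placeholder code matching rule: case-insensitive exact name match."""
    index = {a.strip().lower(): a for a in standard_attrs}
    return {a: index.get(a.strip().lower()) for a in external_attrs}
```

Attributes left mapped to `None` by the placeholder rule would, in practice, be handled by whatever preset code matching rule the deployment defines.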
In one embodiment, the calculation module 5 comprises:
a similarity calculation unit, configured to calculate the similarity between the first vector and each of the second vectors using a formula [formula image not reproduced], wherein A_i and B_i are the i-th components of the first vector and of each second vector, respectively;
and the sorting selection unit is used for sorting the standard data in descending order of similarity and selecting the N standard data with the highest similarity as candidate standard data.
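A sketch of the similarity calculation and top-N selection. The formula itself is not reproduced in this text; the code assumes it is cosine similarity, i.e. the sum of A_i·B_i divided by the product of the two vectors' Euclidean norms. That is consistent with the component-wise definitions of A_i and B_i but remains an assumption. The function names and the (standard data id, vector) input format are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed similarity: sum(A_i * B_i) / (||A|| * ||B||)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # define similarity with a zero vector as 0
    return dot / (norm_a * norm_b)


def top_n(first: Sequence[float],
          seconds: List[Tuple[str, Sequence[float]]],
          n: int) -> List[Tuple[str, float]]:
    """Score every second vector against the first and keep the N highest."""
    scored = [(std_id, cosine(first, vec)) for std_id, vec in seconds]
    scored.sort(key=lambda item: item[1], reverse=True)  # descending similarity
    return scored[:n]
```

Sorting the full list is simple and adequate for modest standard databases; `heapq.nlargest` would avoid a full sort when the database is large.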
In one embodiment, the first matching module 6 comprises:
the first matching unit is used for removing punctuation from the first descriptive statement, matching the result against all second descriptive statements, and, if an identical second descriptive statement is found, directly adding the corresponding standard data to the candidate standard data;
the second matching unit is used for, if no match is found, removing the nouns listed in the blacklist from the first descriptive statement, matching the result against all second descriptive statements, and, if an identical second descriptive statement is found, directly adding the corresponding standard data as candidate standard data;
the third matching unit is used for, if there is still no match, replacing words in the first descriptive statement with their counterparts in the synonym table, matching the result against all second descriptive statements, and, if an identical second descriptive statement is found, directly adding the corresponding standard data to the candidate standard data;
and the fourth matching unit is used for, if there is still no match, determining whether the first descriptive statement and each second descriptive statement have an inclusion relationship or an edit distance smaller than K, and, if so, adding the corresponding standard data to the candidate standard data until M candidate standard data are selected, wherein K is a positive integer.
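The four-unit rule cascade above can be sketched as follows. This is a hypothetical illustration, not the patented implementation: the blacklist is assumed to be a plain list of terms and the synonym table a dictionary mapping a variant to its preferred word; "matched" is taken to mean string equality; each unit's normalization is applied cumulatively and to both the first and the second descriptive statements; and edit distance is taken to be Levenshtein distance. None of these choices is specified by the text.

```python
import string
from typing import Dict, Iterable, List

# ASCII punctuation plus common full-width Chinese punctuation
PUNCT = set(string.punctuation) | set("，。、；：？！“”‘’（）《》【】")


def _squash(s: str) -> str:
    """Collapse runs of whitespace so removals do not leave stray spaces."""
    return " ".join(s.split())


def strip_punct(s: str) -> str:
    return _squash("".join(ch for ch in s if ch not in PUNCT))


def remove_blacklisted(s: str, blacklist: Iterable[str]) -> str:
    for term in sorted(blacklist, key=len, reverse=True):  # longest terms first
        s = s.replace(term, "")
    return _squash(s)


def apply_synonyms(s: str, synonyms: Dict[str, str]) -> str:
    for variant in sorted(synonyms, key=len, reverse=True):
        s = s.replace(variant, synonyms[variant])
    return _squash(s)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via a rolling one-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def rule_candidates(first: str, seconds: Dict[str, str], blacklist: Iterable[str],
                    synonyms: Dict[str, str], m: int, k: int) -> List[str]:
    """Return up to m standard data ids; seconds maps id -> second descriptive statement."""
    blacklist = list(blacklist)

    def norm1(s: str) -> str:
        return strip_punct(s)

    def norm2(s: str) -> str:
        return remove_blacklisted(norm1(s), blacklist)

    def norm3(s: str) -> str:
        return apply_synonyms(norm2(s), synonyms)

    # Units 1-3: exact match after increasingly aggressive normalization
    for norm in (norm1, norm2, norm3):
        query = norm(first)
        hits = [sid for sid, text in seconds.items() if norm(text) == query]
        if hits:
            return hits[:m]

    # Unit 4: inclusion relationship or edit distance smaller than k
    query = norm3(first)
    picked: List[str] = []
    for sid, text in seconds.items():
        cand = norm3(text)
        contains = bool(query and cand) and (query in cand or cand in query)
        if contains or edit_distance(query, cand) < k:
            picked.append(sid)
            if len(picked) == m:
                break
    return picked
```

Ordering the units from strict to loose means the cheap exact comparisons run first, and the quadratic edit-distance check is only reached when they all fail.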
In one embodiment, the BERT model and rule-based medical insurance data code matching device further comprises:
and the synonym table and blacklist acquisition module is used for acquiring the imported synonym table and blacklist.
In one embodiment, the BERT model and rule-based medical insurance data code matching device further comprises:
and the deduplication module is used for deduplicating the N + M candidate standard data.
In one embodiment, the BERT model and rule-based medical insurance data code matching device further comprises:
the training data acquisition module is used for acquiring short text training data related to the medical insurance field;
and the model fine-tuning module is used for fine-tuning an existing BERT model with the short text training data to obtain a fine-tuned BERT model.
In one embodiment, the BERT model and rule-based medical insurance data code matching device further comprises:
the EXCEL creating module is used for creating an EXCEL document at a specified location in the storage area;
the sheet adding module is used for adding a sheet named field and a sheet named dit to the EXCEL document;
and the code matching result storage module is used for storing the field code matching results in the field sheet and the dictionary value code matching results in the dit sheet.
As described above, each component of the BERT model and rule-based medical insurance data code matching apparatus provided in the present application can implement the functions of any of the BERT model and rule-based medical insurance data code matching methods described above; the specific structure is not repeated here.
Referring to Fig. 3, an embodiment of the present application further provides a computer device, which may be a server whose internal structure may be as shown in Fig. 3. The computer device includes a processor, a memory, a network interface, and a database connected via a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores data such as the standard database. The network interface of the computer device communicates with external terminals via a network connection. When executed by the processor, the computer program implements the BERT model and rule-based medical insurance data code matching method.
When executing the computer program, the processor performs the BERT model and rule-based medical insurance data code matching method, which includes the following steps:
acquiring medical insurance external data to be code-matched;
determining code matching attributes of the medical insurance external data, wherein the code matching attributes comprise fields or dictionary values;
acquiring a first descriptive statement corresponding to the upper-level attribute of the code matching attribute, and acquiring a plurality of second descriptive statements corresponding to the upper-level attribute of the code matching attribute corresponding to each standard data in a standard database; wherein, the upper attribute of the field is a table, and the upper attribute of the dictionary value is a field;
performing natural language processing on the first descriptive statement and all the second descriptive statements by using a preset BERT model to obtain a first vector corresponding to the first descriptive statement and a second vector corresponding to each second descriptive statement;
calculating the similarity of the first vector and each second vector, and selecting N standard data with the highest similarity as candidate standard data, wherein N is a positive integer;
according to a preset matching rule, M standard data matched with the medical insurance external data are selected as candidate standard data, wherein M is a positive integer;
matching the code matching attributes of the medical insurance external data against those of each candidate standard data, and selecting the candidate standard data that shares the largest number of identical code matching attributes as the standard code matching data;
and matching the code matching attribute of the medical insurance external data with the code matching attribute in the standard code matching data according to a preset code matching rule to obtain a code matching result.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the BERT model and rule-based medical insurance data code matching method, including the following steps:
acquiring medical insurance external data to be code-matched;
determining code matching attributes of the medical insurance external data, wherein the code matching attributes comprise fields or dictionary values;
acquiring a first descriptive statement corresponding to the upper-level attribute of the code matching attribute, and acquiring a plurality of second descriptive statements corresponding to the upper-level attribute of the code matching attribute corresponding to each standard data in a standard database; wherein, the upper attribute of the field is a table, and the upper attribute of the dictionary value is a field;
performing natural language processing on the first descriptive statement and all the second descriptive statements by using a preset BERT model to obtain a first vector corresponding to the first descriptive statement and a second vector corresponding to each second descriptive statement;
calculating the similarity of the first vector and each second vector, and selecting N standard data with the highest similarity as candidate standard data, wherein N is a positive integer;
according to a preset matching rule, M standard data matched with the medical insurance external data are selected as candidate standard data, wherein M is a positive integer;
matching the code matching attributes of the medical insurance external data against those of each candidate standard data, and selecting the candidate standard data that shares the largest number of identical code matching attributes as the standard code matching data;
and matching the code matching attribute of the medical insurance external data with the code matching attribute in the standard code matching data according to a preset code matching rule to obtain a code matching result.
With the BERT model and rule-based medical insurance data code matching method described above, medical insurance external data is code-matched automatically by combining the BERT model with the matching rules. The code matching results are accurate, extensive manual effort is avoided, and the process is efficient and low-cost.
It will be understood by those skilled in the art that all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of additional identical elements in the process, apparatus, article, or method that includes the element.
The above description is only a preferred embodiment of the present application and is not intended to limit its scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, is likewise included in the scope of protection of the present application.

Claims (10)

1. A medical insurance data code matching method based on a BERT model and rules is characterized by comprising the following steps:
acquiring medical insurance external data to be code-matched;
determining code matching attributes of the medical insurance external data, wherein the code matching attributes comprise fields or dictionary values;
acquiring a first descriptive statement corresponding to the upper-level attribute of the code matching attribute, and acquiring a plurality of second descriptive statements corresponding to the upper-level attribute of the code matching attribute corresponding to each standard data in a standard database; wherein, the upper attribute of the field is a table, and the upper attribute of the dictionary value is a field;
performing natural language processing on the first descriptive statement and all the second descriptive statements by using a preset BERT model to obtain a first vector corresponding to the first descriptive statement and a second vector corresponding to each second descriptive statement;
calculating the similarity of the first vector and each second vector, and selecting N standard data with the highest similarity as candidate standard data, wherein N is a positive integer;
according to a preset matching rule, M standard data matched with the medical insurance external data are selected as candidate standard data, wherein M is a positive integer;
matching the code matching attributes of the medical insurance external data against those of each candidate standard data, and selecting the candidate standard data that shares the largest number of identical code matching attributes as the standard code matching data;
and matching the code matching attribute of the medical insurance external data with the code matching attribute in the standard code matching data according to a preset code matching rule to obtain a code matching result.
2. The BERT model and rule-based medical insurance data code matching method of claim 1, wherein the step of calculating the similarity between the first vector and each of the second vectors and selecting N standard data with the highest similarity as candidate standard data comprises:
calculating the similarity between the first vector and each of the second vectors using a formula [formula image not reproduced], wherein A_i and B_i are the i-th components of the first vector and of each second vector, respectively;
and sorting by similarity in descending order, and selecting the N standard data with the highest similarity as candidate standard data.
3. The code matching method for medical insurance data based on the BERT model and the rules as claimed in claim 1, wherein the step of selecting M standard data matched with the medical insurance external data as candidate standard data according to the preset matching rules comprises:
removing punctuation from the first descriptive statement, matching the result against all second descriptive statements, and, if an identical second descriptive statement is found, directly adding the corresponding standard data as candidate standard data;
if no match is found, removing the nouns listed in the blacklist from the first descriptive statement, matching the result against all second descriptive statements, and, if an identical second descriptive statement is found, directly adding the corresponding standard data as candidate standard data;
if there is still no match, replacing words in the first descriptive statement with their counterparts in the synonym table, matching the result against all second descriptive statements, and, if an identical second descriptive statement is found, directly adding the corresponding standard data as candidate standard data;
if there is still no match, determining whether the first descriptive statement and each second descriptive statement have an inclusion relationship or an edit distance smaller than K, and, if so, adding the corresponding standard data as candidate standard data until M candidate standard data are selected, wherein K is a positive integer.
4. The BERT model and rule-based medical insurance data code matching method of claim 3, wherein, before the step of selecting M standard data matched with the medical insurance external data according to the preset matching rule as the candidate standard data, the method comprises:
acquiring the imported synonym table and blacklist.
5. The BERT model and rule-based medical insurance data code matching method of claim 1, wherein, after the step of selecting M standard data matched with the medical insurance external data according to the preset matching rule as the candidate standard data, the method comprises:
deduplicating the N + M candidate standard data.
6. The BERT model and rule-based medical insurance data code matching method of claim 1, wherein, before the step of performing natural language processing on the first descriptive statement and all the second descriptive statements by using a preset BERT model to obtain a first vector corresponding to the first descriptive statement and a second vector corresponding to each of the second descriptive statements, the method further comprises:
acquiring short text training data related to the medical insurance field;
and fine-tuning an existing BERT model with the short text training data to obtain a fine-tuned BERT model.
7. The BERT model and rule-based medical insurance data code matching method of claim 1, wherein, after the step of matching the code matching attribute of the medical insurance external data with the code matching attribute in the standard code matching data according to a preset code matching rule to obtain a code matching result, the method further comprises:
creating an EXCEL document at a specified location in a storage area;
adding a sheet named field and a sheet named dit to the EXCEL document;
and storing the field code matching results in the field sheet, and storing the dictionary value code matching results in the dit sheet.
8. A medical insurance data code matching device based on a BERT model and rules is characterized by comprising:
the data acquisition module is used for acquiring medical insurance external data to be code-matched;
the attribute determining module is used for determining code matching attributes of the medical insurance external data, wherein the code matching attributes comprise fields or dictionary values;
the description statement acquisition module is used for acquiring a first description statement corresponding to the upper-level attribute of the code matching attribute and acquiring a plurality of second description statements corresponding to the upper-level attribute of the code matching attribute corresponding to each standard data in a standard database; wherein, the upper attribute of the field is a table, and the upper attribute of the dictionary value is a field;
a BERT module, configured to perform natural language processing on the first descriptive statement and all the second descriptive statements by using a preset BERT model to obtain a first vector corresponding to the first descriptive statement and a second vector corresponding to each of the second descriptive statements;
the calculation module is used for calculating the similarity between the first vector and each second vector, and selecting N standard data with the highest similarity as candidate standard data;
the first matching module is used for selecting M standard data matched with the medical insurance external data according to a preset matching rule to serve as candidate standard data;
the second matching module is used for matching the code matching attributes of the medical insurance external data against those of each candidate standard data, and selecting the candidate standard data that shares the largest number of identical code matching attributes as the standard code matching data;
and the code matching module is used for matching the code matching attribute of the medical insurance external data and the code matching attribute in the standard code matching data according to a preset code matching rule to obtain a code matching result.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202010898114.8A 2020-08-31 2020-08-31 BERT model and rule-based medical insurance data code matching method, device and equipment Active CN112035616B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010898114.8A CN112035616B (en) 2020-08-31 2020-08-31 BERT model and rule-based medical insurance data code matching method, device and equipment

Publications (2)

Publication Number Publication Date
CN112035616A true CN112035616A (en) 2020-12-04
CN112035616B CN112035616B (en) 2024-07-16

Family

ID=73585974

Country Status (1)

Country Link
CN (1) CN112035616B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113657114A (en) * 2021-08-31 2021-11-16 平安医疗健康管理股份有限公司 Method, device, equipment and storage medium for generating disease name code matching list

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110795482A (en) * 2019-10-16 2020-02-14 浙江大华技术股份有限公司 Data benchmarking method, device and storage device
WO2020119176A1 (en) * 2018-12-13 2020-06-18 平安医疗健康管理股份有限公司 Reimbursement data checking method, identification server, and storage medium
CN111428044A (en) * 2020-03-06 2020-07-17 中国平安人寿保险股份有限公司 Method, device, equipment and storage medium for obtaining supervision identification result in multiple modes
CN111444320A (en) * 2020-06-16 2020-07-24 太平金融科技服务(上海)有限公司 Text retrieval method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220526

Address after: 518000 China Aviation Center 2901, No. 1018, Huafu Road, Huahang community, Huaqiang North Street, Futian District, Shenzhen, Guangdong Province

Applicant after: Shenzhen Ping An medical and Health Technology Service Co.,Ltd.

Address before: Room 12G, Block H, 666 Beijing East Road, Huangpu District, Shanghai 200000

Applicant before: PING AN MEDICAL AND HEALTHCARE MANAGEMENT Co.,Ltd.

GR01 Patent grant