CN111680495A - Data error correction method, device and system - Google Patents

Data error correction method, device and system

Info

Publication number
CN111680495A
Authority
CN
China
Prior art keywords
error correction
vector
data
brief introduction
profile
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010531752.6A
Other languages
Chinese (zh)
Inventor
毛长汇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qiyun Digital Technology Co ltd
Original Assignee
Beijing Qiyun Digital Technology Co ltd
Application filed by Beijing Qiyun Digital Technology Co ltd filed Critical Beijing Qiyun Digital Technology Co ltd
Priority to CN202010531752.6A priority Critical patent/CN111680495A/en
Publication of CN111680495A publication Critical patent/CN111680495A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/226 Validation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The invention relates to the technical field of data processing, in particular provides a data error correction method, a device and a system, and aims to solve the technical problem of accurately and efficiently correcting data of massive and complicated internet data. To this end, the data error correction method according to an embodiment of the present invention is to perform efficient and accurate data error correction on massive and complicated internet data, such as object attribute information of a target object obtained from the internet, based on a preset Drools rule engine and/or a preset LSTM neural network model algorithm. Based on the advantages of the Drools rule engine such as easy adjustment and easy management, the present embodiment can flexibly set the error correction rule and adjust the data in the error correction rule at any time, thereby satisfying different data error correction requirements of different users. The data error correction is carried out based on the preset LSTM neural network model algorithm, so that not only can a manual auditing mode be simulated, but also the defects of long time consumption and low efficiency existing in the manual auditing mode are overcome.

Description

Data error correction method, device and system
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a data error correction method, apparatus, and system.
Background
With the development of internet technology, many users choose to use data information of target objects, such as clients, obtained from the internet, and then perform data cleaning (for example, removing pictures, advertisements, etc. that are irrelevant to data content in the data information) and storage processes on the data information. Because the data sources of the internet data are messy, the internet data usually have more data errors, conflicts, contradictions and other problems, and the problems cannot be solved by a conventional data cleaning method and can only be corrected by adopting a manual auditing mode. However, the manual auditing method is not suitable for data error correction of internet data with large data volume due to low efficiency, long time consumption and other factors.
Accordingly, there is a need in the art for a new data error correction scheme to address the above-mentioned problems.
Disclosure of Invention
In order to overcome the above-mentioned drawbacks, the present invention is proposed to provide a data error correction method, device and system that solve or at least partially solve the technical problem of how to accurately and efficiently perform data error correction on massive and complicated internet data.
In a first aspect, a method for correcting data errors is provided, the method comprising:
acquiring object attribute information of a target object, wherein the object attribute information comprises object characteristics and an object profile of the target object;
calling an error correction rule in a preset Drools rule engine, correcting the object characteristics according to the error correction rule and outputting a first error correction result;
and/or performing text analysis on the object profile of the target object based on a preset LSTM neural network model algorithm, respectively acquiring, according to a text analysis result, the profile characteristics corresponding to each object characteristic in the object profile, and outputting a second error correction result according to a comparison result of each object characteristic and its corresponding profile characteristics;
and the first error correction result and the second error correction result both comprise information error reasons and object attribute information copies with errors.
In an aspect of the data error correction method, the method further includes:
acquiring feedback information of a current error correction result;
starting a corresponding data processing terminal according to the feedback information so as to respond to a user processing request pre-associated with the feedback information;
wherein the feedback information comprises agree-to-modify error information and refuse-to-modify error information; the user processing request pre-associated with the agree-to-modify error information comprises modifying the object characteristics of the target object; if the current error correction result is a first error correction result, the user processing request pre-associated with the refuse-to-modify error information comprises modifying the error correction rule in the preset Drools rule engine; and if the current error correction result is a second error correction result, the user processing request pre-associated with the refuse-to-modify error information comprises leaving the object characteristics of the target object unmodified.
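The feedback branching described above can be sketched as a small dispatch function. This is only an illustration of the decision logic, not the patented implementation; the names `ResultKind`, `Feedback`, and `resolve_request`, as well as the returned action strings, are hypothetical.

```python
from enum import Enum


class ResultKind(Enum):
    FIRST = "rule_engine"   # first error correction result (Drools rule engine)
    SECOND = "lstm"         # second error correction result (LSTM model)


class Feedback(Enum):
    AGREE = "agree_to_modify"
    REFUSE = "refuse_to_modify"


def resolve_request(kind: ResultKind, feedback: Feedback) -> str:
    """Map (result kind, feedback) to the pre-associated user processing request."""
    if feedback is Feedback.AGREE:
        # Agreement is associated with modifying the flagged object feature.
        return "modify_object_feature"
    if kind is ResultKind.FIRST:
        # Refusing a rule-engine result is associated with modifying the rule.
        return "modify_error_correction_rule"
    # Refusing an LSTM result leaves the object feature unmodified.
    return "keep_object_feature"
```

A caller would then start whichever data processing terminal handles the returned action.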
In one embodiment of the foregoing data error correction method, "performing text analysis on an object profile of the target object based on a preset LSTM neural network model algorithm, respectively obtaining profile features corresponding to each object feature in the object profile according to a text analysis result, and outputting a second error correction result according to a comparison result between each object feature and the profile feature corresponding to each object feature" specifically includes:
performing word segmentation on the object profile of the target object, acquiring a word vector corresponding to each word in the object profile according to the word segmentation result, and acquiring an object profile vector of the object profile according to the word vectors;
acquiring an object feature vector of the object feature to be corrected currently;
based on a preset LSTM neural network model and according to the object profile vector, obtaining a profile feature vector of the profile feature that corresponds, in the object profile, to the object feature currently to be corrected, and judging whether the profile feature vector is consistent with the object feature vector; and if they are not consistent, outputting a second error correction result.
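The final comparison step can be illustrated with a short sketch. The text does not say how "consistency" between the two vectors is judged, so the cosine similarity measure and the `threshold` value below are assumptions, as are the function names and the keys of the result dictionary. The profile feature vector is taken as given here; in the described method it would come from the LSTM model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def second_correction_result(feature_name, feature_vec, profile_feature_vec,
                             record, threshold=0.9):
    """Return None when the vectors are consistent, else an error correction result.

    The threshold is an assumed hyperparameter, not a value from the source.
    """
    if cosine_similarity(feature_vec, profile_feature_vec) >= threshold:
        return None
    return {
        "reason": f"{feature_name} is inconsistent with object profile",
        "record_copy": dict(record),  # copy of the erroneous attribute information
    }
```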
In one technical solution of the above data error correction method, the model training method of the preset LSTM neural network model includes:
acquiring object attribute information of an object sample in a preset training set, wherein the object attribute information comprises object characteristics and an object profile of the object sample;
performing word segmentation on the object profile of the object sample, acquiring a word vector corresponding to each word in the object profile according to the word segmentation result, and acquiring an object profile vector of the object profile according to the word vectors;
obtaining an object feature vector corresponding to each object feature of the object sample;
concatenating the object profile vector with each object feature vector respectively to obtain a full-text vector corresponding to each object feature vector;
and performing model training on the pre-constructed LSTM neural network model based on each full-text vector by using a machine learning algorithm.
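The data preparation steps above (segmentation, word vectors, profile vector, concatenation) can be sketched as follows. Several simplifications are assumptions rather than details from the source: whitespace tokenization stands in for a real word segmenter, the embedding table is a toy dictionary, and the profile vector is formed by mean pooling of the word vectors. The LSTM training step itself is omitted.

```python
def segment(text):
    """Toy word segmentation: lowercase and split on whitespace (assumption)."""
    return text.lower().split()


def profile_vector(text, embeddings, dim):
    """Mean-pool the word vectors of all known words in the profile."""
    vectors = [embeddings[w] for w in segment(text) if w in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def full_text_vectors(profile_vec, feature_vecs):
    """Concatenate the profile vector with each object feature vector."""
    return {name: profile_vec + vec for name, vec in feature_vecs.items()}
```

Each resulting full-text vector would then serve as one training input for the pre-constructed LSTM model.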
In a second aspect, there is provided a data error correction system, the system comprising:
a data acquisition device configured to acquire object attribute information of a target object, the object attribute information including an object feature and an object profile of the target object;
a data error correction device configured to call an error correction rule in a preset Drools rule engine, correct the object characteristics according to the error correction rule and output a first error correction result; and/or to perform text analysis on the object profile of the target object based on a preset LSTM neural network model algorithm, respectively acquire, according to a text analysis result, the profile characteristics corresponding to each object characteristic in the object profile, and output a second error correction result according to a comparison result of each object characteristic and its corresponding profile characteristics;
and the first error correction result and the second error correction result both comprise information error reasons and object attribute information copies with errors.
In an aspect of the above data error correction system, the system further includes a data error correction processing apparatus configured to perform the following operations:
acquiring feedback information of a current error correction result;
starting a corresponding data processing terminal according to the feedback information so as to respond to a user processing request pre-associated with the feedback information;
wherein the feedback information comprises agree-to-modify error information and refuse-to-modify error information; the user processing request pre-associated with the agree-to-modify error information comprises modifying the object characteristics of the target object; if the current error correction result is a first error correction result, the user processing request pre-associated with the refuse-to-modify error information comprises modifying the error correction rule in the preset Drools rule engine; and if the current error correction result is a second error correction result, the user processing request pre-associated with the refuse-to-modify error information comprises leaving the object characteristics of the target object unmodified.
In an aspect of the above data error correction system, the data error correction device is further configured to perform the following operations:
performing word segmentation on the object profile of the target object, acquiring a word vector corresponding to each word in the object profile according to the word segmentation result, and acquiring an object profile vector of the object profile according to the word vectors;
acquiring an object feature vector of the object feature to be corrected currently;
based on a preset LSTM neural network model and according to the object profile vector, obtaining a profile feature vector of the profile feature that corresponds, in the object profile, to the object feature currently to be corrected, and judging whether the profile feature vector is consistent with the object feature vector; and if they are not consistent, outputting a second error correction result.
In an aspect of the above data error correction system, the data error correction device includes a model training module configured to perform the following operations:
acquiring object attribute information of an object sample in a preset training set, wherein the object attribute information comprises object characteristics and an object profile of the object sample;
performing word segmentation on the object profile of the object sample, acquiring a word vector corresponding to each word in the object profile according to the word segmentation result, and acquiring an object profile vector of the object profile according to the word vectors;
obtaining an object feature vector corresponding to each object feature of the object sample;
concatenating the object profile vector with each object feature vector respectively to obtain a full-text vector corresponding to each object feature vector;
and performing model training on the pre-constructed LSTM neural network model based on each full-text vector by using a machine learning algorithm.
In a third aspect, there is provided a storage device having stored therein a plurality of program codes adapted to be loaded and run by a processor to perform the data error correction method of any of the above.
In a fourth aspect, there is provided a control apparatus comprising a processor and a storage device, the storage device being adapted to store a plurality of program codes, the program codes being adapted to be loaded and run by the processor to perform the data error correction method of any of the above.
One or more technical schemes of the invention at least have one or more of the following beneficial effects:
In the technical scheme of the invention, massive and complicated internet data, such as object attribute information of a target object acquired from the internet, can be subjected to efficient and accurate data error correction based on a preset Drools rule engine and/or a preset LSTM neural network model algorithm. Specifically, data error correction based on the preset Drools rule engine includes: calling an error correction rule in the preset Drools rule engine, correcting the object characteristics in the object attribute information according to the error correction rule, and outputting a first error correction result. Data error correction based on the preset LSTM neural network model algorithm includes: performing text analysis on the object profile in the object attribute information based on the preset LSTM neural network model algorithm, respectively acquiring the profile characteristics corresponding to each object characteristic in the object profile, and then outputting a second error correction result according to a comparison result of each object characteristic and its corresponding profile characteristics. The first error correction result and the second error correction result both comprise the information error reason and a copy of the erroneous object attribute information.
Based on the advantages of easiness in adjustment and management and the like of the Drools rule engine, the embodiment of the invention can flexibly set the error correction rule, adjust the data in the error correction rule at any time without modifying the code of the system loading the Drools rule engine, and fundamentally separate the processing logic of data error correction from the system code, thereby meeting different data error correction requirements of different users. The data error correction mode based on the preset LSTM neural network model algorithm can simulate manual review, and overcomes the defects of long time consumption and low efficiency of the manual review. Further, when the object attribute information of the target object is subjected to error correction analysis based on the preset Drools rule engine and the preset LSTM neural network model algorithm, the comprehensiveness and accuracy of data error correction can be greatly improved.
Drawings
Embodiments of the invention are described below with reference to the accompanying drawings, in which:
FIG. 1 is a flow chart illustrating the main steps of a data error correction method according to an embodiment of the present invention;
FIG. 2 is a block diagram of the main structure of a data error correction system according to one embodiment of the present invention;
list of reference numerals:
11: data acquisition device; 12: data error correction device.
Detailed Description
Some embodiments of the invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are only for explaining the technical principle of the present invention, and are not intended to limit the scope of the present invention.
In the description of the present invention, a "module" or "processor" may include hardware, software, or a combination of both. A module may comprise hardware circuitry, various suitable sensors, communication ports, and memory, may comprise software components such as program code, or may be a combination of software and hardware. The processor may be a central processing unit, microprocessor, image processor, digital signal processor, or any other suitable processor. The processor has data and/or signal processing functionality. The processor may be implemented in software, hardware, or a combination thereof. Non-transitory computer readable storage media include any suitable medium that can store program code, such as magnetic disks, hard disks, optical disks, flash memory, read-only memory, random-access memory, and the like. The term "A and/or B" denotes all possible combinations of A and B, such as A alone, B alone, or A and B. The term "at least one A or B" or "at least one of A and B" has a meaning similar to "A and/or B" and may include only A, only B, or both A and B. The singular forms "a", "an" and "the" may include the plural forms as well.
Because the data sources of internet data are messy, internet data usually contain many data errors, conflicts, contradictions and similar problems. An example is as follows: a piece of text collected from the internet states that "the native place of doctor A is Xi'an City, Shanxi Province", whereas Xi'an belongs to Shaanxi Province, so the text obviously contains a data error. At present, such data problems are mainly handled by manual review. However, because manual review is time-consuming and inefficient, it is not suitable for correcting internet data of large volume.
In the embodiment of the invention, data error correction can be performed efficiently and accurately on massive and complicated internet data, such as object attribute information of a target object acquired from the internet, based on a preset Drools rule engine and/or a preset LSTM neural network model algorithm. Specifically, data error correction based on the preset Drools rule engine includes: calling an error correction rule in the preset Drools rule engine, correcting the object characteristics in the object attribute information according to the error correction rule (for example, if the target object is doctor A, the object characteristics include, but are not limited to, the name, age, educational background, contact information, and work unit of doctor A), and outputting a first error correction result. Data error correction based on the preset LSTM neural network model algorithm includes: performing text analysis on the object profile in the object attribute information (for example, if the target object is doctor A, the object profile is doctor A's profile), respectively acquiring the profile characteristics corresponding to each object characteristic in the object profile, and then outputting a second error correction result according to a comparison result of each object characteristic and its corresponding profile characteristics. The first error correction result and the second error correction result include, but are not limited to: the information error reason, a copy of the erroneous object attribute information, the error correction time, and a modification suggestion, so that the user can view the error information more intuitively.
The Drools rule engine is an open source business rule engine which is easy to adjust and manage, and based on the Drools rule engine, the error correction rule can be flexibly set, data in the error correction rule can be adjusted at any time without modifying codes of a system loading the Drools rule engine, so that processing logic of data error correction is fundamentally separated from system codes, and different data error correction requirements of different users are met.
The object profile (e.g., the user's profile information) of the target object usually contains a part of the object characteristics (e.g., the user's name and age, etc.) of the target object, in other words, there is a part of information duplication/intersection between the object characteristics of the target object and the object profile. The object profile of the target object is subjected to text analysis by an LSTM neural network model algorithm, profile characteristics (user's age described in the object profile) corresponding to certain object characteristics (e.g., user's age) of the target object in the object profile are obtained based on the text analysis result, and if the object characteristics are consistent with the corresponding profile characteristics, the object characteristics are correct without error correction. If the object characteristics are inconsistent with the corresponding profile characteristics, the object characteristics are possibly wrong, and error correction is needed, so that an error correction result is output after the object characteristics are detected to be inconsistent with the corresponding profile characteristics, and a user is reminded to modify the object characteristics. The data error correction mode based on the preset LSTM neural network model algorithm can simulate manual review and overcome the defects of long time consumption and low efficiency of the manual review. Further, in the embodiment of the present invention, if the object attribute information of the target object is subjected to error correction analysis based on the preset Drools rule engine and the preset LSTM neural network model algorithm, the accuracy of data error correction is greatly improved.
In an application scenario of the present invention, the Customer Relationship Management (CRM) system of a pharmaceutical enterprise stores a large amount of doctor data collected from the internet. The doctor data mainly comprises doctor master data and doctor associated data. The doctor master data mainly comprises data such as the name, age, sex, work unit and doctor profile of a doctor, and the doctor associated data mainly comprises data such as documents published by the doctor and information on conferences attended by the doctor. The doctor master data usually has problems such as data errors, conflicts and contradictions. An example is as follows: the doctor master data states that "the native place of doctor A is Xi'an City, Shanxi Province", whereas Xi'an belongs to Shaanxi Province, so this text obviously contains a data error. In this case, the doctor master data in the customer relationship management system may be corrected according to the data error correction method of one embodiment of the present invention. The doctor master data of a doctor is the object attribute information of the doctor, the doctor profile in the doctor master data is the object profile in the object attribute information, and the information in the doctor master data other than the doctor profile constitutes the object features in the object attribute information.
An example is as follows: the doctor master data of a certain doctor includes:
Province: Beijing
City: Beijing
Country: China
Hospital: Peking Union Medical College Hospital
Department: Pediatric surgery
Name: Zhang San
Age group: 40-50
Sex: Male
Job title: Principal physician
Doctor profile: Zhang San, male, assistant principal physician, Doctor of Surgery, the doctor of China, the Committee of the pediatric surgery Committee of the pediatric surgical division of the medical society of Shanghai, the Committee of the pediatric surgery division of the medical society of Shanghai, and the Committee of the youth Committee of the Children tumor professional Committee of the Chinese anticancer Association. Was once a visiting scholar at the Stanford University School of Medicine in the United States. Has hosted and participated in a number of national research projects, published 10 related papers, and obtained the second prize of the third and fourth national exchanges of pediatric oncology surgery, the third two-bank society of children's oncology and research, the excellent paper of youth doctors, the third prize, the fifth three-year excellent paper of Chinese oncology colleges. Is familiar with the diagnosis and treatment of common childhood diseases and the comprehensive treatment of solid tumors in children.
After the data error correction method according to an embodiment of the present invention performs error correction analysis on the above doctor master data, it is found that the profile feature concerning the job title in the doctor profile, "assistant principal physician", is inconsistent with the job title feature in the object features, "principal physician". This indicates that the doctor's job title feature may be wrong and needs correction, so an error correction result is output, in which the information error reason may be "job title is inconsistent with object profile".
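The job title example can be reproduced with a deliberately simple stand-in. In the sketch below a keyword lookup replaces the LSTM-based text analysis, which is an assumption made purely for illustration; the list `KNOWN_TITLES`, the function names, and the record keys are hypothetical.

```python
# Longest title first, because "assistant principal physician" contains
# "principal physician" as a substring.
KNOWN_TITLES = ["assistant principal physician", "principal physician"]


def title_in_profile(profile):
    """Return the first known job title mentioned in the profile text, if any."""
    text = profile.lower()
    for title in KNOWN_TITLES:
        if title in text:
            return title
    return None


def check_title(record):
    """Compare the structured job title with the title found in the profile."""
    found = title_in_profile(record["profile"])
    if found is None or found == record["title"].lower():
        return None  # nothing to compare, or the two agree
    return {
        "reason": "job title is inconsistent with object profile",
        "record_copy": dict(record),
    }
```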
Referring to fig. 1, fig. 1 is a flow chart illustrating the main steps of a data error correction method according to an embodiment of the present invention. As shown in fig. 1, the data error correction method in the embodiment of the present invention may include the following steps:
step S101: object attribute information of the target object is acquired.
The target object is the object on which data error correction is performed, that is, data error correction is performed on data related to that object. The object attribute information of the target object is data that represents some feature or characteristic of the target object. An example is as follows: if the target object is doctor A, then doctor A's object attribute information includes, but is not limited to: doctor A's name, age, educational background, contact information, work unit, and profile.
In one embodiment, object attribute information of a target object input by an external system may be acquired in real time. In the present embodiment, an acquisition module dedicated to communication with an external system to acquire object attribute information stored in the external system may be provided in an apparatus for performing a data error correction method. An example is as follows: the "acquiring module dedicated to communicate with the external system to acquire the object attribute information stored in the external system" may be a module constructed based on an Application Programming Interface (API), and the external system may be configured to call the API Interface so that the "apparatus for executing the data error correction method" can acquire the object attribute information output by the external system through the API Interface. The interface request mode of the API is POST type, so that the external system can output object attribute information to the pre-specified address (such as https:// gateway. data correction. cn/sector-info/input) of the device for executing the data error correction method.
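From the external system's side, the push described above could look roughly like the following sketch, which builds a POST request with a JSON body using Python's standard library. The endpoint `https://example.invalid/doctor-info/input` is a placeholder and not the address from the text, and the JSON payload format and the function name `build_push_request` are likewise assumptions.

```python
import json
import urllib.request

# Placeholder endpoint (assumption); substitute the pre-specified address.
ENDPOINT = "https://example.invalid/doctor-info/input"


def build_push_request(record, endpoint=ENDPOINT):
    """Build (but do not send) a POST request carrying one attribute record."""
    body = json.dumps(record, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would actually transmit the record.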
In one embodiment, object attribute information of a target object stored in an external system may be acquired at regular intervals. In this embodiment, the object attribute information stored in a database of the external system can be acquired periodically and in batches by accessing the database directly. With this data acquisition mode, comprehensive error correction analysis can be performed on all object attribute information stored in the database, which prevents records from being missed.
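Batch acquisition from a database can be sketched with an in-memory SQLite database. The table name `doctor`, its columns, and the batch size are hypothetical, and the scheduling itself (for example a periodic job that calls `iter_batches`) is left out.

```python
import sqlite3


def demo_connection():
    """Create an in-memory database with a few sample rows (illustrative only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE doctor (id INTEGER PRIMARY KEY, name TEXT, title TEXT)"
    )
    conn.executemany(
        "INSERT INTO doctor (name, title) VALUES (?, ?)",
        [("A", "x"), ("B", "y"), ("C", "z")],
    )
    return conn


def iter_batches(conn, batch_size=2):
    """Yield all doctor rows in fixed-size batches, ordered by primary key."""
    cursor = conn.execute("SELECT id, name, title FROM doctor ORDER BY id")
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield rows
```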
Step S102: correcting the object attribute information of the target object based on a preset Drools rule engine and/or a preset LSTM neural network model algorithm, and outputting an error correction result. The specific steps of data error correction based on the preset Drools rule engine and on the preset LSTM neural network model algorithm are described below.
1. Data error correction based on a preset Drools rule engine.
A rule engine is a component embedded in an application program that separates business rules from application code, so that complex business rules can be implemented simply and modified dynamically, allowing a quick response to changing requirements. The Drools rule engine is an open-source rule engine written in Java that uses the Rete algorithm, which was described in a paper published in 1974 by Charles L. Forgy of Carnegie Mellon University. The Drools rule engine is easy to adjust and manage. Based on it, error correction rules can be set flexibly and the data in the rules adjusted at any time without modifying the code of the system that loads the Drools rule engine, so that the processing logic of data error correction is fundamentally separated from system code and the different data error correction requirements of different users are met.
Specifically, in the present embodiment, an error correction rule in a preset Drools rule engine may be called, and the object feature of the target object may be corrected according to the called error correction rule and a first error correction result may be output. An example is as follows: if the object property information of the target object is doctor's master data stored in the customer relationship management system of a certain pharmaceutical enterprise, the error correction rules in the Drools rules engine may include, but are not limited to, the following rules:
Error correction rule 1: province and city disagreement
Error correction rule 2: province and country disagreement
Error correction rule 3: hospital name length is greater than 10 characters
Error correction rule 4: doctor's surname is null
Error correction rule 5: doctor's name length is more than 20 characters
Error correction rule 6: doctor's age bracket is below 20-30
Error correction rule 7: doctor's age bracket is above 100-110
Error correction rule 8: mobile phone number validity check
Error correction rule 9: working life greater than 80
Error correction rule 10: the working age is greater than the age of the doctor
The doctor master data of doctor A is corrected by calling error correction rules 1 to 10 one by one. For example, if error correction rule 1 is applied to "the native place of doctor A is Shanxi Province, Xi'an City", it can be determined that the native place feature of doctor A contains a data error, and an error correction result can then be output. The error correction result may include the information error reason, here "the province does not match the city", and a copy of the erroneous object attribute information (a copy of "the native place of doctor A is Shanxi Province, Xi'an City").
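The rule-by-rule check above can be illustrated with a small sketch. It is written in plain Python rather than in the Drools rule language, so it shows only the idea of keeping rules as data separate from the checking loop. The record field names, the `CITY_TO_PROVINCE` lookup table, and the selection of four rules are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable

# Toy lookup table; a real system would need a complete administrative-division dataset.
CITY_TO_PROVINCE = {"Xi'an": "Shaanxi", "Taiyuan": "Shanxi", "Wuxi": "Jiangsu"}


@dataclass(frozen=True)
class Rule:
    reason: str                          # information error reason reported to the user
    is_violated: Callable[[dict], bool]  # returns True when the record breaks the rule


def province_city_mismatch(record):
    expected = CITY_TO_PROVINCE.get(record.get("city"))
    return expected is not None and expected != record.get("province")


def working_years_exceed_age(record):
    return ("age" in record and "working_years" in record
            and record["working_years"] > record["age"])


RULES = [
    Rule("province does not match city", province_city_mismatch),
    Rule("doctor's surname is null", lambda r: not r.get("surname")),
    Rule("working life greater than 80", lambda r: r.get("working_years", 0) > 80),
    Rule("working life greater than doctor's age", working_years_exceed_age),
]


def check(record):
    """Apply every rule in turn; each hit yields a reason plus a copy of the record."""
    return [{"reason": rule.reason, "copy": dict(record)}
            for rule in RULES if rule.is_violated(record)]


doctor_a = {"surname": "A", "province": "Shanxi", "city": "Xi'an",
            "age": 45, "working_years": 20}
first_results = check(doctor_a)
```

Because `RULES` is an ordinary list, a rule can be added, removed, or changed without touching `check`, which mirrors the separation of rules from system code that the rule engine provides.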
2. Data error correction based on a preset LSTM neural network model algorithm.
In this embodiment, the object attribute information of the target object may be corrected based on a preset LSTM neural network model algorithm according to the following steps:
Text analysis is performed on the object profile of the target object based on the preset LSTM neural network model algorithm; the profile feature corresponding to each object feature is acquired from the object profile according to the text analysis result; and a second error correction result is output according to the result of comparing each object feature with its corresponding profile feature. If the current object feature is consistent with its corresponding profile feature, the object feature is correct and needs no error correction. If they are inconsistent, the object feature may be wrong and needs error correction, so an error correction result can be output to remind the user to check and/or modify the erroneous information.
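The comparison step can be sketched as below, under the assumption that the profile features have already been extracted from the object profile by the model and are available as a plain dictionary. The function name, the field names, and the use of exact string equality as the consistency test are illustrative choices, not details given by the embodiment.

```python
def compare_features(object_features, profile_features):
    """Return a second-error-correction-style result for every inconsistent feature."""
    results = []
    for name, value in object_features.items():
        extracted = profile_features.get(name)
        if extracted is None:
            continue  # the profile gives no information about this feature
        if extracted != value:
            results.append({
                "reason": f"{name} is inconsistent with the object profile",
                "copy": {name: value},  # copy of the possibly erroneous attribute
            })
    return results


object_features = {"title": "attending physician", "department": "nephrology"}
profile_features = {"title": "chief physician", "department": "nephrology"}
second_results = compare_features(object_features, profile_features)
```

Features that the profile does not mention are skipped rather than flagged, since an absent profile feature gives no evidence that the stored value is wrong.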
In one embodiment, the object attribute information of the target object may be corrected based on a preset LSTM neural network model algorithm according to the following steps:
Step 11: acquiring object attribute information of the object samples in a preset training set.
The object attribute information includes object characteristics and an object profile of the object sample. The object characteristics and the object profile are the same as those in step S101, and are not described herein again for brevity.
Step 12: performing word segmentation on the object profile of the object sample obtained in step 11, acquiring the word vector corresponding to each word in the object profile according to the word segmentation result, and acquiring the object profile vector of the object profile from the word vectors.
An example is as follows: the object sample is doctor A, whose object profile is as follows:
A, male, chief physician, professor, deputy director of the nephrology department, deputy director of the department of internal medicine, master's supervisor, a native of Wuxi in Jiangsu, graduated from Beijing Medical University in 1993, worked at Peking Union Medical College Hospital after graduation, served as resident physician and chief resident physician. In 2000 obtained a Doctor of Medicine degree from Peking Union Medical College, continued working in the nephrology department of Peking Union Medical College Hospital, served as attending physician in nephrology. From 2002.6 to 2003.6 was a visiting scholar in the nephrology department of the Royal Free Hospital of University College London in the UK. At the end of 2003 became deputy director of the nephrology department of Peking Union Medical College Hospital. In 2004 was appointed associate chief physician and associate professor by Peking Union Medical College Hospital. In 2005 was appointed master's supervisor by Peking Union Medical College. In 2008 passed the Ministry of Health's qualification review for chief physician. Mainly engaged in the clinical diagnosis and treatment of various primary and secondary kidney diseases. Skilled in treating refractory nephrotic syndrome, IgA nephropathy, lupus nephritis and the like. Research focuses mainly on renal lipid metabolism and renal pathology. Since 2000 has published a total of more than 40 papers in core journals at home and abroad as first author or corresponding author, more than 20 of them original articles.
First, the object profile is divided into a plurality of sentences according to the punctuation marks in the object profile, as follows.
<A>, <male>, <chief physician>, <professor>, <deputy director of the nephrology department>, <deputy director of the department of internal medicine>, <master's supervisor>, <a native of Wuxi in Jiangsu>, <graduated from Beijing Medical University in 1993>, <worked at Peking Union Medical College Hospital after graduation>, <served as resident physician and chief resident physician>, <in 2000 obtained a Doctor of Medicine degree from Peking Union Medical College>, <continued working in the nephrology department of Peking Union Medical College Hospital>, <served as attending physician in nephrology>, <from 2002.6 to 2003.6 was a visiting scholar in the nephrology department of the Royal Free Hospital of University College London in the UK>, <at the end of 2003 became deputy director of the nephrology department of Peking Union Medical College Hospital>, <in 2004 was appointed associate chief physician by Peking Union Medical College Hospital>, ..., <since 2000 has published a total of more than 40 papers in core journals at home and abroad as first author or corresponding author>, <more than 20 of them original articles>.
Then, word segmentation is performed on each sentence to obtain the following word segmentation results:
<A>, <male>, <chief physician>, <professor>, <nephrology department>, <deputy director>, <internal medicine>, <department>, <deputy director>, <master's student>, <supervisor>, <Jiangsu>, <Wuxi>, <native>, <1993>, <Beijing Medical University>, <graduated>, <after graduation>, <at>, <Peking Union Medical College Hospital>, <worked>, <served as>, <resident physician>, <chief resident physician>, <2000>, <Peking Union Medical College>, <obtained>, <Doctor of Medicine>, <degree>, <continued>, <Peking Union Medical College Hospital>, <nephrology department>, <worked>, <served as>, <nephrology>, <attending physician>, <2002.6>, <2003.6>, <UK>, <University College London>, <Royal Free Hospital>, <nephrology department>, <visiting scholar>, <2003>, <end of year>, <served as>, <Peking Union Medical College Hospital>, <nephrology department>, <deputy director>, <2004>, <by>, <Peking Union Medical College Hospital>, <appointed>, <associate chief physician>, ..., <2000>, <since>, <as>, <first author>, <or>, <corresponding author>, <in>, <foreign>, <and>, <domestic>, <core journals>, <published>, <papers>, <total>, <40>, <more than>, <of which>, <original articles>, <20>, <more than>.
Finally, the word vector corresponding to each word is acquired, and the word vectors are spliced (concatenated) to obtain the object profile vector of the object profile.
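The split, look-up and splice sequence can be sketched as follows. The two-dimensional `EMBEDDINGS` table is a made-up stand-in for a trained word-vector model, unknown words are mapped to a zero vector by assumption, and each punctuation-delimited clause is treated as a single token to keep the example short. A real pipeline would apply a proper word segmenter to each clause.

```python
import re

# Toy word vectors (assumed values); a trained embedding model would supply these.
EMBEDDINGS = {"chief physician": [0.9, 0.1], "professor": [0.8, 0.3]}
DIM = 2


def split_clauses(profile):
    """Split a profile on common Western and Chinese punctuation marks."""
    return [part for part in re.split(r"[,.;，。；]\s*", profile) if part]


def profile_vector(words):
    """Concatenate (splice) the word vectors in order into one flat vector."""
    vector = []
    for word in words:
        vector.extend(EMBEDDINGS.get(word, [0.0] * DIM))
    return vector


clauses = split_clauses("A, male, chief physician, professor.")
vec = profile_vector(clauses)
```

Note that splitting on every period would also break tokens such as "2002.6", so a production splitter would need to treat numeric dates specially.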
Step 13: acquiring the object feature vector corresponding to each object feature of the object sample. In this embodiment, semantic information of the object features may first be obtained using a conventional natural language recognition method from the field of natural language processing, and the corresponding object feature vectors may then be obtained from the semantic information.
Step 14: splicing the object profile vector with each object feature vector to obtain the full-text vector corresponding to each object feature vector.
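As a small illustration of step 14, the sketch below builds one full-text vector per object feature by appending the feature vector to the profile vector. The function name and the numeric values are hypothetical, and placing the profile vector first is an assumption about the splicing order.

```python
def build_full_text_vectors(profile_vec, feature_vecs):
    """Return {feature name: profile vector spliced with that feature's vector}."""
    return {name: list(profile_vec) + list(vec) for name, vec in feature_vecs.items()}


profile_vec = [0.9, 0.1, 0.8, 0.3]
feature_vecs = {"title": [1.0, 0.0], "department": [0.0, 1.0]}
full_text = build_full_text_vectors(profile_vec, feature_vecs)
```

Each full-text vector therefore pairs the same profile context with a different feature, which is what lets the model be queried feature by feature.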
Step 15: training the pre-constructed LSTM neural network model on the full-text vectors using a machine learning algorithm.
Step 16: performing error correction analysis on the object attribute information of the target object with the trained LSTM (Long Short-Term Memory) neural network model, specifically including the following steps:
Step 161: performing word segmentation on the object profile of the target object, acquiring the word vector corresponding to each word in the object profile according to the word segmentation result, and acquiring the object profile vector of the object profile from the word vectors.
Step 162: acquiring the object feature vector of the object feature currently to be corrected.
Step 163: based on the preset LSTM neural network model and according to the object profile vector, acquiring the profile feature vector of the profile feature in the object profile that corresponds to the object feature currently to be corrected, and judging whether the profile feature vector is consistent with the object feature vector; if they are inconsistent, outputting a second error correction result.
Specifically, the object profile vector is first spliced with each object feature vector to obtain the full-text vectors. Each full-text vector is then input in turn into the preset LSTM neural network model for error correction analysis. For the object feature vector in the currently input full-text vector, the LSTM neural network model can obtain the corresponding profile feature vector from the object profile, and then determine whether the profile feature vector is consistent with the object feature vector.
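For readers unfamiliar with LSTM networks, the sketch below shows one standard LSTM time step in plain Python together with a possible consistency test. It is illustrative only: the weights are untrained zeros, and judging consistency by cosine similarity against a threshold of 0.9 is an assumption, since the embodiment does not state how consistency between the two vectors is measured.

```python
import math

GATES = ("input", "forget", "output", "candidate")


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h, c, weights, biases):
    """One LSTM time step; weights[gate] has shape (hidden, len(x) + len(h))."""
    z = list(x) + list(h)

    def affine(gate):
        return [sum(w * v for w, v in zip(row, z)) + b
                for row, b in zip(weights[gate], biases[gate])]

    i = [sigmoid(v) for v in affine("input")]
    f = [sigmoid(v) for v in affine("forget")]
    o = [sigmoid(v) for v in affine("output")]
    g = [math.tanh(v) for v in affine("candidate")]
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def encode(sequence, hidden_size, weights, biases):
    """Run the cell over a sequence of input vectors and return the final hidden state."""
    h, c = [0.0] * hidden_size, [0.0] * hidden_size
    for x in sequence:
        h, c = lstm_step(x, h, c, weights, biases)
    return h


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def is_consistent(profile_feature_vec, object_feature_vec, threshold=0.9):
    """Assumed criterion: vectors count as consistent if cosine similarity >= threshold."""
    return cosine(profile_feature_vec, object_feature_vec) >= threshold


# Untrained toy cell: input size 1, hidden size 1, all weights and biases zero.
weights = {gate: [[0.0, 0.0]] for gate in GATES}
biases = {gate: [0.0] for gate in GATES}
h1, c1 = lstm_step([1.0], [0.0], [2.0], weights, biases)
```

With zero weights every gate evaluates to 0.5 and the candidate to 0, so the cell state is simply halved at each step; trained weights would instead let the gates decide which parts of the profile to retain.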
It should be noted that the above steps 11 to 16 are a training and using process of a complete LSTM neural network model. In some embodiments, if the LSTM neural network model has been trained, steps 11-15 may be omitted after step S101, and step 16 may be performed directly instead.
Further, the data error correction method according to an embodiment of the present invention may include the following steps S103 to S104 after performing step S102.
Step S103: feedback information of the error correction result obtained through step S102 is acquired.
The feedback information of the error correction result includes agreement to modify the erroneous information and refusal to modify the erroneous information. Agreement means that the user agrees to correct the error prompted by the current error correction result; refusal means that the user declines to correct the error prompted by the current error correction result.
Step S104: starting the corresponding data processing terminal according to the feedback information, so as to respond to the user processing request pre-associated with the feedback information.
The user processing request pre-associated with agreement to modify the erroneous information may include modifying an object feature of the target object. For example, if the error prompted by the error correction result is that the doctor's job title feature is inconsistent with the job title information in the doctor's profile, the data processing terminal for the doctor's job title feature is started so that the user can modify the job title feature there.
If the error correction result is the first error correction result, the user processing request pre-associated with refusal to modify the erroneous information may include modifying an error correction rule in the preset Drools rule engine. For example, suppose the error prompted after the object attribute information of the target object is corrected based on the preset Drools rule engine is: working life is less than 80. A working life of less than 80 is in fact reasonable, so the prompted error is clearly itself mistaken. After the refusal to modify the erroneous information is received for the current error correction result, the data processing terminal for the error correction rule "working life greater than 80" is started so that the user can modify the rule, for example to "working life less than 80".
If the error correction result is the second error correction result, the user processing request pre-associated with refusal to modify the erroneous information may include leaving the object feature of the target object unmodified; that is, after the refusal is received, the current object feature is kept unchanged.
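The three cases of steps S103 to S104 amount to a small dispatch table, sketched below. The string labels for result kinds, feedback values, and returned actions are hypothetical names chosen for the example.

```python
def dispatch_feedback(result_kind, feedback):
    """Map (error correction result kind, user feedback) to the action to start.

    result_kind: "first" (rule engine result) or "second" (LSTM result).
    feedback: "agree" or "refuse" (to modify the prompted error).
    """
    if result_kind not in ("first", "second"):
        raise ValueError(f"unknown result kind: {result_kind!r}")
    if feedback == "agree":
        return "open_feature_editor"      # user modifies the object feature
    if feedback == "refuse":
        if result_kind == "first":
            return "open_rule_editor"     # user modifies the error correction rule
        return "keep_feature_unchanged"   # second result: nothing is modified
    raise ValueError(f"unknown feedback: {feedback!r}")


actions = [
    dispatch_feedback("first", "agree"),
    dispatch_feedback("first", "refuse"),
    dispatch_feedback("second", "refuse"),
]
```

Only a refused first error correction result leads to rule editing, because only that result comes from user-configurable rules.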
It should be noted that, although the foregoing embodiments describe each step in a specific sequence, those skilled in the art will understand that, in order to achieve the effect of the present invention, different steps do not necessarily need to be executed in such a sequence, and they may be executed simultaneously (in parallel) or in other sequences, and these changes are all within the protection scope of the present invention.
Referring to fig. 2, fig. 2 is a main structural block diagram of a data error correction system according to an embodiment of the present invention. As shown in fig. 2, the data error correction system in the embodiment of the present invention mainly includes a data acquisition device 11 and a data error correction device 12. The data obtaining means 11 may be configured to obtain object property information of the target object, the object property information including an object characteristic and an object profile of the target object. The data error correction device 12 may be configured to invoke an error correction rule in a preset Drools rule engine, correct an error of the object feature according to the error correction rule, and output a first error correction result; and/or performing text analysis on the object profile of the target object based on a preset LSTM neural network model algorithm, respectively acquiring profile characteristics corresponding to each object characteristic in the object profile, and outputting a second error correction result according to the comparison result of each object characteristic and the profile characteristics corresponding to each object characteristic. The first error correction result and the second error correction result both comprise information error reasons and object attribute information copies with errors. In one embodiment, the description of the specific implementation function may be referred to in steps S101 to S102.
In one embodiment, the data error correction system shown in fig. 2 may include a data error correction processing apparatus, which may be configured to perform the following operations:
acquiring feedback information of the current error correction result; and starting the corresponding data processing terminal according to the feedback information, so as to respond to the user processing request pre-associated with the feedback information. The feedback information includes agreement to modify the erroneous information and refusal to modify the erroneous information. The user processing request pre-associated with agreement to modify the erroneous information includes modifying an object feature of the target object. If the current error correction result is the first error correction result, the user processing request pre-associated with refusal to modify the erroneous information includes modifying an error correction rule in the preset Drools rule engine; if the current error correction result is the second error correction result, the user processing request pre-associated with refusal to modify the erroneous information includes not modifying the object feature of the target object. For the specific implementation, refer to the description of steps S103 to S104.
In one embodiment, the data error correction device 12 may be configured to perform the following operations: performing word segmentation on the object profile of the target object, acquiring the word vector corresponding to each word in the object profile according to the word segmentation result, and acquiring the object profile vector of the object profile from the word vectors; acquiring the object feature vector of the object feature currently to be corrected; based on the preset LSTM neural network model and according to the object profile vector, acquiring the profile feature vector of the profile feature in the object profile that corresponds to the object feature currently to be corrected, and judging whether the profile feature vector is consistent with the object feature vector; and if they are inconsistent, outputting a second error correction result. For the specific implementation, refer to the description of step S102.
In one embodiment, the data error correction device 12 may include a model training module configured to perform the following operations:
acquiring object attribute information of an object sample in a preset training set, wherein the object attribute information comprises object characteristics and an object brief introduction of the object sample; performing word segmentation on the object brief introduction of the object sample, acquiring a word vector corresponding to each word in the object brief introduction according to a word segmentation processing result, and acquiring an object brief introduction vector of the object brief introduction according to the word vector; obtaining an object feature vector corresponding to each object feature of the object sample; respectively carrying out vector splicing on the object introduction vector and each object characteristic vector to obtain a full-text vector corresponding to each object characteristic vector; and performing model training on the pre-constructed LSTM neural network model based on each full-text vector by utilizing a machine learning algorithm. In one embodiment, the description of the specific implementation function may be referred to in step S102.
The data error correction system described above is used to implement the data error correction method embodiment shown in fig. 1; its technical principles, the technical problems it solves, and the technical effects it produces are similar to those of the method embodiment. Those skilled in the art will clearly understand that, for convenience and conciseness of description, the specific working process of the data error correction system and the related descriptions may refer to the contents described in the data error correction method embodiment, and are not repeated here.
Furthermore, the invention also provides a storage device. In this storage device embodiment, the storage device may be configured to store a program for executing the data error correction method of the above method embodiment, and the program may be loaded and executed by a processor to implement that method. For convenience of explanation, only the parts related to the embodiments of the present invention are shown, and details of the specific techniques are not disclosed. The storage device may be a storage apparatus comprising various electronic devices; optionally, the storage device in the embodiment of the present invention is a non-transitory computer-readable storage medium.
Furthermore, the invention also provides a control device. In this control device embodiment, the control device includes a processor and a storage device. The storage device may be configured to store a program for executing the data error correction method of the above method embodiment, and the processor may be configured to execute the programs in the storage device, including but not limited to the program for executing the data error correction method of the above method embodiment. For convenience of explanation, only the parts related to the embodiments of the present invention are shown, and details of the specific techniques are not disclosed. The control device may be a control apparatus comprising various electronic devices; optionally, the control device in the embodiment of the present invention is a server.
It will be understood by those skilled in the art that all or part of the flow of the method according to the above embodiment may be implemented by a computer program, which may be stored in a computer-readable storage medium and which implements the steps of the above method embodiments when executed by a processor. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, computer memory, read-only memory, random access memory, electrical carrier signals, telecommunication signals, software distribution media, and the like. It should be noted that the content of the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunication signals according to legislation and patent practice.
Further, it should be understood that, since the modules are only configured to illustrate the functional units of the system of the present invention, the corresponding physical devices of the modules may be the processor itself, or a part of software, a part of hardware, or a part of a combination of software and hardware in the processor. Thus, the number of individual modules in the figures is merely illustrative.
Those skilled in the art will appreciate that the various modules in the system may be adaptively split or combined. Such splitting or combining of specific modules does not cause the technical solutions to deviate from the principle of the present invention, and therefore, the technical solutions after splitting or combining will fall within the protection scope of the present invention.
So far, the technical solution of the present invention has been described with reference to one embodiment shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (10)

1. A method for error correction of data, the method comprising:
acquiring object attribute information of a target object, wherein the object attribute information comprises object characteristics and an object profile of the target object;
calling an error correction rule in a preset Drools rule engine, correcting the object characteristics according to the error correction rule and outputting a first error correction result; and/or,
performing text analysis on the object brief introduction of the target object based on a preset LSTM neural network model algorithm, respectively acquiring brief introduction characteristics corresponding to each object characteristic in the object brief introduction according to a text analysis result, and outputting a second error correction result according to a comparison result of each object characteristic and the brief introduction characteristics corresponding to each object characteristic;
and the first error correction result and the second error correction result both comprise information error reasons and object attribute information copies with errors.
2. The data error correction method of claim 1, wherein the method further comprises:
acquiring feedback information of a current error correction result;
starting a corresponding data processing terminal according to the feedback information so as to respond to a user processing request pre-associated with the feedback information;
wherein the feedback information comprises agreement to modify the erroneous information and refusal to modify the erroneous information, and the user processing request pre-associated with agreement to modify the erroneous information comprises modifying an object characteristic of the target object; if the current error correction result is a first error correction result, the user processing request pre-associated with refusal to modify the erroneous information comprises modifying the error correction rule in the preset Drools rule engine; and if the current error correction result is a second error correction result, the user processing request pre-associated with refusal to modify the erroneous information comprises not modifying the object characteristic of the target object.
3. The data error correction method of claim 1, wherein the steps of performing a text analysis on the object profile of the target object based on a preset LSTM neural network model algorithm, respectively obtaining profile features of the object profile corresponding to each object feature according to the text analysis result, and outputting the second error correction result according to the comparison result between each object feature and the profile feature corresponding to each object feature specifically comprise:
performing word segmentation on the object brief introduction of the target object, acquiring a word vector corresponding to each word in the object brief introduction according to a word segmentation processing result, and acquiring an object brief introduction vector of the object brief introduction according to the word vector;
acquiring an object feature vector of the object feature to be corrected currently;
based on a preset LSTM neural network model and according to the object profile vector, obtaining a profile feature vector of the profile feature in the object profile that corresponds to the object feature currently to be corrected, and judging whether the profile feature vector is consistent with the object feature vector; and if they are inconsistent, outputting a second error correction result.
4. The data error correction method of claim 3, wherein the model training method of the preset LSTM neural network model comprises:
acquiring object attribute information of an object sample in a preset training set, wherein the object attribute information comprises object characteristics and an object brief introduction of the object sample;
performing word segmentation on the object brief introduction of the object sample, acquiring a word vector corresponding to each word in the object brief introduction according to a word segmentation processing result, and acquiring an object brief introduction vector of the object brief introduction according to the word vector;
obtaining an object feature vector corresponding to each object feature of the object sample;
vector splicing is carried out on the object introduction vector and each object characteristic vector respectively to obtain a full text vector corresponding to each object characteristic vector;
and performing model training on the pre-constructed LSTM neural network model based on each full-text vector by utilizing a machine learning algorithm.
5. A data error correction system, the system comprising:
a data acquisition device configured to acquire object attribute information of a target object, the object attribute information including an object feature and an object profile of the target object;
a data error correction device configured to call an error correction rule in a preset Drools rule engine, correct the object characteristics according to the error correction rule and output a first error correction result; and/or to perform text analysis on the object profile of the target object based on a preset LSTM neural network model algorithm, respectively acquire the profile characteristics corresponding to each object characteristic in the object profile according to the text analysis result, and output a second error correction result according to the result of comparing each object characteristic with its corresponding profile characteristic;
and the first error correction result and the second error correction result both comprise information error reasons and object attribute information copies with errors.
6. The data error correction system of claim 5, wherein the system further comprises a data error correction processing device configured to perform the following operations:
acquiring feedback information of a current error correction result;
starting a corresponding data processing terminal according to the feedback information so as to respond to a user processing request pre-associated with the feedback information;
wherein the feedback information comprises agreement to modify the erroneous information and refusal to modify the erroneous information, and the user processing request pre-associated with agreement to modify the erroneous information comprises modifying an object characteristic of the target object; if the current error correction result is a first error correction result, the user processing request pre-associated with refusal to modify the erroneous information comprises modifying the error correction rule in the preset Drools rule engine; and if the current error correction result is a second error correction result, the user processing request pre-associated with refusal to modify the erroneous information comprises not modifying the object characteristic of the target object.
7. The data error correction system of claim 5, wherein the data error correction device is configured to perform the following operations:
performing word segmentation on the object brief introduction of the target object, acquiring a word vector corresponding to each word in the object brief introduction according to a word segmentation processing result, and acquiring an object brief introduction vector of the object brief introduction according to the word vector;
acquiring an object feature vector of the object feature to be corrected currently;
based on a preset LSTM neural network model and according to the object brief introduction vector, obtaining a brief introduction feature vector of the brief introduction feature that corresponds, in the object brief introduction, to the object feature currently to be corrected, and judging whether the brief introduction feature vector is consistent with the object feature vector; and if they are not consistent, outputting a second error correction result.
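The data flow of claim 7 can be sketched in Python under several stated assumptions: the brief introduction is already segmented into words, word vectors are pooled by averaging, the trained LSTM is represented by an arbitrary callable `model`, and "consistent" is taken to mean a cosine similarity at or above a threshold. None of these choices is specified by the claim; all function names and the threshold are hypothetical.

```python
import math
from typing import Callable, Dict, List, Optional

Vector = List[float]


def pool_word_vectors(words: List[str], embeddings: Dict[str, Vector]) -> Vector:
    """Average the vectors of the known words (an assumed pooling choice)."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        raise ValueError("no word of the brief introduction has a vector")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def check_feature(
    words: List[str],
    embeddings: Dict[str, Vector],
    model: Callable[[Vector], Vector],   # placeholder for the trained LSTM
    feature_vector: Vector,
    threshold: float = 0.9,              # assumed consistency criterion
) -> Optional[dict]:
    """Return None if consistent, otherwise a second error correction result."""
    intro_vector = pool_word_vectors(words, embeddings)
    intro_feature_vector = model(intro_vector)
    similarity = cosine(intro_feature_vector, feature_vector)
    if similarity >= threshold:
        return None
    return {
        "error_reason": "object feature disagrees with brief introduction",
        "similarity": similarity,
    }
```

In practice `model` would be the trained network; an identity function is enough to exercise the comparison logic.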
8. The data error correction system of claim 7, wherein the data error correction device comprises a model training module configured to perform the following operations:
acquiring object attribute information of an object sample in a preset training set, wherein the object attribute information comprises object characteristics and an object brief introduction of the object sample;
performing word segmentation on the object brief introduction of the object sample, acquiring a word vector corresponding to each word in the object brief introduction according to a word segmentation processing result, and acquiring an object brief introduction vector of the object brief introduction according to the word vector;
obtaining an object feature vector corresponding to each object feature of the object sample;
splicing the object brief introduction vector with each object feature vector, respectively, to obtain a full-text vector corresponding to each object feature vector;
and training the pre-constructed LSTM neural network model on the full-text vectors by using a machine learning algorithm.
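The splicing step of claim 8 can be read as plain concatenation, as in the Python sketch below. That reading is an assumption, as are the function name `splice` and the feature names in the usage note; the training of the LSTM itself is not shown.

```python
from typing import Dict, List

Vector = List[float]


def splice(intro_vector: Vector, feature_vectors: Dict[str, Vector]) -> Dict[str, Vector]:
    """Concatenate the brief introduction vector with each object feature vector.

    Returns one full-text vector per object feature, keyed by feature name.
    """
    return {
        name: list(intro_vector) + list(vector)
        for name, vector in feature_vectors.items()
    }
```

For example, `splice([0.1, 0.2], {"name": [1.0], "year": [2.0, 3.0]})` produces one full-text vector of length 3 for `name` and one of length 4 for `year`; each would then serve as a training input for the model.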
9. A storage device having a plurality of program codes stored therein, wherein the program codes are adapted to be loaded and run by a processor to perform the data error correction method of any of claims 1 to 4.
10. A control apparatus comprising a processor and a storage device, the storage device being adapted to store a plurality of program codes, wherein the program codes are adapted to be loaded and run by the processor to perform the data error correction method of any one of claims 1 to 4.
CN202010531752.6A 2020-06-11 2020-06-11 Data error correction method, device and system Pending CN111680495A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010531752.6A CN111680495A (en) 2020-06-11 2020-06-11 Data error correction method, device and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010531752.6A CN111680495A (en) 2020-06-11 2020-06-11 Data error correction method, device and system

Publications (1)

Publication Number Publication Date
CN111680495A true CN111680495A (en) 2020-09-18

Family

ID=72435538

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010531752.6A Pending CN111680495A (en) 2020-06-11 2020-06-11 Data error correction method, device and system

Country Status (1)

Country Link
CN (1) CN111680495A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012234240A (en) * 2011-04-28 2012-11-29 Buffalo Inc Storage device, computer device, control method of computer, and computer program
CN106550268A (en) * 2016-12-26 2017-03-29 Tcl集团股份有限公司 Method for processing video frequency and video process apparatus
CN110347709A (en) * 2019-06-28 2019-10-18 苏宁云计算有限公司 A kind of construction method and system of regulation engine
CN110457680A (en) * 2019-07-02 2019-11-15 平安科技(深圳)有限公司 Entity disambiguation method, device, computer equipment and storage medium
CN110580489A (en) * 2018-06-11 2019-12-17 阿里巴巴集团控股有限公司 Data object classification system, method and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination