CN114064893A - Abnormal data auditing method, device, equipment and storage medium - Google Patents

Publication number
CN114064893A
Authority
CN
China
Prior art keywords
abnormal data
text
auditing
matching
audit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111347270.6A
Other languages
Chinese (zh)
Inventor
吴思
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Application filed by Ping An Life Insurance Company of China Ltd
Priority to CN202111347270.6A
Publication of CN114064893A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3334 Selection or weighting of terms from queries, including natural language queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2132 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on discrimination criteria, e.g. discriminant analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08 Insurance

Abstract

The application relates to the fields of artificial intelligence and digital healthcare, is applicable to intelligent healthcare, and discloses an abnormal data auditing method, device, equipment and storage medium. The method comprises: acquiring abnormal data; classifying the abnormal data to obtain its category; acquiring the corresponding audit content from a preset database based on the category; extracting features of the abnormal data through a keyword extraction model to obtain a feature text of the abnormal data; matching the audit content against the feature text using a text matching model to obtain corresponding matching degrees; and comparing each matching degree with a preset threshold to determine an audit result. The application also relates to blockchain technology, in that the abnormal data is stored in a blockchain. The method and device can improve the efficiency of auditing abnormal data.

Description

Abnormal data auditing method, device, equipment and storage medium
Technical Field
The application relates to the field of artificial intelligence and digital medical treatment, in particular to an abnormal data auditing method, device, equipment and storage medium.
Background
As more people purchase insurance, the number of claim applications grows with the number of policies sold, and so does the workload of auditing abnormal data such as claim cases. In the prior art, an auditor audits abnormal data manually by consulting the preservation history, the underwriting history, the cause of the claim and so on, which is inefficient. Other review approaches, such as current claim settlement systems, rely mainly on time-based checks and personnel checks without any further automated review, so subsequent review is still largely manual and processing efficiency remains low. How to improve the efficiency of auditing abnormal data is therefore an urgent problem.
Disclosure of Invention
The application provides an abnormal data auditing method, device, equipment and storage medium, which aim to solve the problem of low auditing efficiency of abnormal data in the prior art.
In order to solve the above problem, the present application provides an abnormal data auditing method, including:
acquiring abnormal data;
classifying the abnormal data to obtain a category corresponding to the abnormal data;
acquiring corresponding audit content from a preset database based on the category;
extracting the characteristics of the abnormal data through a keyword extraction model to obtain a characteristic text of the abnormal data, wherein the keyword extraction model is obtained based on a TextRank model;
matching the audit content with the feature text by using a text matching model to obtain a corresponding matching degree, wherein the text matching model is obtained by training based on a BiMPM model;
and comparing each matching degree with a preset threshold value to determine an auditing result.
Further, the classifying the abnormal data to obtain a category corresponding to the abnormal data includes:
acquiring text data corresponding to all the categories;
scanning the abnormal data, and judging whether the abnormal data contains the text data;
if the abnormal data contains the text data, taking the category corresponding to the text data as the category of the abnormal data;
if the abnormal data does not contain the text data, classifying the abnormal data by using a text classification model to obtain a class corresponding to the abnormal data, wherein the text classification model is obtained by training based on an LDA model.
Further, the classifying the abnormal data by using the text classification model to obtain the category corresponding to the abnormal data includes:
pre-classifying the abnormal data by using the text classification model to obtain a classification result of the abnormal data;
comparing the classification result with the text data, and judging whether the text data contains the classification result;
if the text data contains the classification result, taking the classification result as a category corresponding to the abnormal data;
and if the text data does not contain the classification result, sending first question information to a user.
Further, the extracting the features of the abnormal data through the keyword extraction model to obtain the feature text of the abnormal data includes:
segmenting the abnormal data to obtain words corresponding to each part in the abnormal data;
extracting features of each part of words in the abnormal data independently by using the keyword extraction model to obtain features and corresponding weights of each part;
sorting the features in descending order of their weights, and extracting a preset number of top-ranked features as the keywords corresponding to each part in the abnormal data;
and collecting the keywords corresponding to the parts to obtain the feature text.
Further, the segmenting the abnormal data to obtain words corresponding to each part in the abnormal data includes:
performing word segmentation on the abnormal data using jieba word segmentation to obtain a plurality of corresponding words;
and performing part-of-speech tagging on the words, and removing the words tagged as stop words to obtain the words corresponding to each part in the abnormal data.
Further, the matching the audit content and the feature text by using a text matching model to obtain a corresponding matching degree includes:
acquiring corresponding keywords in the feature text based on each audit element in the audit content;
and matching the content corresponding to each audit element with the keywords corresponding to the audit element through the text matching model to obtain the matching degree corresponding to each audit element.
Further, the comparing each matching degree with a preset threshold to determine an audit result includes:
extracting the matching degree corresponding to the first auditing element in the auditing content;
judging whether the matching degree corresponding to the first audit element is greater than or equal to a first preset value;
when the matching degree corresponding to the first audit element is smaller than the first preset value, directly sending second problem information to the user;
when the matching degree corresponding to the first audit element is greater than or equal to the first preset value, comparing the matching degree corresponding to a second audit element in the audit content with a second preset value;
when the matching degree corresponding to the second audit element is smaller than the second preset value, directly sending third problem information to the user;
when the matching degree corresponding to the second audit element is greater than or equal to the second preset value, outputting the content corresponding to the first audit element whose matching degree is greater than or equal to the first preset value and the content corresponding to the second audit element whose matching degree is greater than or equal to the second preset value;
and filling the content corresponding to the first audit element and the content corresponding to the second audit element into a preset list, and obtaining the audit result based on the filled preset list.
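The sequential threshold comparison in the steps above might be sketched as follows. This is a minimal illustration only: the function name `audit`, the tuple layout, the element names and the threshold values are assumptions made for the example and do not come from the application.

```python
from typing import Dict, List, Tuple

def audit(matches: List[Tuple[str, str, float]],
          thresholds: List[float]) -> Dict[str, object]:
    """Check audit elements in order against their preset values.

    matches: (element name, matched content, matching degree), in audit order.
    thresholds: the preset value for each element, in the same order.
    """
    filled: Dict[str, str] = {}  # hypothetical stand-in for the preset list
    for (name, content, degree), preset in zip(matches, thresholds):
        if degree < preset:
            # Stop at the first failing element; the caller would send the
            # corresponding problem information to the user.
            return {"passed": False, "failed_element": name}
        filled[name] = content
    return {"passed": True, "list": filled}

result = audit(
    [("cause", "traffic accident", 0.9), ("hospital", "designated hospital", 0.8)],
    [0.7, 0.7],
)
```

Checking the elements in order means that later elements are never evaluated once an earlier one falls below its preset value, which mirrors the "directly sending" wording of the steps.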
In order to solve the above problem, the present application further provides an abnormal data auditing apparatus, including:
the first acquisition module is used for acquiring abnormal data;
the classification module is used for classifying the abnormal data to obtain a category corresponding to the abnormal data;
the second acquisition module is used for acquiring corresponding audit content from a preset database based on the category;
the characteristic extraction module is used for extracting the characteristics of the abnormal data through a keyword extraction model to obtain a characteristic text of the abnormal data, and the keyword extraction model is obtained based on a TextRank model;
the matching module is used for matching the audit content with the feature text by utilizing a text matching model to obtain a corresponding matching degree, and the text matching model is obtained by training based on a BiMPM model;
and the output module is used for comparing each matching degree with a preset threshold value so as to determine an auditing result.
In order to solve the above problem, the present application also provides a computer device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the abnormal data auditing method described above.
In order to solve the above problem, the present application also provides a non-volatile computer readable storage medium, which stores computer readable instructions, and when the computer readable instructions are executed by a processor, the abnormal data auditing method is implemented as described above.
Compared with the prior art, the abnormal data auditing method, the abnormal data auditing device, the abnormal data auditing equipment and the storage medium provided by the embodiment of the application have the following beneficial effects that:
Abnormal data to be audited is acquired and classified to obtain its category, and the corresponding audit content is acquired based on that category, so that the abnormal data is audited in a targeted way; here the audit content refers to a plurality of accurate responses under an audit rule. Features are extracted from the abnormal data using a pre-trained keyword extraction model to obtain a feature text that represents the abnormal data and simplifies the later text matching. The accurate responses are then matched against the feature text using a pre-trained text matching model, a matching degree is obtained for each matched part, and all matching degrees are compared with preset thresholds to obtain the audit result, which improves auditing efficiency and accuracy.
Drawings
To illustrate the solution of the present application more clearly, the drawings needed to describe the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic flowchart of an abnormal data auditing method according to an embodiment of the present application;
Fig. 2 is a schematic block diagram of an abnormal data auditing apparatus according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. One skilled in the art will explicitly or implicitly appreciate that the embodiments described herein can be combined with other embodiments.
The application provides an abnormal data auditing method. Referring to fig. 1, a schematic flow chart of an abnormal data auditing method according to an embodiment of the present application is shown.
In this embodiment, the abnormal data auditing method includes:
S1, acquiring abnormal data;
In the present application, abnormal data input by the user can be received directly, or abnormal data can be extracted from a database. The abnormal data is the data corresponding to a claim case that does not yet have a specific audit result; it comprises the text of the claim case, which includes the claimant's name, a description of the case and so on.
Further, the acquiring of the abnormal data includes:
sending a calling request to a preset knowledge base, wherein the calling request carries a signature checking token;
and receiving a signature verification result returned by the knowledge base, and calling the abnormal data in the preset knowledge base when the signature verification passes, wherein the signature verification result is obtained by the knowledge base verifying the signature checking token using RSA (Rivest-Shamir-Adleman) asymmetric encryption.
Specifically, because the abnormal data may involve users' private data, it is all kept in the preset database; when abnormal data is acquired, the database therefore performs a signature verification step to keep the data secure and avoid problems such as data leakage.
The whole process is as follows. The client computes a first message digest of the message m and encrypts it with the client's private key using RSA asymmetric encryption to obtain a signature s. The client then encrypts the message m and the signature s with the knowledge base's public key to obtain a ciphertext c and sends c to the knowledge base. The knowledge base decrypts c with its own private key to recover m and s, and decrypts s with the client's public key to recover the first message digest. The knowledge base also computes a digest of the recovered message m by the same method to obtain a second message digest, and compares the two: if they are the same, the verification succeeds; otherwise, the verification fails.
Requiring signature verification whenever data is called keeps the data stored in the database secure and avoids data leakage.
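The sign, encrypt, decrypt and compare sequence described above can be traced with a toy example. The sketch below uses textbook RSA without padding, built from small well-known Mersenne primes, purely to show the order of operations; it is not secure and is not the application's implementation. All names (`make_keypair`, `client_send`, `kb_verify`), the choice of SHA-256 as the digest, and the key sizes are assumptions made for illustration.

```python
import hashlib

def make_keypair(p: int, q: int, e: int = 65537):
    """Toy textbook RSA key pair (no padding); for illustration only."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse, Python 3.8+
    return (e, n), (d, n)

def digest_int(message: bytes, n: int) -> int:
    # The digest is reduced modulo n so that it fits the toy key size.
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n

# Hypothetical keys: the knowledge base modulus is larger than the client's.
client_pub, client_priv = make_keypair(2**61 - 1, 2**89 - 1)
kb_pub, kb_priv = make_keypair(2**107 - 1, 2**127 - 1)

def client_send(m: bytes):
    d, n = client_priv
    s = pow(digest_int(m, n), d, n)        # signature s over the first digest
    e_kb, n_kb = kb_pub
    m_int = int.from_bytes(m, "big")
    assert m_int < n_kb and s < n_kb       # toy limit: short messages only
    # Ciphertext c: message and signature encrypted with the knowledge base key.
    return pow(m_int, e_kb, n_kb), pow(s, e_kb, n_kb), len(m)

def kb_verify(c_m: int, c_s: int, m_len: int):
    d_kb, n_kb = kb_priv
    m = pow(c_m, d_kb, n_kb).to_bytes(m_len, "big")  # recover m
    s = pow(c_s, d_kb, n_kb)                         # recover s
    e, n = client_pub
    first_digest = pow(s, e, n)            # digest recovered from the signature
    second_digest = digest_int(m, n)       # digest recomputed from m
    return first_digest == second_digest, m

ok, recovered = kb_verify(*client_send(b"claim case 42"))
```

A production system would rely on a vetted cryptography library with proper padding and hybrid encryption rather than raw modular exponentiation.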
S2, classifying the abnormal data to obtain the corresponding category of the abnormal data;
Specifically, the abnormal data is classified to obtain its category so that audit content can subsequently be matched to it; the classification can be done by scanning, by a text classification model, or by similar means.
Further, the classifying the abnormal data to obtain a category corresponding to the abnormal data includes:
acquiring text data corresponding to all the categories;
scanning the abnormal data, and judging whether the abnormal data contains the text data;
if the abnormal data contains the text data, taking the category corresponding to the text data as the category of the abnormal data;
if the abnormal data does not contain the text data, classifying the abnormal data by using a text classification model to obtain a class corresponding to the abnormal data, wherein the text classification model is obtained by training based on an LDA model.
First, the text data corresponding to all categories is acquired. For example, an insurance purchaser can buy many types of insurance; the names of all insurance types are acquired, where each insurance type name is a category and the text of that name is the text data. The abnormal data is then scanned against the text data to judge whether it contains any of the text data, that is, whether it contains an insurance type name. If the scan finds the text data in the abnormal data, the category corresponding to that text data is taken as the category of the abnormal data. If the abnormal data does not contain the text data, the abnormal data is classified using the text classification model to obtain its category.
Specifically, the claim case submitted when applying for a claim is the abnormal data; it contains the claimant's name, the insurance type being claimed, a description of the specific case, and so on. The scan mainly targets the claimed insurance type, although the abnormal data may also be scanned in its entirety. Scanning for the claimed insurance type means judging whether the abnormal data directly contains the full name or the short name of an insurance type.
By determining the category of the abnormal data, the matching of the audit content is facilitated.
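The scan-first, model-second classification described above might look like the following minimal sketch. The category names, the full and short name lists, and the `fallback_model` callable (standing in for the trained text classification model) are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

def scan_for_category(text: str,
                      category_names: Dict[str, List[str]]) -> Optional[str]:
    """Return the first category whose full or short name appears in the text."""
    for category, names in category_names.items():
        if any(name in text for name in names):
            return category
    return None

def classify(text: str,
             category_names: Dict[str, List[str]],
             fallback_model: Callable[[str], str]) -> str:
    """Use the scan result when there is one, otherwise ask the model."""
    found = scan_for_category(text, category_names)
    return found if found is not None else fallback_model(text)

# Hypothetical insurance type names: category -> [full name, short name].
NAMES = {
    "critical illness": ["critical illness insurance", "CI cover"],
    "accident": ["personal accident insurance", "PA cover"],
}
category = classify("Claim under PA cover after a fall", NAMES, lambda t: "unknown")
```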
Still further, the classifying the abnormal data by using the text classification model to obtain the category corresponding to the abnormal data includes:
pre-classifying the abnormal data by using the text classification model to obtain a classification result of the abnormal data;
comparing the classification result with the text data, and judging whether the text data contains the classification result;
if the text data contains the classification result, taking the classification result as a category corresponding to the abnormal data; and if the text data does not contain the classification result, sending first question information to a user.
Specifically, if the scan does not find the text data directly in the abnormal data, the abnormal data is pre-classified using a pre-trained text classification model to obtain a classification result.
The classification result must correspond to an insurance type name, so it is necessary to judge whether the text data contains the classification result. If it does, one of the categories corresponding to the text data is the same as the classification result, and the classification result can be used directly as the category of the abnormal data; if it does not, first question information is sent to the user to report the error.
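The validation of the pre-classification result can be illustrated as follows. This is only a sketch: the function name `resolve_category`, the return convention and the message text are assumptions, and how the first question information is actually delivered to the user is not specified in the source.

```python
from typing import Optional, Set, Tuple

def resolve_category(predicted: str,
                     known_categories: Set[str]) -> Tuple[Optional[str], Optional[str]]:
    """Accept the model's result only if it is one of the known categories.

    Returns (category, None) on success, or (None, question) when the
    predicted label matches no known category.
    """
    if predicted in known_categories:
        return predicted, None
    # Hypothetical wording for the first question information.
    return None, f"first question information: unrecognized category {predicted!r}"

KNOWN = {"critical illness", "accident"}
accepted = resolve_category("accident", KNOWN)
rejected = resolve_category("travel", KNOWN)
```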
The LDA (Latent Dirichlet Allocation) model is a document topic generation model, also called a three-layer Bayesian probability model, whose three layers are words, topics and documents. As a generative model, it treats each word of a document as obtained by selecting a topic with a certain probability and then selecting a word from that topic with a certain probability. The document-to-topic distribution is multinomial, and the topic-to-word distribution is also multinomial.
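The generative process just described, choosing a topic and then a word from that topic, can be made concrete with a small sampling sketch. It shows only the generative story, not how an LDA model is trained or used for classification; the topics, vocabulary and probabilities are invented for illustration.

```python
import random
from typing import Dict, List, Tuple

def generate_document(doc_topic: Dict[str, float],
                      topic_word: Dict[str, Dict[str, float]],
                      length: int,
                      seed: int = 0) -> List[Tuple[str, str]]:
    """Sample (topic, word) pairs following the two-step generative process."""
    rng = random.Random(seed)
    topics = list(doc_topic)
    samples = []
    for _ in range(length):
        # Step 1: pick a topic from the document's multinomial over topics.
        topic = rng.choices(topics, weights=[doc_topic[t] for t in topics])[0]
        # Step 2: pick a word from that topic's multinomial over words.
        vocab = list(topic_word[topic])
        word = rng.choices(vocab, weights=[topic_word[topic][w] for w in vocab])[0]
        samples.append((topic, word))
    return samples

# Hypothetical distributions for a single claim document.
DOC_TOPIC = {"injury": 0.7, "illness": 0.3}
TOPIC_WORD = {
    "injury": {"fracture": 0.6, "accident": 0.4},
    "illness": {"fever": 0.5, "diagnosis": 0.5},
}
document = generate_document(DOC_TOPIC, TOPIC_WORD, length=20)
```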
By utilizing the text classification model, the accurate judgment of the category of the abnormal data is realized.
S3, acquiring corresponding audit content from a preset database based on the category;
Specifically, based on the mapping relationship between categories and audit content, the corresponding audit content is acquired from the database once the category has been obtained. The audit content is not an audit rule, but a plurality of accurate reply contents under the audit rule.
Before the corresponding audit content is acquired, the claimant information in the abnormal data, namely the claimant's name, is acquired; the corresponding purchase information is called based on that name; and whether the insurance type information in the purchase information is consistent with the category of the abnormal data is judged. If not, error information is sent to the user; if so, the corresponding audit content is acquired.
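A minimal sketch of this consistency check followed by the category-to-content lookup is shown below. The in-memory dictionaries stand in for the purchase records and the preset database, and every name and sample value is hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

def fetch_audit_content(claimant: str,
                        category: str,
                        purchases: Dict[str, Set[str]],
                        audit_db: Dict[str, List[str]]
                        ) -> Tuple[Optional[List[str]], Optional[str]]:
    """Return (audit content, None), or (None, error) when the claimant's
    purchase records do not include the claimed insurance type."""
    if category not in purchases.get(claimant, set()):
        return None, f"error: no purchased policy of type {category!r} for {claimant}"
    return audit_db.get(category, []), None

# Hypothetical purchase records and category -> audit content mapping.
PURCHASES = {"Zhang San": {"accident"}}
AUDIT_DB = {"accident": ["cause of accident", "treatment at a designated hospital"]}

content, error = fetch_audit_content("Zhang San", "accident", PURCHASES, AUDIT_DB)
```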
S4, performing feature extraction on the abnormal data through a keyword extraction model to obtain a feature text of the abnormal data, wherein the keyword extraction model is obtained based on a TextRank model;
Specifically, the keyword extraction model extracts features from each part of the abnormal data independently to obtain the keywords of each part, and the keywords of all parts are collected to obtain the feature text.
For example, the abnormal data includes the claimed insurance type and the description of the specific case; features are extracted from each of these separately to obtain the keywords corresponding to each.
Further, the extracting the features of the abnormal data through the keyword extraction model to obtain the feature text of the abnormal data includes:
segmenting the abnormal data to obtain words corresponding to each part in the abnormal data;
extracting features of each part of words in the abnormal data independently by using the keyword extraction model to obtain features and corresponding weights of each part;
sorting the features in descending order of their weights, and extracting a preset number of top-ranked features as the keywords corresponding to each part in the abnormal data;
and collecting the keywords corresponding to the parts to obtain the feature text.
Specifically, the keyword extraction model first segments the abnormal data into words, and then extracts features from the words of each part independently to obtain the features of each part and their weights. The features are sorted in descending order of weight, and a preset number of top-ranked features are extracted as the keywords of each part.
The abnormal data is divided into several parts, such as the claimant's name, the name of the claimed insurance type, and the description of the specific case. Features are extracted from each part separately, and only the top n features by weight are kept as keywords; n can be set freely as required. The abnormal data are claim cases, which have a fixed format with fields such as the claimant's name and the claimed insurance type, and the applicant must fill in the corresponding content for each field; these fields are the parts.
After extraction is finished for each part, the keywords of all parts are collected to form the feature text.
In the feature text, keywords belonging to the same part are gathered together and classified and stored.
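The per-part selection of the top n features and their grouping into a feature text might be sketched like this. The part names, the weights and the dictionary representation of the feature text are assumptions made for illustration.

```python
from typing import Dict, List

def build_feature_text(part_weights: Dict[str, Dict[str, float]],
                       n: int) -> Dict[str, List[str]]:
    """Keep the n highest-weighted features of each part, grouped by part."""
    feature_text: Dict[str, List[str]] = {}
    for part, weights in part_weights.items():
        ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
        feature_text[part] = [word for word, _ in ranked[:n]]
    return feature_text

# Hypothetical weights as they might come out of the keyword extraction model.
WEIGHTS = {
    "insurance type": {"accident": 0.9, "personal": 0.4},
    "case description": {"fracture": 0.8, "fall": 0.7, "stairs": 0.2},
}
feature_text = build_feature_text(WEIGHTS, n=2)
```

Keeping the keywords keyed by part preserves the grouping that the later matching step relies on when it looks up the keywords for each audit element.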
The TextRank model builds a graph from the adjacency relations among words, computes the rank value of each node by PageRank iteration, and obtains the keywords by sorting the rank values.
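A compact, unweighted version of this idea is sketched below: words that co-occur within a small window are linked, and a PageRank-style update is iterated over the resulting graph. The window size, damping factor, iteration count and sample words are illustrative assumptions, not values from the application.

```python
from collections import defaultdict
from typing import Dict, List

def textrank(words: List[str], window: int = 2,
             damping: float = 0.85, iterations: int = 50) -> Dict[str, float]:
    """Score words by PageRank over an undirected co-occurrence graph."""
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        # Link each word to the words that follow it within the window.
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[j] != word:
                neighbors[word].add(words[j])
                neighbors[words[j]].add(word)
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        # Each word receives an equal share of every neighbor's score.
        scores = {
            word: (1 - damping) + damping * sum(
                scores[other] / len(neighbors[other]) for other in neighbors[word]
            )
            for word in neighbors
        }
    return scores

# Hypothetical segmented words of one part of a claim case.
scores = textrank(["fracture", "fall", "fracture", "stairs", "fracture", "hospital"],
                  window=1)
top_keyword = max(scores, key=scores.get)
```

In this sample every other word is adjacent only to "fracture", so it collects the highest rank value and is returned as the top keyword.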
Extracting features from the abnormal data with the keyword extraction model refines the data and improves the efficiency of the subsequent matching of the audit content against the feature text, and hence the auditing efficiency.
Further, the segmenting the abnormal data to obtain words corresponding to each part in the abnormal data includes:
performing word segmentation on the abnormal data using jieba word segmentation to obtain a plurality of corresponding words;
and performing part-of-speech tagging on the words, and removing the words tagged as stop words to obtain the words corresponding to each part in the abnormal data.
Specifically, jieba word segmentation supports three segmentation modes: the precise mode splits the sentence as precisely as possible; the full mode scans out all the words that can be formed in the sentence; and the search engine mode further splits long words on top of the precise mode. The present application uses the precise mode of jieba, which makes it possible to split the sentences to be processed accurately.
The application directly uses the jieba toolkit in Python: once the jieba toolkit is imported, each piece of input abnormal data can be segmented, which implements word segmentation of the abnormal data.
For example, if "the flow after investigation is damage assessment" is segmented using the precise mode of jieba, the words "investigation / after / flow / damage assessment" are obtained.
After segmentation, stop words are removed from the segmented words using an existing stop word list: each segmented word is looked up in the stop word list in turn, and if it is found there, it is removed.
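The stop word lookup described above amounts to a membership test against the stop word list. The sketch below assumes the words have already been segmented; the English tokens and the tiny stop word set are placeholders, since the actual stop word list is not given in the source.

```python
from typing import Iterable, List, Set

def remove_stop_words(words: Iterable[str], stop_words: Set[str]) -> List[str]:
    """Drop every segmented word that is found in the stop word list."""
    return [word for word in words if word not in stop_words]

# Hypothetical segmentation output and stop word list.
SEGMENTED = ["the", "flow", "after", "investigation", "is", "damage assessment"]
STOP_WORDS = {"the", "is"}
kept = remove_stop_words(SEGMENTED, STOP_WORDS)
```

Storing the stop words in a set keeps each lookup at constant time, which matters when many claim texts are processed.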
Word segmentation and part-of-speech tagging are implemented with jieba, and the words tagged as stop words are removed, which completes the preprocessing of the abnormal data.
Still further, the performing word segmentation processing on the abnormal data by using the ending word segmentation includes:
scanning the abnormal data based on a preset Trie tree, and identifying various segmentation combinations of words in the abnormal data;
constructing a directed acyclic graph based on all identified segmentation combinations, dynamically planning and searching a maximum probability path by using the directed acyclic graph, determining a segmentation combination of a maximum probability, and segmenting the abnormal data based on the segmentation combination of the maximum probability;
and for the unrecognized words, performing segmentation by adopting a hidden Markov model.
Specifically, a Trie, also called a dictionary tree or prefix tree, is a common data structure used for fast string matching against a list of strings. The sentence to be processed is scanned and matched against the preset Trie to identify the various possible segmentation combinations of its words. These segmentation combinations are combined to form a directed acyclic graph, in which each node is a segmented word.
The maximum-probability path is then searched in the directed acyclic graph by dynamic programming. When the Trie is generated from the dictionary, the occurrence count of each word is converted into a frequency. For the given segmentation combinations, the frequency of each candidate word, that is, the probability of each node in the directed acyclic graph, is looked up. The main function for calculating the maximum-probability path is calc, which computes the path from the constructed directed acyclic graph. The function calc is a bottom-up dynamic program: it traverses the sentence to be processed in reverse order, starting from its last character, and computes the log-probability scores of the segmentation combinations. The segmentation with the highest log-probability score is then stored and output. This is the segmentation combination with the maximum probability, and the sentence to be processed is segmented based on it.
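The graph construction and the reverse-order dynamic program can be sketched as follows. This is an illustrative sketch, not the implementation used by the application: a plain dictionary `freq` stands in for the preset Trie, the word counts are invented, and the helper names `build_dag` and `cut` are assumptions (only `calc` is named in the text).

```python
import math


def build_dag(sentence, freq):
    """Map each start index to the end indices of dictionary words starting there."""
    dag = {}
    n = len(sentence)
    for i in range(n):
        ends = [j for j in range(i, n) if sentence[i:j + 1] in freq]
        dag[i] = ends or [i]  # unknown character: fall back to a one-character word
    return dag


def calc(sentence, dag, freq):
    """Bottom-up DP from the last character: best (log score, word end) per start."""
    n = len(sentence)
    log_total = math.log(sum(freq.values()))
    route = {n: (0.0, n)}
    for i in range(n - 1, -1, -1):
        route[i] = max(
            (math.log(freq.get(sentence[i:j + 1], 1)) - log_total + route[j + 1][0], j)
            for j in dag[i]
        )
    return route


def cut(sentence, freq):
    """Follow the maximum-probability path to produce the segmented words."""
    route = calc(sentence, build_dag(sentence, freq), freq)
    words, i = [], 0
    while i < len(sentence):
        j = route[i][1]
        words.append(sentence[i:j + 1])
        i = j + 1
    return words


# Hypothetical word counts; Latin letters stand in for Chinese characters.
freq = {"a": 2, "b": 2, "c": 2, "d": 2, "ab": 10, "cd": 10, "bc": 1}
segments = cut("abcd", freq)
```

Working with log-probabilities turns the product of word probabilities along a path into a sum, which avoids numerical underflow on long sentences.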
Because the dictionary is limited and cannot contain all words, words that do not appear in the dictionary are segmented using a hidden Markov model. The hidden Markov model tags each Chinese character with one of the four states BEMS: B is the beginning position of a word, E is the ending position, M is a middle position, and S is a single-character word. Jieba word segmentation uses these four states to tag Chinese words. For example, 北京 (Beijing) can be tagged as BE, that is, 北/B 京/E, where 北 is the beginning position and 京 is the ending position. Segmentation is then performed according to these tags.
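How a BEMS tag sequence is turned into words can be sketched as follows. The sketch assumes the tags have already been produced (for example by decoding the hidden Markov model, which is not shown here); the function name `cut_by_tags` and the sample sentence are hypothetical.

```python
def cut_by_tags(chars, tags):
    """Group characters into words from their B/E/M/S tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        if tag == "S":        # single-character word
            words.append(ch)
        elif tag == "B":      # beginning of a new word
            current = ch
        elif tag == "M":      # middle of the current word
            current += ch
        elif tag == "E":      # end of the current word
            words.append(current + ch)
            current = ""
    if current:  # tolerate a tag sequence that stops mid-word
        words.append(current)
    return words


# Assumed tag sequence for a five-character sentence.
tagged = cut_by_tags("北京欢迎你", "BEBES")
```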
By this method, the sentence to be processed is segmented and the segmentation combination closest to the real situation is obtained.
S5, matching the audit content with the feature text by using a text matching model to obtain a corresponding matching degree, wherein the text matching model is obtained by training based on the BiMPM model.
Specifically, the audit content and the feature text are matched against each other to obtain the matching degree, which facilitates the subsequent filling in of the claim settlement content.
Further, the matching the audit content and the feature text by using a text matching model to obtain a corresponding matching degree includes:
acquiring corresponding keywords in the feature text based on each audit element in the audit content;
and matching the content corresponding to each audit element with the keywords corresponding to the audit element through the text matching model to obtain the matching degree corresponding to each audit element, wherein the text matching model is obtained by training based on the BiMPM model.
Specifically, the audit content includes audit elements, and corresponding keywords in the feature text are obtained based on the audit elements to perform subsequent matching.
Each audit element can correspond to a part of the feature text, and the keywords extracted from each part can correspond to an audit element.
For example, the audit content comprises a time audit element, a state audit element, a damage condition audit element and the like, wherein the damage condition audit element is divided into three levels: slight, ordinary and serious. Each of the slight, ordinary and serious levels is matched against the corresponding keywords to obtain a value, and the matching degree of the element is the highest of these values.
The keywords in the feature text are matched with the content corresponding to each audit element to obtain the matching degree of each audit element in the audit content. The abnormal data is judged according to the matching degrees, so that the abnormal data is audited and the audit efficiency is improved.
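The per-element matching can be sketched as follows. Note the substitution: the application uses a text matching model trained on BiMPM, while this sketch swaps in a simple Jaccard word overlap as a stand-in scorer so that the example stays self-contained. The names `jaccard` and `match_elements` and the sample data are hypothetical.

```python
def jaccard(a, b):
    """Stand-in similarity in [0, 1]; NOT the BiMPM-based model of the source."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def match_elements(audit_content, keywords_by_element, score=jaccard):
    """Return, per audit element, the best-scoring candidate and its matching degree."""
    result = {}
    for element, candidates in audit_content.items():
        keywords = " ".join(keywords_by_element.get(element, []))
        best = max(candidates, key=lambda c: score(c, keywords))
        result[element] = (best, score(best, keywords))
    return result


# Hypothetical audit content and extracted keywords.
audit_content = {"damage": ["slight damage", "ordinary damage", "serious damage"]}
keywords_by_element = {"damage": ["serious", "damage", "collision"]}
degrees = match_elements(audit_content, keywords_by_element)
```

Passing the scorer as a parameter means a trained model can replace the stand-in without changing the surrounding logic.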
And S6, comparing each matching degree with a preset threshold value to determine an auditing result.
Specifically, each matching degree is compared with a preset threshold, the content corresponding to the matching degree greater than the preset threshold is filled into a subsequent list, and an audit result is obtained based on the list.
Further, the comparing each matching degree with a preset threshold to determine an audit result includes:
extracting the matching degree corresponding to the first auditing element in the auditing content;
judging whether the matching degree corresponding to the first audit element is greater than or equal to a first preset value;
when the matching degree corresponding to the first audit element is smaller than the first preset value, directly sending second question information to the user;
when the matching degree corresponding to the first audit element is greater than or equal to the first preset value, comparing the matching degree corresponding to a second audit element in the audit content with a second preset value;
when the matching degree corresponding to the second audit element in the audit content is smaller than the second preset value, directly sending third question information to the user;
when the matching degree corresponding to the second auditing element in the auditing content is greater than or equal to a second preset value, outputting the content corresponding to the first auditing element with the matching degree greater than or equal to the first preset value and the second auditing element with the matching degree greater than or equal to the second preset value;
and filling the content corresponding to the first audit element with the matching degree being more than or equal to a first preset value and the content corresponding to the second audit element with the matching degree being more than or equal to a second preset value into a preset list, and obtaining the audit result based on the filled preset list.
Specifically, the audit content includes a first audit element and a second audit element. The first audit element carries the major weight in the overall claim audit process, and the second audit element carries a lesser or minor weight. The matching degree between the first audit element and the corresponding keywords in the feature text is compared with the first preset value, and when it is smaller than the first preset value, second question information is sent directly to the user. The second question information is the question information corresponding to the first audit element. For example, when the first audit element is a time audit element and its matching degree with the corresponding keywords is smaller than the first preset value, the user is informed that the time audit element is abnormal and how specifically it is abnormal.
After the first audit element passes, the matching degree of the second audit element is compared with the second preset value. When the matching degree corresponding to the second audit element in the audit content is smaller than the second preset value, third question information is sent directly to the user. The third question information is the question information corresponding to the second audit element.
When the matching degree corresponding to the second audit element in the audit content is greater than or equal to the second preset value, the content of the second audit element is output. The contents of the first audit element whose matching degree is greater than or equal to the first preset value and of the second audit element whose matching degree is greater than or equal to the second preset value are filled into a list, and the audit result is obtained based on the list. The first preset value and the second preset value are independent of each other and can be set according to actual conditions.
The content to be filled into the list corresponds to the content of each audit element.
For example, the list includes time, disease condition, disease type and the like. The content of each audit element corresponding to the abnormal data is filled into the corresponding position, and the audit result is obtained based on the filled list. The way of obtaining the audit result from the list can be preset, or a model can be trained by artificial intelligence on known correspondences between lists and audit results, and the list is then processed by that model to obtain the corresponding audit result.
The final audit result is obtained by comparing the first audit element and the second audit element with their preset values. Because the audit result is derived from multi-dimensional content, its accuracy is improved.
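The two-stage comparison can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `audit`, the dictionary layout of the return value and the threshold values are hypothetical, and the final step of deriving an audit result from the filled list is left out.

```python
def audit(first, second, first_threshold, second_threshold):
    """first/second are (content, matching_degree) pairs for the two audit elements."""
    first_content, first_degree = first
    second_content, second_degree = second
    if first_degree < first_threshold:
        # The major element failed: report it without checking the minor one.
        return {"status": "question", "message": "second question information"}
    if second_degree < second_threshold:
        return {"status": "question", "message": "third question information"}
    # Both elements passed: fill the preset list with their contents.
    return {"status": "filled", "list": {"first": first_content, "second": second_content}}


# Hypothetical contents, matching degrees and preset values.
passed = audit(("2021-11-01", 0.9), ("serious", 0.8), 0.7, 0.6)
failed_first = audit(("2021-11-01", 0.5), ("serious", 0.8), 0.7, 0.6)
failed_second = audit(("2021-11-01", 0.9), ("serious", 0.4), 0.7, 0.6)
```

Checking the first audit element before the second mirrors the ordering in the text, where the second element is only examined once the first has passed.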
It is emphasized that, to further ensure the privacy and security of the data, the abnormal data may also be stored in a node of a blockchain.
The blockchain referred to in this application is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, where each data block contains the information of a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
The embodiment of the application can acquire and process related data based on an artificial intelligence technology. Among them, Artificial Intelligence (AI) is a theory, method, technique and application system that simulates, extends and expands human Intelligence using a digital computer or a machine controlled by a digital computer, senses the environment, acquires knowledge and uses the knowledge to obtain the best result.
The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
According to this embodiment, abnormal data to be audited is obtained and classified to obtain the category to which it belongs, and the corresponding audit content is obtained based on the category, so that the abnormal data is audited in a targeted manner; the audit content refers to a plurality of accurate responses under an audit rule. Feature extraction is performed on the abnormal data by using a pre-trained keyword extraction model to obtain a feature text of the abnormal data, and the feature text represents the abnormal data to facilitate later text matching. The plurality of accurate responses are matched with the feature text by using a pre-trained text matching model, each match yielding a corresponding matching degree, and all the matching degrees are compared with a preset threshold to obtain the audit result, so that the audit efficiency and accuracy are improved.
This embodiment also provides an abnormal data auditing device. Fig. 2 is a functional module diagram of the abnormal data auditing device of the present application.
The abnormal data auditing device 100 can be installed in an electronic device. According to the realized functions, the abnormal data auditing device 100 can comprise a first obtaining module 101, a classifying module 102, a second obtaining module 103, a feature extracting module 104, a matching module 105 and an output module 106. A module, which may also be referred to as a unit in this application, refers to a series of computer program segments that can be executed by a processor of an electronic device and that can perform a fixed function, and that are stored in a memory of the electronic device.
In the present embodiment, the functions regarding the respective modules/units are as follows:
a first obtaining module 101, configured to obtain abnormal data;
specifically, the first obtaining module 101 may directly receive abnormal data input by a user, or extract the abnormal data from a database.
Further, the first obtaining module 101 includes a request sending sub-module and a data calling sub-module;
the request sending submodule is used for sending a calling request to a preset knowledge base, and the calling request carries a signature verification token;
and the data calling submodule is used for receiving the signature verification result returned by the knowledge base and calling the abnormal data in the knowledge base when the signature verification passes, wherein the signature verification result is obtained by the knowledge base verifying the signature verification token using RSA (Rivest-Shamir-Adleman) asymmetric encryption.
When data is called through the cooperation of the request sending submodule and the data calling submodule, the signature must be verified, which guarantees the security of the data stored in the database and avoids data leakage.
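The sign-then-verify flow can be illustrated with the toy sketch below. It is not the application's implementation and not secure: it uses textbook RSA with the tiny primes 61 and 53 purely to show the arithmetic, and a real system would rely on a vetted cryptography library with proper key sizes and padding. The function names, the token string and the way the token is hashed are all assumptions.

```python
import hashlib

# Textbook RSA with tiny primes (p=61, q=53): illustration only, NOT secure.
N, E, D = 3233, 17, 2753  # modulus, public exponent, private exponent


def digest(token: str) -> int:
    """Hash the token and reduce it below the modulus."""
    return int.from_bytes(hashlib.sha256(token.encode()).digest(), "big") % N


def sign(token: str) -> int:
    """Requester side: sign the token digest with the private exponent."""
    return pow(digest(token), D, N)


def verify(token: str, signature: int) -> bool:
    """Knowledge-base side: check the signature with the public exponent."""
    return pow(signature, E, N) == digest(token)


def call_abnormal_data(token, signature, store):
    """Return the stored data only when signature verification passes."""
    return store if verify(token, signature) else None


# Hypothetical token and stored record.
token = "request-001"
signature = sign(token)
data = call_abnormal_data(token, signature, ["abnormal record"])
```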
The classification module 102 is configured to classify the abnormal data to obtain a category corresponding to the abnormal data;
specifically, the classification module 102 classifies the abnormal data to obtain a corresponding category, so as to facilitate matching of the audit content with the abnormal data in the following.
Further, the classification module 102 includes a third obtaining sub-module, a scanning sub-module, and a determining sub-module;
the third obtaining submodule is used for obtaining text data corresponding to all the categories;
the scanning submodule is used for scanning the abnormal data and judging whether the abnormal data contains the text data;
the determining submodule is used for taking the category corresponding to the text data as the category of the abnormal data if the abnormal data contains the text data; if the abnormal data does not contain the text data, classifying the abnormal data by using a text classification model to obtain a class corresponding to the abnormal data, wherein the text classification model is obtained by training based on an LDA model.
Specifically, the third obtaining sub-module obtains the text data corresponding to all categories. For example, multiple insurance types can be purchased when buying insurance; the names of all insurance types are obtained, so that each insurance type name is a category and the text of the name is the text data. The scanning sub-module scans the abnormal data against the text data corresponding to the insurance type names and judges whether the abnormal data contains the text data, that is, whether it contains an insurance type name. If the abnormal data contains the text data, the determining sub-module takes the category corresponding to the text data as the category of the abnormal data. If the abnormal data does not contain the text data, the determining sub-module classifies the abnormal data by using the text classification model to obtain the category corresponding to the abnormal data.
The category of the abnormal data is determined through the cooperation of the third obtaining sub-module, the scanning sub-module and the determining sub-module, which facilitates the subsequent matching of the corresponding audit content.
Furthermore, the determining submodule further comprises a pre-classification unit, a comparison unit and a corresponding output unit;
the pre-classification unit is used for pre-classifying the abnormal data by using the text classification model to obtain a classification result of the abnormal data;
the comparison unit is used for comparing the classification result with the text data and judging whether the text data contains the classification result;
the corresponding output unit is used for taking the classification result as the category corresponding to the abnormal data if the text data contains the classification result; and if the text data does not contain the classification result, sending first question information to a user.
Specifically, if the text data is not found directly when scanning the abnormal data, the pre-classification unit pre-classifies the abnormal data by using a pre-trained text classification model to obtain a classification result of the abnormal data.
The obtained classification result must match an insurance type name, so the comparison unit judges whether the text data contains the classification result. If it does, the classification result coincides with one of the categories corresponding to the text data, and the corresponding output unit takes the classification result directly as the category of the abnormal data. If it does not, the corresponding output unit sends first question information to the user to report the error.
Through the cooperation of the pre-classification unit, the comparison unit and the corresponding output unit, the text classification model is utilized to realize accurate judgment of the category of the abnormal data.
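The scan-first, classify-second logic can be sketched as follows. This is a sketch under stated assumptions: `text_classifier` is a hypothetical callable standing in for the text classification model trained on LDA, and the function name `classify`, the category names and the sample texts are invented for illustration.

```python
def classify(abnormal_text, category_names, text_classifier):
    """Return (category, question): scan for a category name, else use the model."""
    for name in category_names:
        if name in abnormal_text:  # the abnormal data contains the text data
            return name, None
    predicted = text_classifier(abnormal_text)  # pre-classification by the model
    if predicted in category_names:
        return predicted, None
    return None, "first question information"  # report the error to the user


# Hypothetical insurance type names and stand-in classifiers.
categories = ["car insurance", "health insurance"]
direct = classify("claim under car insurance, rear collision", categories, lambda t: "")
fallback = classify("hospital stay after surgery", categories, lambda t: "health insurance")
rejected = classify("unreadable text", categories, lambda t: "pet insurance")
```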
The second obtaining module 103 is configured to obtain, based on the category, corresponding audit content from a preset database;
specifically, the second obtaining module 103 obtains the corresponding audit content from the database after obtaining the category based on the mapping relationship between the category and the audit content. The audit content is not an audit rule, but a plurality of accurate reply contents under the audit rule.
The feature extraction module 104 is configured to perform feature extraction on the abnormal data through a keyword extraction model to obtain a feature text of the abnormal data, where the keyword extraction model is obtained based on a TextRank model;
specifically, the feature extraction module 104 performs feature extraction on each part independently in the abnormal data by using a keyword extraction model to obtain keywords corresponding to each part, and collects the keywords of each part to obtain a feature text.
Further, the feature extraction module 104 includes a segmentation sub-module, a keyword extraction sub-module, a sorting sub-module, and a collection sub-module;
the segmentation submodule is used for segmenting the abnormal data to obtain words corresponding to each part in the abnormal data;
the keyword extraction submodule is used for independently extracting the features of all parts of words in the abnormal data by using the keyword extraction model to obtain the features and corresponding weights of all parts;
the sorting submodule is used for sorting the features in descending order of their corresponding weights, and extracting the top preset number of features in the sorted order as the keywords corresponding to each part in the abnormal data;
and the collecting submodule is used for collecting the keywords corresponding to the parts to obtain the feature text.
Specifically, the segmentation submodule first segments the abnormal data to obtain the corresponding words, and the keyword extraction submodule extracts features from the words of each part independently to obtain the features and corresponding weights of each part. The sorting submodule sorts the features in descending order of weight and extracts a preset number of features as the keywords corresponding to each part. The abnormal data is divided into a plurality of parts, such as the name of the person applying for the claim, the insurance type under which the claim is made, the description of the specific case and the like. The keyword extraction submodule extracts features from each of these parts separately, and after extraction only the top n features by weight are kept as keywords. The value of n can be set freely as required.
After the keywords of each part have been extracted, the collection submodule collects the keywords corresponding to each part to form the feature text.
Through the cooperation of the segmentation submodule, the keyword extraction submodule, the sorting submodule and the collection submodule, feature extraction is performed on the abnormal data by using the keyword extraction model, so that the data is refined and the efficiency of the subsequent matching of the audit content with the feature text, and of the audit itself, is improved.
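The sorting and collecting steps can be sketched as follows. The sketch assumes that the feature weights have already been produced by the keyword extraction model (the TextRank-based scoring itself is not shown); the function names, part names and weights are hypothetical.

```python
def top_keywords(weighted_features, n):
    """Sort features by weight, largest first, and keep the top n as keywords."""
    ranked = sorted(weighted_features.items(), key=lambda kv: kv[1], reverse=True)
    return [feature for feature, _ in ranked[:n]]


def build_feature_text(parts, n):
    """Collect the top-n keywords of every part into one feature text."""
    return {part: top_keywords(features, n) for part, features in parts.items()}


# Hypothetical parts of the abnormal data with assumed feature weights.
parts = {
    "case description": {"collision": 0.42, "rear": 0.31, "parking": 0.08},
    "insurance type": {"car insurance": 0.9},
}
feature_text = build_feature_text(parts, n=2)
```

Keeping the keywords grouped by part lets each audit element later look up the keywords of the part it corresponds to.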
Still further, the segmentation submodule further comprises a word segmentation unit and a removal unit;
the word segmentation unit is used for performing word segmentation processing on the abnormal data by using jieba word segmentation to obtain a plurality of corresponding words;
and the removing unit is used for performing part-of-speech tagging on the plurality of words, and removing words whose part of speech is stop word to obtain the words corresponding to each part in the abnormal data.
Through the cooperation of the word segmentation unit and the removing unit, word segmentation and part-of-speech tagging are realized by jieba word segmentation, and words whose part of speech is stop word are removed, thereby preprocessing the abnormal data.
The matching module 105 is configured to match the audit content with the feature text by using a text matching model to obtain a corresponding matching degree, where the text matching model is obtained by training based on the BiMPM model.
Specifically, the matching module 105 matches the audit content and the feature text against each other to obtain the matching degree, which facilitates the subsequent filling in of the claim settlement content.
Further, the matching module 105 includes a fourth obtaining sub-module and a text matching sub-module;
the fourth obtaining sub-module is configured to obtain, based on each audit element in the audit content, a corresponding keyword in the feature text;
the text matching sub-module is used for matching the content corresponding to each audit element with the keywords corresponding to the audit element through the text matching model to obtain the matching degree corresponding to each audit element, and the text matching model is obtained by training based on the BiMPM model.
Through the cooperation of the fourth obtaining sub-module and the text matching sub-module, the keywords in the feature text are matched with the content corresponding to each audit element to obtain the matching degree of each audit element in the audit content. The abnormal data is judged according to the matching degrees, so that the abnormal data is audited and the audit efficiency is improved.
And the output module 106 is configured to compare each matching degree with a preset threshold to determine an audit result.
Specifically, the output module 106 compares each matching degree with a preset threshold, fills the content corresponding to the matching degree greater than the preset threshold into a subsequent list, and obtains an audit result based on the list.
Further, the output module 106 includes a matching degree extraction sub-module, a first judgment sub-module, a first output sub-module, a second judgment sub-module, a second output sub-module, a third output sub-module, and a processing sub-module;
the matching degree extraction sub-module is used for extracting the matching degree corresponding to the first auditing element in the auditing content;
the first judging submodule is used for judging whether the matching degree corresponding to the first audit element is greater than or equal to a first preset value;
the first output sub-module is used for directly sending second question information to the user when the matching degree corresponding to the first audit element is smaller than the first preset value;
the second judging submodule is configured to compare, when the matching degree corresponding to the first audit element is greater than or equal to the first preset value, the matching degree corresponding to the second audit element in the audit content with a second preset value;
the second output sub-module is used for directly sending third question information to the user when the matching degree corresponding to the second audit element in the audit content is smaller than the second preset value;
the third output sub-module is used for outputting, when the matching degree corresponding to the second audit element in the audit content is greater than or equal to the second preset value, the contents corresponding to the first audit element whose matching degree is greater than or equal to the first preset value and to the second audit element whose matching degree is greater than or equal to the second preset value;
and the processing sub-module is used for filling the contents corresponding to the first audit element whose matching degree is greater than or equal to the first preset value and to the second audit element whose matching degree is greater than or equal to the second preset value into a preset list, and obtaining the audit result based on the filled preset list.
Through the cooperation of the matching degree extraction submodule, the first judgment submodule, the first output submodule, the second judgment submodule, the second output submodule, the third output submodule and the processing submodule, the first audit element and the second audit element are compared with their preset values to obtain the final audit result. Because the audit result is derived from multi-dimensional content, its accuracy is improved.
With the above device, the abnormal data auditing device 100, through the cooperation of the first obtaining module 101, the classification module 102, the second obtaining module 103, the feature extraction module 104, the matching module 105 and the output module 106, obtains the abnormal data to be audited, classifies the abnormal data to obtain the category to which it belongs, and obtains the corresponding audit content based on the category, so that the abnormal data is audited in a targeted manner; the audit content refers to a plurality of accurate responses under an audit rule. Feature extraction is performed on the abnormal data by using a pre-trained keyword extraction model to obtain a feature text of the abnormal data, and the feature text represents the abnormal data to facilitate later text matching. The plurality of accurate responses are matched with the feature text by using a pre-trained text matching model, each match yielding a corresponding matching degree, and all the matching degrees are compared with a preset threshold to obtain the audit result, so that the audit efficiency and accuracy are improved.
The embodiment of the application also provides computer equipment. Referring to fig. 3, fig. 3 is a block diagram of a basic structure of a computer device according to the present embodiment.
The computer device 4 comprises a memory 41, a processor 42 and a network interface 43 communicatively connected to each other via a system bus. It is noted that only the computer device 4 having components 41-43 is shown, but it is understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
The memory 41 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or a memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the computer device 4. Of course, the memory 41 may also include both internal and external storage devices of the computer device 4. In this embodiment, the memory 41 is generally used for storing an operating system installed in the computer device 4 and various application software, such as computer readable instructions of an abnormal data auditing method. Further, the memory 41 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 42 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to execute computer readable instructions stored in the memory 41 or process data, such as computer readable instructions for executing the abnormal data auditing method.
The network interface 43 may comprise a wireless network interface or a wired network interface, and the network interface 43 is generally used for establishing communication connection between the computer device 4 and other electronic devices.
In this embodiment, the steps of the abnormal data auditing method according to the above embodiments are implemented when the processor executes the computer readable instructions stored in the memory, the abnormal data is classified by obtaining the abnormal data to be audited, the category to which the abnormal data corresponds is obtained, and corresponding auditing contents are obtained based on the category to implement targeted auditing of the abnormal data, where the auditing contents refer to a plurality of accurate responses under the auditing rules; performing feature extraction on the abnormal data by using a pre-trained keyword extraction model to obtain a feature text of the abnormal data, and representing the abnormal data by using the feature text to facilitate text matching later; and matching the plurality of accurate responses and the feature text by using a pre-trained text matching model, obtaining a corresponding matching degree for one matching part, and comparing all the matching degrees with a preset threshold value to obtain an auditing result, so that the auditing efficiency and accuracy are improved.
The embodiment of the present application further provides a computer-readable storage medium storing computer-readable instructions that can be executed by at least one processor, so that the at least one processor executes the steps of the above abnormal data auditing method. Abnormal data to be audited is acquired and classified to obtain the category to which it corresponds, and the corresponding audit content is acquired based on that category so that the abnormal data can be audited in a targeted manner, where the audit content refers to a plurality of accurate responses under the auditing rules. Feature extraction is performed on the abnormal data by a pre-trained keyword extraction model to obtain a feature text of the abnormal data; the feature text represents the abnormal data and simplifies the subsequent text matching. The plurality of accurate responses are then matched against the feature text by a pre-trained text matching model, a matching degree is obtained for each matched part, and all the matching degrees are compared with a preset threshold value to obtain an auditing result, which improves auditing efficiency and accuracy.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
The abnormal data auditing device, the computer device and the computer readable storage medium of the embodiments of the present application have the same technical effects as the abnormal data auditing method of the embodiments, which are not described again here.
It is to be understood that the above-described embodiments are only some, not all, of the embodiments of the present application, and that the appended drawings illustrate preferred embodiments without limiting the scope of the application. The application may be embodied in many different forms; these embodiments are provided so that the disclosure of the application will be thoroughly understood. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in those embodiments or replace some of their technical features with equivalents. Any equivalent structure made using the contents of the specification and the drawings of the present application, whether applied directly or indirectly in other related technical fields, falls within the protection scope of the present application.

Claims (10)

1. An abnormal data auditing method, characterized in that the method comprises:
acquiring abnormal data;
classifying the abnormal data to obtain a category corresponding to the abnormal data;
acquiring corresponding audit content from a preset database based on the category;
extracting the characteristics of the abnormal data through a keyword extraction model to obtain a characteristic text of the abnormal data, wherein the keyword extraction model is obtained based on a TextRank model;
matching the audit content with the feature text by using a text matching model to obtain a corresponding matching degree, wherein the text matching model is obtained by training based on a BiMPM model;
and comparing each matching degree with a preset threshold value to determine an auditing result.
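The steps of claim 1 can be read as a small pipeline. The following is a minimal sketch under stated assumptions, not the applicant's implementation: `AuditPipeline`, its field names, and the `word_overlap` scorer are hypothetical, every model is passed in as a plain callable, and the rule that every matching degree must reach the threshold is only one possible reading of the comparing step.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AuditPipeline:
    """Hypothetical wiring of the claim 1 steps; each model is a callable."""
    classify: Callable[[str], str]            # abnormal data -> category
    audit_db: Dict[str, List[str]]            # category -> audit content
    extract_features: Callable[[str], str]    # abnormal data -> feature text
    match: Callable[[str, str], float]        # (audit item, feature text) -> degree
    threshold: float                          # preset threshold value

    def audit(self, abnormal_data: str) -> Dict[str, object]:
        category = self.classify(abnormal_data)
        audit_content = self.audit_db[category]
        feature_text = self.extract_features(abnormal_data)
        degrees = [self.match(item, feature_text) for item in audit_content]
        # Assumption: the audit passes only if every degree reaches the threshold.
        passed = bool(degrees) and all(d >= self.threshold for d in degrees)
        return {"category": category, "degrees": degrees, "passed": passed}


def word_overlap(a: str, b: str) -> float:
    """Toy Jaccard score standing in for the trained text matching model."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


pipeline = AuditPipeline(
    classify=lambda text: "late_payment" if "overdue" in text else "other",
    audit_db={"late_payment": ["premium overdue reminder sent"], "other": []},
    extract_features=lambda text: text,
    match=word_overlap,
    threshold=0.5,
)
result = pipeline.audit("premium overdue reminder sent")
```

Keeping the models behind callables lets the classifier, keyword extractor, and matcher be swapped without touching the control flow.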
2. The abnormal data auditing method according to claim 1, wherein the classifying the abnormal data to obtain the category corresponding to the abnormal data comprises:
acquiring text data corresponding to all the categories;
scanning the abnormal data, and judging whether the abnormal data contains the text data;
if the abnormal data contains the text data, taking the category corresponding to the text data as the category of the abnormal data;
if the abnormal data does not contain the text data, classifying the abnormal data by using a text classification model to obtain a class corresponding to the abnormal data, wherein the text classification model is obtained by training based on an LDA model.
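The two-stage classification of claim 2 can be sketched as a substring scan with a model fallback. This is an illustrative sketch only: `classify_by_scan`, `classify`, and the sample categories are hypothetical names, and the fallback is an arbitrary callable standing in for the trained text classification model, which is not implemented here.

```python
from typing import Callable, Dict, List, Optional


def classify_by_scan(abnormal_data: str,
                     category_texts: Dict[str, List[str]]) -> Optional[str]:
    """Return the first category whose text data occurs in the abnormal data."""
    for category, texts in category_texts.items():
        if any(text in abnormal_data for text in texts):
            return category
    return None


def classify(abnormal_data: str,
             category_texts: Dict[str, List[str]],
             fallback_model: Callable[[str], str]) -> str:
    """Scan first; call the (stand-in) classification model only on a miss."""
    category = classify_by_scan(abnormal_data, category_texts)
    if category is not None:
        return category
    return fallback_model(abnormal_data)


# Hypothetical category text data, for illustration only.
CATEGORY_TEXTS = {"missing_info": ["missing"], "amount_error": ["amount"]}
```

The scan is cheap, so the model is only consulted for data that contains none of the known category texts.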
3. The abnormal data auditing method according to claim 2, characterized in that the classifying the abnormal data by using a text classification model to obtain the category corresponding to the abnormal data comprises:
pre-classifying the abnormal data by using the text classification model to obtain a classification result of the abnormal data;
comparing the classification result with the text data, and judging whether the text data contains the classification result;
if the text data contains the classification result, taking the classification result as a category corresponding to the abnormal data;
and if the text data does not contain the classification result, sending first problem information to a user.
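The check in claim 3 amounts to accepting a predicted category only when it is already known, and otherwise raising a message for the user. A minimal sketch, assuming a hypothetical `resolve_category` helper, a classifier passed in as a callable, and a placeholder message string that does not come from the application:

```python
from typing import Callable, Iterable, Optional, Tuple


def resolve_category(abnormal_data: str,
                     text_classifier: Callable[[str], str],
                     known_categories: Iterable[str]
                     ) -> Tuple[Optional[str], Optional[str]]:
    """Return (category, None) if the prediction is known, else (None, message)."""
    predicted = text_classifier(abnormal_data)
    if predicted in set(known_categories):
        return predicted, None
    # Placeholder wording for the message sent to the user.
    message = f"Unrecognized category {predicted!r} for data: {abnormal_data!r}"
    return None, message
```

Returning the message instead of sending it keeps the sketch free of any assumed notification channel.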
4. The abnormal data auditing method according to claim 1, characterized in that said extracting the characteristics of the abnormal data by a keyword extraction model to obtain the characteristic text of the abnormal data comprises:
segmenting the abnormal data to obtain words corresponding to each part in the abnormal data;
extracting features of each part of words in the abnormal data independently by using the keyword extraction model to obtain features and corresponding weights of each part;
sorting the features in descending order of their corresponding weights, and extracting a preset number of top-ranked features as the keywords corresponding to each part in the abnormal data;
and collecting the keywords corresponding to the parts to obtain the feature text.
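Claim 4 ranks the words of each part by weight and keeps the top few. The sketch below shows one common formulation of TextRank (PageRank over a word co-occurrence graph) in pure Python; the function names, window size, damping factor, and iteration count are assumptions, and the application's trained keyword extraction model may differ.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def textrank(words: Sequence[str], window: int = 2,
             damping: float = 0.85, iterations: int = 50) -> Dict[str, float]:
    """Score words by PageRank over an undirected co-occurrence graph."""
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[j] != word:
                neighbors[word].add(words[j])
                neighbors[words[j]].add(word)
    scores = {word: 1.0 for word in set(words)}
    for _ in range(iterations):
        scores = {
            word: (1 - damping) + damping * sum(
                scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in scores
        }
    return scores


def top_keywords(words: Sequence[str], k: int,
                 window: int = 2) -> List[Tuple[str, float]]:
    """Sort by descending weight (ties broken by word) and keep the first k."""
    scores = textrank(words, window=window)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


def build_feature_text(parts: Dict[str, List[str]], k: int = 3,
                       window: int = 2) -> str:
    """Extract keywords for each part independently, then collect them."""
    keywords: List[str] = []
    for part_words in parts.values():
        keywords.extend(word for word, _ in top_keywords(part_words, k, window))
    return " ".join(keywords)
```

Processing each part separately, as the claim describes, prevents a long part from crowding out the keywords of a short one.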
5. The abnormal data auditing method of claim 4, where the segmenting the abnormal data to obtain words corresponding to each portion of the abnormal data comprises:
performing word segmentation on the abnormal data by using jieba word segmentation to obtain a plurality of corresponding words;
and performing part-of-speech tagging on the words, and removing the words with the part-of-speech being stop words to obtain words corresponding to each part in the abnormal data.
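The segmentation, tagging, and filtering steps of claim 5 can be sketched as follows. The claim applies a dedicated Chinese word segmenter; to stay self-contained, this sketch substitutes a regular-expression tokenizer and a tiny hypothetical part-of-speech lexicon (`POS_LEXICON`), so it only illustrates the flow and would not segment Chinese text correctly.

```python
import re
from typing import List, Tuple

# Hypothetical lexicon: maps a word to a tag; "stop" marks stop words.
POS_LEXICON = {
    "the": "stop", "of": "stop", "is": "stop", "a": "stop",
    "premium": "n", "policy": "n", "overdue": "adj",
}


def segment(text: str) -> List[str]:
    """Stand-in segmenter: lowercase the text and split on non-word characters."""
    return re.findall(r"\w+", text.lower())


def tag(words: List[str]) -> List[Tuple[str, str]]:
    """Attach a part-of-speech tag to each word; unknown words get 'x'."""
    return [(word, POS_LEXICON.get(word, "x")) for word in words]


def content_words(text: str) -> List[str]:
    """Segment, tag, and drop every word tagged as a stop word."""
    return [word for word, pos in tag(segment(text)) if pos != "stop"]
```

In practice `segment` and `tag` would be replaced by the segmenter and tagger actually used, while `content_words` would stay the same.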
6. The abnormal data auditing method of claim 1, wherein the matching of the audit content and the feature text using a text matching model to obtain a corresponding degree of matching comprises:
acquiring corresponding keywords in the feature text based on each audit element in the audit content;
and matching the content corresponding to each audit element with the keywords corresponding to the audit element through the text matching model to obtain the matching degree corresponding to each audit element.
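Claim 6 scores each audit element against its own keywords. The sketch below is illustrative only: it swaps in the standard library's `difflib.SequenceMatcher` as a stand-in similarity in place of the trained text matching model, and the dictionary layout and function names are assumptions.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def similarity(a: str, b: str) -> float:
    """Stand-in matching degree in [0, 1]; NOT the trained matching model."""
    return SequenceMatcher(None, a, b).ratio()


def match_audit_elements(audit_content: Dict[str, str],
                         keywords_by_element: Dict[str, List[str]]
                         ) -> Dict[str, float]:
    """Compute one matching degree per audit element."""
    degrees: Dict[str, float] = {}
    for element, expected in audit_content.items():
        keywords = " ".join(keywords_by_element.get(element, []))
        degrees[element] = similarity(expected.lower(), keywords.lower())
    return degrees
```

An element with no extracted keywords scores 0.0 here, so it fails any positive threshold in the later comparison.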
7. The abnormal data auditing method according to any one of claims 1-6, wherein the comparing each matching degree with a preset threshold value to determine an auditing result comprises:
extracting the matching degree corresponding to the first auditing element in the auditing content;
judging whether the matching degree corresponding to the first auditing element is greater than or equal to a first preset value;
when the matching degree corresponding to the first auditing element is smaller than a first preset value, directly sending second problem information to the user;
when the matching degree corresponding to the first auditing element is greater than or equal to the first preset value, comparing the matching degree corresponding to a second auditing element in the auditing content with a second preset value;
when the matching degree corresponding to the second auditing element in the auditing content is smaller than the second preset value, directly sending third problem information to the user;
when the matching degree corresponding to the second auditing element in the auditing content is greater than or equal to a second preset value, outputting the content corresponding to the first auditing element with the matching degree greater than or equal to the first preset value and the second auditing element with the matching degree greater than or equal to the second preset value;
and filling the content corresponding to the first audit element with the matching degree being more than or equal to a first preset value and the content corresponding to the second audit element with the matching degree being more than or equal to a second preset value into a preset list, and obtaining the audit result based on the filled preset list.
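The cascaded comparison of claim 7 can be sketched as two early exits followed by filling a list. This is a minimal sketch under stated assumptions: `cascade_audit`, the result dictionary, and the message strings are hypothetical placeholders, and the "preset list" is modeled as a plain dictionary.

```python
from typing import Dict


def cascade_audit(degrees: Dict[str, float], content: Dict[str, str],
                  first_element: str, second_element: str,
                  first_preset: float, second_preset: float
                  ) -> Dict[str, object]:
    """Check the first element, then the second; fill the list only if both pass."""
    if degrees[first_element] < first_preset:
        # First element failed: report immediately, skip the second check.
        return {"passed": False, "message": "second problem information",
                "filled": {}}
    if degrees[second_element] < second_preset:
        return {"passed": False, "message": "third problem information",
                "filled": {}}
    filled = {first_element: content[first_element],
              second_element: content[second_element]}
    return {"passed": True, "message": None, "filled": filled}
```

Checking the elements in order means the second matching degree is never consulted when the first element already fails.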
8. An abnormal data auditing apparatus, characterized in that the apparatus comprises:
the first acquisition module is used for acquiring abnormal data;
the classification module is used for classifying the abnormal data to obtain a category corresponding to the abnormal data;
the second acquisition module is used for acquiring corresponding audit content from a preset database based on the category;
the characteristic extraction module is used for extracting the characteristics of the abnormal data through a keyword extraction model to obtain a characteristic text of the abnormal data, and the keyword extraction model is obtained based on a TextRank model;
the matching module is used for matching the audit content with the feature text by utilizing a text matching model to obtain a corresponding matching degree, and the text matching model is obtained by training based on a BiMPM model;
and the output module is used for comparing each matching degree with a preset threshold value so as to determine an auditing result.
9. A computer device, characterized in that the computer device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores computer readable instructions which, when executed by the processor, implement the abnormal data auditing method according to any one of claims 1-7.
10. A computer-readable storage medium having computer-readable instructions stored thereon, which when executed by a processor implement the abnormal data auditing method according to any one of claims 1-7.
CN202111347270.6A 2021-11-15 2021-11-15 Abnormal data auditing method, device, equipment and storage medium Pending CN114064893A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111347270.6A CN114064893A (en) 2021-11-15 2021-11-15 Abnormal data auditing method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114064893A true CN114064893A (en) 2022-02-18

Family

ID=80272002

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111347270.6A Pending CN114064893A (en) 2021-11-15 2021-11-15 Abnormal data auditing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114064893A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115065508A (en) * 2022-05-27 2022-09-16 青岛海尔科技有限公司 Method and apparatus for processing device twin data, storage medium, and electronic apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination