WO2021213123A1 - Method, apparatus, device, and storage medium for detecting user fraud - Google Patents
Method, apparatus, device, and storage medium for detecting user fraud
- Publication number
- WO2021213123A1 (PCT/CN2021/082613)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- model
- vector
- user
- fraud detection
- medical
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/08—Insurance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/60—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Definitions
- This application relates to the field of artificial intelligence technology, and in particular to a method, device, equipment, and storage medium for detecting user fraud.
- Intelligent early warning combined with multi-dimensional verification is often used to detect user insurance fraud.
- This approach first uses natural language processing technology to mine the semantic features of the medical (insurance) data submitted by the insured user, then verifies whether the data contains fraud characteristics that do not match the facts, and finally determines whether fraud exists.
- The inventor realized that although this approach applies to a wide range of scenarios, in some specific medical scenarios, such as a specific type of disease or symptom, its recognition of fraudulent behavior by insured users is not accurate enough.
- The main purpose of this application is to provide a user fraud detection method, apparatus, device, and storage medium, aiming to solve the technical problem in the prior art that fraudulent behavior by insured users is not recognized accurately enough.
- The first aspect of this application provides a user fraud detection method.
- The method includes the following steps: reading medical diagnosis information and user description information from the medical data submitted by the user; searching for the corresponding fraud detection model in a preset model database according to the medical diagnosis information; generating a model embedding vector according to the medical diagnosis information and the user description information; inputting the model embedding vector to the fraud detection model to obtain a model output result; and determining whether the user has fraudulent behavior according to the model output result.
- The second aspect of this application provides a user fraud detection device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor. When the processor executes the computer-readable instructions, the following steps are implemented: reading medical diagnosis information and user description information from the medical data submitted by the user; searching for the corresponding fraud detection model in a preset model database according to the medical diagnosis information; generating a model embedding vector according to the medical diagnosis information and the user description information; inputting the model embedding vector to the fraud detection model to obtain a model output result; and determining whether the user has fraudulent behavior according to the model output result.
- The third aspect of this application provides a computer-readable storage medium that stores computer instructions. When the computer instructions are run on a computer, the computer executes the following steps: reading medical diagnosis information and user description information from the medical data submitted by the user; searching for the corresponding fraud detection model in a preset model database according to the medical diagnosis information; generating a model embedding vector according to the medical diagnosis information and the user description information; inputting the model embedding vector to the fraud detection model to obtain a model output result; and determining whether the user has fraudulent behavior according to the model output result.
- The fourth aspect of this application provides a user fraud detection device, including: an information extraction module, used to read medical diagnosis information and user description information from the medical data submitted by the user; a model acquisition module, used to search for the corresponding fraud detection model in a preset model database according to the medical diagnosis information; a vector generation module, used to generate a model embedding vector according to the medical diagnosis information and the user description information; a result acquisition module, used to input the model embedding vector to the fraud detection model to obtain a model output result; and a behavior judgment module, used to judge whether the user has fraudulent behavior according to the model output result.
- This application reads the medical diagnosis information and user description information from the medical data submitted by the user; searches for the corresponding fraud detection model in the preset model database based on the medical diagnosis information; generates the model embedding vector based on the medical diagnosis information and user description information; inputs the model embedding vector into the fraud detection model to obtain the model output result; and judges whether the user has fraudulent behavior according to the model output result. Because the fraud detection model is selected based on the medical diagnosis information in the medical data, the model selection is accurate and targeted.
- In addition, because the model embedding vector is generated from the medical diagnosis information and user description information rather than from the medical data in general, it is more precise, which also supports the accuracy and reliability of the detection result.
- FIG. 1 is a schematic structural diagram of a user fraud detection device in a hardware operating environment involved in a solution of an embodiment of this application;
- FIG. 2 is a schematic flowchart of a first embodiment of the user fraud detection method of this application;
- FIG. 3 is a schematic flowchart of a second embodiment of the user fraud detection method of this application;
- FIG. 4 is a structural block diagram of a first embodiment of the user fraud detection device of this application.
- FIG. 1 is a schematic structural diagram of a user fraud detection device in a hardware operating environment involved in a solution of an embodiment of the application.
- the user fraud detection device may include a processor 1001, such as a central processing unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005.
- the communication bus 1002 is used to implement connection and communication between these components.
- The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface.
- the network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a wireless fidelity (WI-FI) interface).
- The memory 1005 may be a high-speed random access memory (RAM) or a stable non-volatile memory (NVM), such as a disk memory.
- the memory 1005 may also be a storage device independent of the aforementioned processor 1001.
- The structure shown in FIG. 1 does not constitute a limitation on the user fraud detection device, which may include more or fewer components than shown, a combination of certain components, or a different arrangement of components.
- The memory 1005, as a storage medium, may include an operating system, a data storage module, a network communication module, a user interface module, and a user fraud detection program.
- The network interface 1004 is mainly used for data communication with a network server; the user interface 1003 is mainly used for data interaction with users; the processor 1001 and the memory 1005 of this application may be provided in the user fraud detection device.
- The user fraud detection device calls the user fraud detection program stored in the memory 1005 through the processor 1001 and executes the user fraud detection method provided in the embodiments of this application.
- the embodiment of the present application provides a method for detecting user fraud.
- FIG. 2 is a schematic flowchart of the first embodiment of the method for detecting user fraud by this application.
- the user fraud detection method includes the following steps:
- Step S10 Read medical diagnosis information and user description information from the medical data submitted by the user;
- The execution subject of the method in this embodiment can be a computing service device with data processing, network communication, and program execution functions, such as a mobile phone, tablet, or personal computer, or another user fraud detection device with similar functions (hereinafter referred to as the detection device).
- The medical data may be the data contained in a medical insurance claim form submitted by an insured user, such as the insured person's disease symptoms, time of illness, number of visits, demographic information, reason for the claim, and credit rating.
- The medical diagnosis information may be the disease diagnosis information of the insured person, such as disease symptoms, time of illness, and number of visits.
- The user description information may be the insured person's own description of the disease diagnosis result, such as the description content.
- The format or template of a medical insurance claim form is generally fixed, so the detection device can read the medical diagnosis information and user description information from the corresponding areas of the form according to that format or template.
- Step S20 Search for a corresponding fraud detection model in a preset model database according to the medical diagnosis information
- The fraud detection model may be a classification model trained in advance on medical data from past insurance fraud cases and used to detect whether an insured user is committing fraud. Since there are only two outcomes, fraud and non-fraud, the classification model can be a binary classification model labeled fraud/non-fraud; that is, the trained fraud detection model finally outputs exactly one of two results: fraud or non-fraud.
- This embodiment preferably trains the fraud detection model with a model training method based on federated learning.
- Model training can be carried out separately for different disease types in advance; the trained fraud detection models are then associated with their corresponding disease types and stored in the preset model database, so that the detection device can search by the disease type label contained in the medical diagnosis information.
- The association can be implemented by establishing a mapping between the model identifier of the fraud detection model (such as the model name, model calling path, or storage path) and the disease type.
- The disease type label refers to the type or name of the disease, such as cardiovascular disease, tumor, or chronic disease.
- Whether the detection device performs fraud detection with the fraud detection model for a single disease (the single-disease model) or with the fraud detection model for complications of that disease (the complication model) can be determined from the number of disease type labels. For example, if the detection device determines that the disease type labels in the medical diagnosis information include both cardiovascular disease and cerebral infarction, the number of labels is two, that is, more than one; using only the cardiovascular disease fraud detection model could then make the detection result inaccurate, so the fraud detection model for complications of cardiovascular disease should be used instead.
- The detection device may obtain the disease type labels contained in the medical diagnosis information, count the labels, determine the model category from the label count, and then search the preset model database for the corresponding fraud detection model according to the model category.
- In other words, the label count first determines which category of model to use (single-disease model or complication model), and the model category together with the disease type labels then determines the corresponding fraud detection model.
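The lookup described in step S20 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the registry layout, the category names `single_disease` and `complication`, and the storage paths are all hypothetical, and the rule "one label means single-disease, more than one means complication" is one reading of the label-count criterion.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical registry: (model category, disease type labels) -> model identifier.
# The identifiers are made-up storage paths used only for illustration.
MODEL_DATABASE: Dict[Tuple[str, FrozenSet[str]], str] = {
    ("single_disease", frozenset({"cardiovascular disease"})): "models/cardiovascular.bin",
    ("single_disease", frozenset({"tumor"})): "models/tumor.bin",
    ("complication", frozenset({"cardiovascular disease", "cerebral infarction"})):
        "models/cardiovascular_cerebral_infarction.bin",
}


def model_category(labels: List[str]) -> str:
    """Choose the model category from the number of distinct disease type labels."""
    return "single_disease" if len(set(labels)) == 1 else "complication"


def find_fraud_model(labels: List[str]) -> str:
    """Look up the model identifier for the given disease type labels."""
    if not labels:
        raise ValueError("medical diagnosis information contains no disease type label")
    key = (model_category(labels), frozenset(labels))
    try:
        return MODEL_DATABASE[key]
    except KeyError:
        raise LookupError(f"no fraud detection model registered for {sorted(labels)}") from None
```

Keying the registry on both the category and the label set keeps the single-disease and complication models for the same disease from colliding.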
- Step S30 Generate a model embedding vector according to the medical diagnosis information and the user description information;
- The binary classification model in this embodiment can be a logistic regression model, a support vector machine (SVM), a random forest (RF), a multilayer perceptron (MLP), and so on.
- the model embedding vector is the input parameter of the model.
- the detection device will generate the model embedding vector according to the medical diagnosis information and user description information, that is, vectorize the medical diagnosis information and user description information to obtain the model embedding vector.
- the detection device can embed the medical diagnosis information and the user description information as the aforementioned model embedding vector through the BERT (Bidirectional Encoder Representations from Transformers) algorithm.
- BERT learns a good feature representation for words by running a self-supervised learning method on the basis of a large amount of corpus.
- Self-supervised learning here refers to supervised learning that runs on data without manual labels.
- Obtaining the model embedding vector through the BERT algorithm therefore removes the step of manually labeling data, which saves manpower and material resources.
- Step S40 Input the model embedding vector to the fraud detection model to obtain model output results
- the model embedding vector can be input to the fraud detection model to obtain the model output result.
- The model output is usually a probability value for each category. For example, given a fraud probability of 80% and a non-fraud probability of 20%, the detection device can determine that the user has fraudulent behavior.
- Step S50 Determine whether the user has fraudulent behavior according to the output result of the model.
- After the detection device obtains the category probability values output by the model, it can take the category with the larger probability value as the final behavior result and thereby determine whether the user has fraudulent behavior.
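The decision rule in step S50 amounts to picking the category with the larger probability. A minimal sketch, assuming the model output is available as a dictionary keyed by the hypothetical category names `fraud` and `non-fraud`:

```python
from typing import Dict


def judge_fraud(class_probabilities: Dict[str, float]) -> bool:
    """Return True when 'fraud' is the category with the larger probability."""
    predicted = max(class_probabilities, key=class_probabilities.get)
    return predicted == "fraud"


# The example from the text: 80% fraud, 20% non-fraud.
result = judge_fraud({"fraud": 0.8, "non-fraud": 0.2})
```

With exactly equal probabilities `max` returns the first key it encounters, so a real system would need an explicit tie-breaking policy.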
- In this embodiment, the medical diagnosis information and user description information are read from the medical data submitted by the user; the corresponding fraud detection model is searched for in the preset model database according to the medical diagnosis information; the model embedding vector is generated based on the medical diagnosis information and user description information; the model embedding vector is input into the fraud detection model to obtain the model output result; and whether the user has fraudulent behavior is judged according to the model output result. Because the fraud detection model is selected based on the medical diagnosis information in the medical data, the model selection is accurate and targeted.
- In addition, because the model embedding vector is generated from the medical diagnosis information and user description information rather than from the medical data in general, it is more precise, which also supports the accuracy and reliability of the detection result.
- FIG. 3 is a schematic flowchart of a second embodiment of a method for detecting user fraud in this application.
- Before step S10, the method further includes:
- Step S01 Obtain the initial classification model to be trained from the central server
- The central server in this embodiment may be a core server participating in federated learning model training; it stores the initial model required for training by the model training parties (at least two) participating in the federated learning model training. Since this embodiment uses a binary classification model to detect user fraud, the initial model can be defined as an initial classification model.
- Each model training party participating in model training (the detection device in this embodiment is also a model training party) first downloads the initial classification model from the central server, trains its downloaded copy on the data in its local database, and then returns the model parameters of the trained model to the central server. The central server aggregates the received model parameters, updates the parameters of the initial classification model stored in its database according to the aggregated parameters, and then verifies whether the updated model has converged.
- If the model has converged, the updated model can be used as a fraud detection model that can be put into use; if it has not, the cycle of model training, model parameter return, and model parameter update (by the central server) is repeated until the model converges.
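The download, local training, parameter return, and aggregation cycle can be sketched in plain Python. Several choices here are assumptions rather than details from the source: the local model is a logistic regression trained by gradient descent, the server aggregates by unweighted parameter averaging (a FedAvg-style rule), and convergence is tested by the size of the parameter change between rounds. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, label: 1 = fraud, 0 = non-fraud)


def local_train(weights: List[float], data: List[Sample],
                lr: float = 0.5, epochs: int = 20) -> List[float]:
    """One party's local step: logistic regression trained by gradient descent."""
    w = list(weights)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
            for k, xk in enumerate(x):
                grad[k] += (p - y) * xk
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad)]
    return w


def aggregate(party_weights: List[List[float]]) -> List[float]:
    """Central server step (assumed rule): unweighted average of returned parameters."""
    n = len(party_weights)
    return [sum(ws) / n for ws in zip(*party_weights)]


def federated_train(parties: List[List[Sample]], dim: int,
                    tol: float = 1e-3, max_rounds: int = 50) -> List[float]:
    """Repeat train / return / update until the parameters stop changing."""
    global_w = [0.0] * dim  # initial classification model held by the central server
    for _ in range(max_rounds):
        returned = [local_train(global_w, data) for data in parties]
        new_w = aggregate(returned)
        change = max(abs(a - b) for a, b in zip(new_w, global_w))
        global_w = new_w
        if change < tol:  # assumed convergence test
            break
    return global_w


# Two parties with toy data; the first feature is a constant bias term.
parties = [
    [([1.0, 0.1], 0), ([1.0, 0.9], 1)],
    [([1.0, 0.2], 0), ([1.0, 0.8], 1)],
]
weights = federated_train(parties, dim=2)
```

Only parameters leave each party in this sketch; the raw samples stay in the local lists, which mirrors the privacy motivation of the scheme.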
- Step S02 when a sample alignment instruction is received, extract the model training set from the local database according to the sample identification included in the sample alignment instruction;
- The sample alignment instruction is a command instructing each model training party participating in model training to perform a sample alignment operation.
- The command contains a sample identifier that each model training party uses to take the intersection of the local data and obtain the model training set. For example, if the model training data owned by model training party a includes {A, B, C} and the model training data owned by model training party b includes {C, D, E}, the sample identifier is the identification information (such as a data storage path or data name) corresponding to C (the intersection data). Each model training party thus extracts the same model training set from its local database according to the same sample identifier, which ensures the consistency of model training.
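The alignment example with parties a and b can be reproduced with a set intersection. This is only a sketch: the dictionary-based local databases and the function name `align_samples` are hypothetical, and the intersection is computed in the clear for simplicity, whereas a real deployment would normally derive the shared identifiers without exposing each party's full identifier list.

```python
from typing import Dict, List, Set


def align_samples(local_database: Dict[str, dict], sample_ids: Set[str]) -> List[dict]:
    """Extract, in a deterministic order, the local samples named by the alignment instruction."""
    return [local_database[sid] for sid in sorted(sample_ids) if sid in local_database]


# Toy local databases keyed by sample identifier, mirroring {A, B, C} and {C, D, E}.
party_a = {"A": {"id": "A"}, "B": {"id": "B"}, "C": {"id": "C"}}
party_b = {"C": {"id": "C"}, "D": {"id": "D"}, "E": {"id": "E"}}

shared_ids = set(party_a) & set(party_b)  # identifiers carried by the alignment instruction
training_set_a = align_samples(party_a, shared_ids)
training_set_b = align_samples(party_b, shared_ids)
```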
- Step S03 Obtain the medical data sample included in the model training set and the fraud result corresponding to the medical data sample;
- The model training set of this embodiment needs to contain medical data samples and the fraud result corresponding to each medical data sample. For example, if the medical data (sample) submitted by user A involves fraud, the corresponding fraud result is fraud; if the medical data (sample) submitted by user B involves no fraud, the corresponding fraud result is non-fraud. Therefore, after the detection device obtains the model training set, it needs to extract the medical data samples and their corresponding fraud results before starting to train the model.
- the fraud result in this embodiment may be attached or marked on the medical data sample in the form of a label.
- Step S04 Perform model training based on federated learning on the initial classification model according to the medical data sample and the fraud result to obtain a fraud detection model.
- Federated machine learning is also known as federated learning, joint learning, or alliance learning.
- Federated machine learning is a machine learning framework that helps multiple institutions use data and build machine learning models while meeting the requirements of user privacy protection, data security, and government regulations.
- This embodiment preferably trains the model with horizontal federated learning.
- In horizontal federated learning, when the user features of two data sets overlap substantially but the users themselves overlap little, the data sets of the two parties participating in federated modeling or model training (there may of course be more than two parties) are split horizontally (that is, along the user dimension), and the portion of the data in which the user features are the same but the users are not exactly the same is taken out for training. For example, two insurance companies in different regions draw their users from their respective regions, so the intersection of their users is small; but their businesses are very similar, so the user features they record are the same. Horizontal federated learning can then be used to build a joint model or to train a model jointly.
- Training the model with a federated-learning-based method overcomes the problem that insurance institutions, or users of different insurance types, have relatively limited medical insurance data and that the training samples contain few positive samples.
- step S04 can be specifically detailed as follows:
- Step S041 Read medical diagnosis samples and user description samples from the medical data samples
- the medical diagnosis sample refers to sample data containing medical diagnosis information
- the user description sample refers to sample data containing user description information
- Step S042 Embed the medical diagnosis sample and the user description sample into medical feature vectors of different dimensions
- a pre-trained BERT model can be used to embed data into medical feature vectors of different dimensions.
- the detection device can embed medical diagnosis samples and user description samples into vectors of two granularities through a pre-trained BERT model: a vector based on word granularity and a vector based on text granularity, that is, the aforementioned medical feature vector.
- Step S043 Determine the initial model embedding vector according to the medical feature vectors of different dimensions
- After the detection device obtains the medical feature vectors of different dimensions, it can determine from them the initial model embedding vector that is input to the initial classification model for model training. Specifically, the detection device may splice the medical feature vectors of different dimensions and use the spliced result as the initial model embedding vector.
- This embodiment adopts an attention mechanism to obtain an initial model embedding vector with stronger expressive ability.
- The detection device can extract several word granularity vectors and several text granularity vectors from the medical feature vectors of different dimensions; define the attention score of each word granularity vector based on the attention mechanism and determine the vector weight of each text granularity vector according to the attention score; and determine the initial model embedding vector according to the text granularity vectors and the vector weights.
- the weight of the text granularity vector can be obtained in the following way:
- The detection device first obtains the initialization vector corresponding to the medical data; then, based on the attention mechanism, defines the attention score of each word granularity vector from the initialization vector and the following formula; and then determines the vector weight of each text granularity vector according to the attention score.
- [attention score formula not reproduced in the source text]
- where score_{i,j} is the attention score of the word granularity vector, [symbol not reproduced] is the text embedding of the word granularity vector, and s_j is the initialized medical feature vector.
- The vector weight of each text granularity vector is determined by the following formula:
- [vector weight formula not reproduced in the source text]
- where [symbol not reproduced] is the vector weight of the text granularity vector.
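The two formulas themselves do not survive in this text, so the sketch below is an assumption rather than the patent's definition: it scores each vector by its dot product with the initialization vector s and normalizes the scores with a softmax to obtain the weights, which is one common form of attention pooling. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors: List[Sequence[float]],
                   s: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Assumed attention: dot-product scores against s, softmax weights, weighted sum."""
    scores = [dot(v, s) for v in vectors]
    weights = softmax(scores)
    pooled = [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(len(s))]
    return weights, pooled


weights, pooled = attention_pool([[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0])
```

Both example vectors score equally against s, so they receive equal weights and the pooled vector is their average.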
- Step S044 Perform model training based on federated learning on the initial classification model according to the initial model embedding vector and the fraud result to obtain a fraud detection model.
- After the detection device obtains the initial model embedding vector, it can perform federated-learning-based model training on the initial classification model according to the initial model embedding vector and the fraud result to obtain a fraud detection model.
- The initial model embedding vector input into the initial classification model in this embodiment may also include other feature vectors corresponding to other medical features.
- These other feature vectors can be, for example, vectors corresponding to features such as the text description style and the insured person's credit rating. In practice, therefore, the medical feature vectors of different dimensions can be spliced with these other feature vectors to obtain the initial model embedding vector.
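Splicing here is read as plain concatenation of the vectors, which is an assumption about the intended operation. A minimal sketch with made-up values, where the one-hot credit rating encoding is hypothetical:

```python
from itertools import chain
from typing import List, Sequence


def splice(*vectors: Sequence[float]) -> List[float]:
    """Concatenate feature vectors end to end into one model input vector."""
    return list(chain.from_iterable(vectors))


medical_feature_vector = [0.2, 0.7]   # e.g. an attention-pooled text embedding
credit_rating_vector = [0.0, 1.0, 0.0]  # hypothetical one-hot credit rating
initial_model_embedding = splice(medical_feature_vector, credit_rating_vector)
```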
- The detection device may also train the acquired initial classification model according to the initial model embedding vector and the fraud result to obtain a classification model to be updated; then obtain the parameter gradients corresponding to the different model parameters of the model to be updated; then encrypt the parameter gradients and send the encrypted parameter gradients to the central server; and finally obtain the fraud detection model returned by the central server, where the fraud detection model is obtained by the central server after updating the initial classification model according to the encrypted parameter gradients.
- the vector can also be migrated to the fraud detection model of each specific category.
- cardiovascular disease insurance and sports injury insurance will involve demographic information, description methods and other characteristics. Only need to fine-tune the pre-trained vectors to continue model training for specific disease types and obtain better fraud. Check the model.
- the embedding vector of the fine-tuned cardiovascular insurance feature is obtained, and then the embedding vector and other feature vectors are spliced and combined as the input of the fraud detection model (such as MLP), and the output result is the probability of fraud.
- the regular cross-entropy can also be defined as a loss function, and the ADAM optimizer is used for training using a federated learning-based model training method, and finally a cardiovascular insurance fraud detection model is obtained.
- the present application also provides a computer-readable storage medium.
- the computer-readable storage medium may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium.
- the computer-readable storage medium stores computer instructions, and when the computer instructions are executed on the computer, the computer executes the following steps:
- FIG. 4 is a structural block diagram of a first embodiment of the user fraud detection apparatus of this application.
- The user fraud detection apparatus proposed in the embodiment of the present application includes:
- the information extraction module 401 is used to read medical diagnosis information and user description information from the medical data submitted by the user;
- the model acquisition module 402 is configured to search for a corresponding fraud detection model in a preset model database according to the medical diagnosis information;
- the vector generation module 403 is configured to generate a model embedding vector according to the medical diagnosis information and the user description information;
- the result obtaining module 404 is configured to input the model embedding vector into the fraud detection model to obtain model output results;
- the behavior judgment module 405 is configured to judge whether the user has a fraudulent behavior according to the output result of the model.
- The medical diagnosis information and user description information are read from the medical data submitted by the user; the corresponding fraud detection model is looked up in the preset model database according to the medical diagnosis information; the model embedding vector is generated from the medical diagnosis information and user description information; the model embedding vector is input into the fraud detection model to obtain the model output result; and whether the user has committed fraud is determined according to the model output result. Because the fraud detection model is selected according to the medical diagnosis information in the medical data, the model selection is accurate and targeted.
- In addition, the model embedding vector is generated from the medical diagnosis information and user description information; compared with feeding the entire medical data into the model indiscriminately, this embedding vector is more accurate and helps ensure the accuracy and reliability of the detection result.
- The model acquisition module 402 is further configured to obtain the disease type labels contained in the medical diagnosis information; obtain the number of disease type labels and determine the model category according to that number; and look up the corresponding fraud detection model in the preset model database according to the model category.
- The user fraud detection apparatus in this embodiment further includes a model training module, which is used to obtain the initial classification model to be trained from the central server; upon receiving a sample alignment instruction, extract the model training set from the local database according to the sample identifiers contained in the instruction; obtain the medical data samples contained in the model training set and the fraud results corresponding to those samples; and perform federated-learning-based model training on the initial classification model according to the medical data samples and the fraud results to obtain the fraud detection model.
- The model training module is also used to read medical diagnosis samples and user description samples from the medical data samples; embed the medical diagnosis samples and the user description samples as medical feature vectors of different dimensions; determine the initial model embedding vector from the medical feature vectors of different dimensions; and perform federated-learning-based model training on the initial classification model according to the initial model embedding vector and the fraud result to obtain a fraud detection model.
- The model training module is also used to extract several word-granularity vectors and several text-granularity vectors from the medical feature vectors of different dimensions; define the attention score of each word-granularity vector based on the attention mechanism, and determine the vector weight of each text-granularity vector according to the attention scores; and determine the initial model embedding vector from the text-granularity vectors and the vector weights.
- The model training module is also used to obtain the initialization vector corresponding to the medical data and, based on the attention mechanism, define the attention score of each word-granularity vector using the initialization vector and the following formula.
- Here score_{i,j} is the attention score of the word-granularity vector; it is defined from the text embedding of the word-granularity vector and from s_j, the initialized medical feature vector.
- The vector weight of each text-granularity vector is determined according to the attention scores.
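- The model training module is further configured to train the acquired initial classification model according to the initial model embedding vector and the fraud result to obtain the classification model to be updated; obtain the parameter gradients corresponding to the different model parameters in the model to be updated; encrypt the parameter gradients and send the encrypted parameter gradients to the central server; and obtain the fraud detection model returned by the central server, where the fraud detection model is obtained by the central server after updating the initial classification model according to the encrypted parameter gradients.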
- The technical solution of this application, in essence or in the part that contributes to the existing technology, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as read-only memory/random access memory, a magnetic disk, or an optical disk) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the methods described in the various embodiments of the present application.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Accounting & Taxation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Finance (AREA)
- Databases & Information Systems (AREA)
- Epidemiology (AREA)
- General Business, Economics & Management (AREA)
- Primary Health Care (AREA)
- Public Health (AREA)
- Development Economics (AREA)
- Technology Law (AREA)
- Health & Medical Sciences (AREA)
- Economics (AREA)
- Marketing (AREA)
- General Health & Medical Sciences (AREA)
- Strategic Management (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Computational Linguistics (AREA)
- Medical Treatment And Welfare Office Work (AREA)
Abstract
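A user fraud detection method, apparatus, device, and storage medium, relating to the field of artificial intelligence. The method includes: reading medical diagnosis information and user description information from medical data (S10); looking up the corresponding fraud detection model according to the medical diagnosis information (S20); generating a model embedding vector from the medical diagnosis information and the user description information (S30); inputting the model embedding vector into the model to obtain a model output result (S40); and then determining whether the user has committed fraud (S50). Because the fraud detection model is selected according to the medical diagnosis information in the medical data, the model selection is accurate and targeted. In addition, because the model embedding vector is generated from the medical diagnosis information and the user description information, it is more accurate than feeding the entire medical data into the model indiscriminately, which also helps ensure the accuracy and reliability of the detection result.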
Description
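This application claims priority to Chinese patent application No. 202011351758.1, filed with the China Patent Office on November 25, 2020 and entitled "User fraud detection method, apparatus, device, and storage medium" (用户欺诈行为检测方法、装置、设备及存储介质), the entire contents of which are incorporated herein by reference.
This application relates to the field of artificial intelligence technology, and in particular to a user fraud detection method, apparatus, device, and storage medium.
Medical insurance fraud causes serious economic losses for insurers and also drives up consumers' premiums and out-of-pocket costs. Traditional insurance claim handling relies on fixed rules plus manual review, which is very labor-intensive.
In recent years, as insurance claims have entered an intelligent detection stage, insurance fraud has usually been detected through intelligent early warning plus multi-dimensional verification. That is, natural language processing is first used to mine semantic features from the medical (insurance) data submitted by the insured user; the mined semantic features are then used to verify, from multiple dimensions, whether the submitted medical data contains fraud features that contradict the facts; finally, it is determined whether fraud exists. The inventors realized that although this approach applies to a wide range of scenarios, its identification of fraud by insured users is not accurate enough in certain specific medical scenarios, for example those targeting a particular class of disease or symptom.
The above is provided only to aid understanding of the technical solution of this application and does not constitute an admission that it is prior art.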
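Summary of the Invention
The main purpose of this application is to provide a user fraud detection method, apparatus, device, and storage medium, aiming to solve the technical problem that the prior art does not identify fraud by insured users accurately enough.
To achieve the above purpose, this application provides a user fraud detection method comprising the following steps: reading medical diagnosis information and user description information from medical data submitted by a user; looking up a corresponding fraud detection model in a preset model database according to the medical diagnosis information; generating a model embedding vector from the medical diagnosis information and the user description information; inputting the model embedding vector into the fraud detection model to obtain a model output result; and determining, according to the model output result, whether the user has committed fraud.
A second aspect of this application provides a user fraud detection device comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor. When executing the computer-readable instructions, the processor implements the following steps: reading medical diagnosis information and user description information from medical data submitted by a user; looking up a corresponding fraud detection model in a preset model database according to the medical diagnosis information; generating a model embedding vector from the medical diagnosis information and the user description information; inputting the model embedding vector into the fraud detection model to obtain a model output result; and determining, according to the model output result, whether the user has committed fraud.
A third aspect of this application provides a computer-readable storage medium storing computer instructions which, when run on a computer, cause the computer to execute the following steps: reading medical diagnosis information and user description information from medical data submitted by a user; looking up a corresponding fraud detection model in a preset model database according to the medical diagnosis information; generating a model embedding vector from the medical diagnosis information and the user description information; inputting the model embedding vector into the fraud detection model to obtain a model output result; and determining, according to the model output result, whether the user has committed fraud.
A fourth aspect of this application provides a user fraud detection apparatus comprising: an information extraction module, for reading medical diagnosis information and user description information from medical data submitted by a user; a model acquisition module, for looking up a corresponding fraud detection model in a preset model database according to the medical diagnosis information; a vector generation module, for generating a model embedding vector from the medical diagnosis information and the user description information; a result acquisition module, for inputting the model embedding vector into the fraud detection model to obtain a model output result; and a behavior judgment module, for determining, according to the model output result, whether the user has committed fraud.
This application reads medical diagnosis information and user description information from the medical data submitted by a user; looks up the corresponding fraud detection model in a preset model database according to the medical diagnosis information; generates a model embedding vector from the medical diagnosis information and the user description information; inputs the model embedding vector into the fraud detection model to obtain the model output result; and determines from the model output result whether the user has committed fraud. Because the fraud detection model is selected according to the medical diagnosis information in the medical data, the model selection is accurate and targeted. In addition, because the model embedding vector is generated from the medical diagnosis information and the user description information, it is more accurate than feeding the entire medical data into the model indiscriminately, which also helps ensure the accuracy and reliability of the detection result.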
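FIG. 1 is a schematic structural diagram of a user fraud detection device in the hardware operating environment involved in the embodiments of this application;
FIG. 2 is a schematic flowchart of the first embodiment of the user fraud detection method of this application;
FIG. 3 is a schematic flowchart of the second embodiment of the user fraud detection method of this application;
FIG. 4 is a structural block diagram of the first embodiment of the user fraud detection apparatus of this application.
The realization of the purpose, the functional features, and the advantages of this application are further described with reference to the embodiments and the accompanying drawings.
It should be understood that the specific embodiments described here are only intended to explain this application and not to limit it.
Referring to FIG. 1, FIG. 1 is a schematic structural diagram of a user fraud detection device in the hardware operating environment involved in the embodiments of this application.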
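As shown in FIG. 1, the user fraud detection device may include a processor 1001, such as a central processing unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 provides the connections and communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard; optionally, the user interface 1003 may also include standard wired and wireless interfaces. The network interface 1004 may optionally include standard wired and wireless interfaces (such as a Wireless Fidelity (Wi-Fi) interface). The memory 1005 may be high-speed random access memory (RAM) or stable non-volatile memory (NVM), such as disk storage. Optionally, the memory 1005 may also be a storage device independent of the processor 1001.
Those skilled in the art will understand that the structure shown in FIG. 1 does not limit the user fraud detection device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
As shown in FIG. 1, the memory 1005, as a storage medium, may include an operating system, a data storage module, a network communication module, a user interface module, and a user fraud detection program.
In the user fraud detection device shown in FIG. 1, the network interface 1004 is mainly used for data communication with a network server, and the user interface 1003 is mainly used for data interaction with the user. The processor 1001 and the memory 1005 may be arranged in the user fraud detection device, which uses the processor 1001 to call the user fraud detection program stored in the memory 1005 and execute the user fraud detection method provided by the embodiments of this application.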
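An embodiment of this application provides a user fraud detection method. Referring to FIG. 2, FIG. 2 is a schematic flowchart of the first embodiment of the user fraud detection method of this application.
In this embodiment, the user fraud detection method includes the following steps:
Step S10: Read medical diagnosis information and user description information from the medical data submitted by the user.
It should be noted that the method of this embodiment may be executed by a computing service device with data processing, network communication, and program execution capabilities, such as a mobile phone, tablet, or personal computer, or by a user fraud detection device with similar capabilities, hereinafter referred to as the detection device.
In this and the following embodiments, the medical data may be the data contained in a medical insurance claim form submitted by the insured user, such as the insured person's disease symptoms, time of illness, number of medical visits, demographic information, reason for the claim, and credit level. The medical diagnosis information may be the insured person's disease diagnosis information, such as disease symptoms, time of illness, and number of visits; the user description information may be the insured person's description of the diagnosis result, such as the content of the description.
It should be understood that the form layout or template of a medical insurance claim form is generally fixed, so the detection device can read the medical diagnosis information and user description information from the corresponding regions according to that layout or template.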
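Step S20: Look up the corresponding fraud detection model in a preset model database according to the medical diagnosis information.
It should be noted that the fraud detection model may be a classification model trained in advance on past medical data involving insurance fraud and used to detect whether an insured user has committed fraud. Because there are only two possible outcomes, fraud and non-fraud, the classification model can be a binary classifier with the labels fraud/non-fraud; that is, the trained fraud detection model ultimately outputs exactly one of two results: fraud or non-fraud.
Further, different insurers, and even different insurance lines, seek to keep user information confidential, and the user medical insurance data held by each insurer or line is fairly limited. In addition, the training samples drawn from this data for the fraud detection model contain few positive samples, which makes model training difficult. To overcome these drawbacks while preserving the privacy and security of user information across insurers and lines, this embodiment preferably uses a federated-learning-based training method to train the fraud detection model.
Further, to ensure the precision of the trained fraud detection models, models can be trained in advance in a fine-grained way for different disease types. The trained models are then associated with their disease types and stored in a preset model database, so that the detection device can look them up by the disease type labels contained in the medical diagnosis information. The association can be implemented by mapping the model identifier of each fraud detection model (for example the model name, model invocation path, or storage path) to a disease type. A disease type label is the category a disease belongs to or its name, for example cardiovascular disease, tumor, or chronic disease.
It should be noted that several diseases may occur together, that is, complications may exist. In that case, using the fraud detection model for a single disease to check the submitted medical data would bias the detection result. In practice, this embodiment can therefore also train dedicated fraud detection models for the complications of different diseases, so that the appropriate model is chosen according to the actual disease situation.
In this embodiment, whether the detection device uses the model for a single disease (the single-disease model) or the model for a disease's complications (the complication model) can be decided by the number of disease type labels. For example, if the detection device finds that the medical diagnosis information contains disease type labels such as cardiovascular disease, cerebral infarction, and others, so that the number of labels is greater than 2, using only the cardiovascular disease model may give an insufficiently accurate result, and the model for the complications of cardiovascular disease should be used instead.
Specifically, the detection device can obtain the disease type labels contained in the medical diagnosis information; then obtain the number of disease type labels and determine the model category from that number; and then look up the corresponding fraud detection model in the preset model database according to the model category.
The model category can be determined from the number of labels by first deciding which class of model to use, that is, the model category (single-disease model or complication model), and then determining the corresponding fraud detection model from the model category and the disease type labels.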
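The lookup described above can be sketched as a small registry keyed by model category and disease type labels. This is only an illustration: `MODEL_DB`, the label strings, the model paths, and the default `threshold` are hypothetical, and the label count that separates the two categories is left as a parameter because the text gives it only by example.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

# Hypothetical registry: (model category, disease labels) -> model identifier.
MODEL_DB: Dict[Tuple[str, FrozenSet[str]], str] = {
    ("single", frozenset({"cardiovascular"})): "models/cardio_single",
    ("complication", frozenset({"cardiovascular", "cerebral_infarction"})):
        "models/cardio_complication",
}


def model_category(labels: Iterable[str], threshold: int = 1) -> str:
    """Choose the category from the number of distinct disease type labels.

    The threshold is an assumption of this sketch, not a value fixed by the text.
    """
    return "complication" if len(set(labels)) > threshold else "single"


def find_model(labels: Iterable[str], threshold: int = 1) -> str:
    """Return the identifier of the fraud detection model for these labels."""
    label_set = frozenset(labels)
    if not label_set:
        raise ValueError("at least one disease type label is required")
    key = (model_category(label_set, threshold), label_set)
    if key not in MODEL_DB:
        raise KeyError(f"no fraud detection model registered for {sorted(label_set)}")
    return MODEL_DB[key]
```

Calling `find_model(["cardiovascular"])` selects the single-disease entry, while passing both labels selects the complication entry.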
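Step S30: Generate a model embedding vector from the medical diagnosis information and the user description information.
It should be noted that many kinds of binary classification models can be used for training in practice, and this embodiment does not restrict the specific model type. For example, the binary classifier in this embodiment can be a logistic regression model, a support vector machine (SVM), a random forest (RF), or a multilayer neural network (MLP). The model embedding vector is the model's input; in this embodiment the detection device generates it from the medical diagnosis information and the user description information, that is, it vectorizes this information to obtain the model embedding vector.
In this embodiment, the detection device can use the BERT (Bidirectional Encoder Representations from Transformers) algorithm to embed the medical diagnosis information and the user description information as the model embedding vector.
It should be understood that BERT learns good feature representations for words by running a self-supervised learning method on a massive corpus; self-supervised learning means supervised learning run on data without manual annotation. By obtaining the model embedding vector with BERT, this embodiment can skip manual data annotation, saving labor and resources.
Step S40: Input the model embedding vector into the fraud detection model to obtain the model output result.
In a specific implementation, once the detection device has obtained the model embedding vector, it can input it into the fraud detection model to obtain the model output result. It should be understood that a binary classifier usually outputs a probability for each class; for example, with a fraud probability of 80% and a non-fraud probability of 20%, the detection device can determine that the user has committed fraud.
Step S50: Determine whether the user has committed fraud according to the model output result.
It should be understood that after obtaining the class probabilities output by the model, the detection device can take the class with the larger probability as the final result and thereby determine whether the user has committed fraud.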
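As a minimal sketch of step S50, the decision can be written as picking the class with the larger probability. The function name and the class keys `"fraud"` and `"non_fraud"` are assumptions made for illustration.

```python
from typing import Dict


def judge_fraud(class_probabilities: Dict[str, float]) -> bool:
    """Return True when the most probable class is "fraud"."""
    predicted = max(class_probabilities, key=class_probabilities.get)
    return predicted == "fraud"


# The example from the text: 80% fraud, 20% non-fraud.
is_fraud = judge_fraud({"fraud": 0.8, "non_fraud": 0.2})
```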
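This embodiment reads medical diagnosis information and user description information from the medical data submitted by a user; looks up the corresponding fraud detection model in a preset model database according to the medical diagnosis information; generates a model embedding vector from the medical diagnosis information and the user description information; inputs the model embedding vector into the fraud detection model to obtain the model output result; and determines from the model output result whether the user has committed fraud. Because the fraud detection model is selected according to the medical diagnosis information in the medical data, the model selection is accurate and targeted. In addition, because the model embedding vector is generated from the medical diagnosis information and the user description information, it is more accurate than feeding the entire medical data into the model indiscriminately, which also helps ensure the accuracy and reliability of the detection result.
Referring to FIG. 3, FIG. 3 is a schematic flowchart of the second embodiment of the user fraud detection method of this application.
Based on the first embodiment above, in this embodiment the following steps are performed before step S10:
Step S01: Obtain the initial classification model to be trained from the central server.
It should be noted that the central server in this embodiment may be the core server taking part in federated model training. It stores the initial model to be trained by the model training parties (at least two) participating in this round of federated training. Because this embodiment uses a binary classification model to detect user fraud, the initial model can be defined as the initial classification model.
In practice, each model training party participating in training (the detection device in this embodiment is also a training party) first downloads the initial classification model from the central server, trains its downloaded copy on the data in its local database, and then sends the model parameters of the trained model back to the central server. The central server aggregates the received model parameters, uses the aggregated parameters to update the initial classification model stored in its database, and then checks whether the updated model has converged. If it has, the updated model can be used as the fraud detection model; if not, the cycle of model training, parameter upload, and parameter update (by the central server) is repeated until the model converges.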
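The server-side aggregation in this loop can be illustrated as follows. The text does not say how the parameters are aggregated, so this sketch assumes a sample-size-weighted average (the usual federated averaging rule) and a simple maximum-change convergence check; both function names and the tolerance are hypothetical.

```python
from typing import List, Sequence


def federated_average(client_params: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Average client parameter vectors, weighted by local sample counts."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(params[k] * size for params, size in zip(client_params, client_sizes)) / total
        for k in range(dim)
    ]


def has_converged(old: Sequence[float], new: Sequence[float], tol: float = 1e-4) -> bool:
    """Treat the model as converged when no parameter moved by tol or more."""
    return max(abs(a - b) for a, b in zip(old, new)) < tol


# Two training parties with 1 and 3 local samples respectively.
aggregated = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

The central server would repeat the download, local training, upload, and `federated_average` cycle until `has_converged` holds between consecutive rounds.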
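Step S02: Upon receiving a sample alignment instruction, extract the model training set from the local database according to the sample identifiers contained in the instruction.
It should be understood that the sample alignment instruction is a command instructing each model training party to perform a sample alignment operation. The command contains sample identifiers, which each training party uses to intersect its local data and obtain the model training set. For example, if training party a holds training data {A, B, C} and training party b holds {C, D, E}, the sample identifier is the identifying information (such as a data storage path or data name) corresponding to data C (the data in the intersection of the samples), so that every training party extracts the same model training set from its local database by the same sample identifier, which keeps model training consistent.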
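The {A, B, C} and {C, D, E} example can be sketched in a few lines. The helper names and the dictionary-based "local database" are assumptions; a real deployment would typically compute the intersection with a privacy-preserving protocol rather than by exchanging identifiers in the clear.

```python
from typing import Dict, Iterable, List


def common_sample_ids(ids_a: Iterable[str], ids_b: Iterable[str]) -> List[str]:
    """Identifiers present at both training parties, in sorted order."""
    return sorted(set(ids_a) & set(ids_b))


def extract_training_set(local_db: Dict[str, dict], sample_ids: Iterable[str]) -> Dict[str, dict]:
    """Pull from the local database only the records named in the instruction."""
    return {sid: local_db[sid] for sid in sample_ids if sid in local_db}


party_a = {"A": {"label": 0}, "B": {"label": 1}, "C": {"label": 1}}
party_b = {"C": {"label": 1}, "D": {"label": 0}, "E": {"label": 0}}
aligned_ids = common_sample_ids(party_a, party_b)
training_set_a = extract_training_set(party_a, aligned_ids)
```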
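Step S03: Obtain the medical data samples contained in the model training set and the fraud results corresponding to those samples.
It should be understood that, to detect fraud in medical insurance claims and keep model training accurate, the model training set in this embodiment needs to contain medical data samples and the fraud result corresponding to each sample. For example, if the medical data (sample) submitted by user A involves fraud, its fraud result is fraud; if the medical data (sample) submitted by user B does not involve fraud, its fraud result is non-fraud. After obtaining the model training set, the detection device therefore extracts the medical data samples and their fraud results from it and then starts training the model.
In this embodiment, the fraud result may be attached to or marked on the medical data sample in the form of a label.
Step S04: Perform federated-learning-based model training on the initial classification model according to the medical data samples and the fraud results to obtain the fraud detection model.
It should be understood that federated machine learning is also known as federated learning, joint learning, or alliance learning. It is a machine learning framework that helps multiple institutions use data and build machine learning models while meeting the requirements of user privacy protection, data security, and government regulation.
Further, because different insurers share few users but many features, this embodiment preferably uses horizontal federated learning to train the model. In horizontal federated learning, when two datasets overlap heavily in user features but little in users, the datasets held by the two (or more) parties taking part in federated modeling or training are split horizontally (that is, along the user dimension), and the portion of data with the same user features but not entirely the same users is taken for training. For example, two insurance companies in different regions draw their users from their own regions, so the overlap between their user groups is small; but their businesses are very similar, so the user features they record are the same. In this case horizontal federated learning can be used to build a joint model or train a model jointly.
By training the model in a federated-learning-based manner, this embodiment can overcome the problems that the user medical insurance data of an insurer or insurance line is limited and that the training samples contain few positive samples.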
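Further, to mine as much of the semantic information in the medical data as possible and avoid overly coarse semantic features that would hurt training accuracy, step S04 in this embodiment can be refined as follows:
Step S041: Read medical diagnosis samples and user description samples from the medical data samples.
It should be noted that a medical diagnosis sample is sample data containing medical diagnosis information, and a user description sample is sample data containing user description information.
Step S042: Embed the medical diagnosis samples and the user description samples as medical feature vectors of different dimensions.
In this embodiment a pre-trained BERT model can be used to embed the data as medical feature vectors of different dimensions. Specifically, the detection device can use the pre-trained BERT model to embed the medical diagnosis samples and user description samples as vectors at two granularities: word-granularity vectors and text-granularity vectors. These are the medical feature vectors mentioned above.
Step S043: Determine the initial model embedding vector from the medical feature vectors of different dimensions.
In a specific implementation, once the detection device has obtained the medical feature vectors of different dimensions, it can use them to determine the initial model embedding vector that is input into the initial classification model for training. Specifically, the detection device can concatenate the medical feature vectors of different dimensions and use the concatenated vector as the initial model embedding vector.
Further, to ensure training accuracy, this embodiment uses an attention mechanism to obtain a more expressive initial model embedding vector. Specifically, the detection device can extract several word-granularity vectors and several text-granularity vectors from the medical feature vectors of different dimensions; define the attention score of each word-granularity vector based on the attention mechanism, and determine the vector weight of each text-granularity vector according to the attention scores; and determine the initial model embedding vector from the text-granularity vectors and the vector weights.
The weights of the text-granularity vectors can be obtained as follows:
The detection device first obtains the initialization vector corresponding to the medical data; then, based on the attention mechanism, it defines the attention score of each word-granularity vector using the initialization vector and the following formula; it then determines the vector weight of each text-granularity vector from the attention scores.
Further, determining the vector weight of each text-granularity vector from the attention scores can be done as follows:
According to the attention scores, the vector weight of each text-granularity vector is determined by the following formula;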
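The scoring and weighting formulas themselves are not reproduced in this text, so the following is only a sketch under stated assumptions: the attention score is taken to be the dot product between each embedding and the initialization vector, and the weights are obtained by softmax normalization of the scores. Both choices are common in attention pooling but are not confirmed by the source; all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(embeddings: Sequence[Sequence[float]],
                   init_vector: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Score each embedding against init_vector (assumed dot product),
    normalize the scores into weights (assumed softmax), and return the
    weights together with the weighted sum of the embeddings."""
    scores = [dot(h, init_vector) for h in embeddings]
    weights = softmax(scores)
    dim = len(init_vector)
    pooled = [sum(w * h[k] for w, h in zip(weights, embeddings)) for k in range(dim)]
    return weights, pooled


weights, pooled = attention_pool([[1.0, 0.0], [1.0, 0.0]], [1.0, 0.0])
```

With two identical embeddings the weights are equal, and the pooled vector equals either embedding; embeddings that align more closely with the initialization vector would receive larger weights.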
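Step S044: Perform federated-learning-based model training on the initial classification model according to the initial model embedding vector and the fraud result to obtain the fraud detection model.
In a specific implementation, once the detection device has obtained the initial model embedding vector, it can perform federated-learning-based model training on the initial classification model according to the fraud result to obtain the fraud detection model.
It should be noted that, to make model training more precise, the initial model embedding vector input into the initial classification model in this embodiment may also include other feature vectors corresponding to other medical features, for example vectors for features such as the text description style and the insured person's credit level. In practice, the medical feature vectors of different dimensions can therefore be concatenated with these other feature vectors to obtain the initial model embedding vector.
Further, to ensure that the finally trained model performs well, in this embodiment the detection device may also train the acquired initial classification model according to the initial model embedding vector and the fraud result to obtain a classification model to be updated; then obtain the parameter gradients corresponding to the different model parameters in the model to be updated; then encrypt the parameter gradients and send the encrypted parameter gradients to the central server; and finally obtain the fraud detection model returned by the central server, where the fraud detection model is obtained by the central server after updating the initial classification model according to the encrypted parameter gradients.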
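The text does not name the encryption scheme. As a stand-in, the sketch below protects each party's gradient with pairwise additive masks that cancel when the server sums the uploads, so the server can apply the averaged gradient without seeing any single party's values. This masking approach, the function names, and the learning rate are all assumptions of the illustration; a real system might instead use homomorphic encryption or an established secure aggregation protocol.

```python
import random
from typing import List, Sequence


def make_pairwise_masks(num_clients: int, dim: int, seed: int = 0) -> List[List[float]]:
    """For every pair of clients draw a random value per coordinate; one client
    adds it and the other subtracts it, so all masks sum to zero."""
    rng = random.Random(seed)
    masks = [[0.0] * dim for _ in range(num_clients)]
    for a in range(num_clients):
        for b in range(a + 1, num_clients):
            for k in range(dim):
                r = rng.uniform(-1.0, 1.0)
                masks[a][k] += r
                masks[b][k] -= r
    return masks


def protect(gradient: Sequence[float], mask: Sequence[float]) -> List[float]:
    """Hide a local gradient behind its mask before upload."""
    return [g + m for g, m in zip(gradient, mask)]


def server_update(params: Sequence[float],
                  protected_grads: Sequence[Sequence[float]],
                  lr: float = 0.1) -> List[float]:
    """Average the protected gradients (the masks cancel) and take one
    gradient descent step on the shared model parameters."""
    n = len(protected_grads)
    avg = [sum(g[k] for g in protected_grads) / n for k in range(len(params))]
    return [p - lr * g for p, g in zip(params, avg)]


local_grads = [[1.0, 2.0], [3.0, 4.0]]
masks = make_pairwise_masks(num_clients=2, dim=2)
uploads = [protect(g, m) for g, m in zip(local_grads, masks)]
new_params = server_update([0.0, 0.0], uploads)
```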
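Further, after obtaining the initial model embedding vector, this embodiment can also transfer the vector to the fraud detection models of each specific category. For example, both cardiovascular disease insurance and sports injury insurance involve features such as demographic information and description style, so only fine-tuning of the pre-trained vectors is needed to continue model training for the insurance of a specific disease type and obtain a better fraud detection model. For instance, to determine whether a user's cardiovascular disease insurance claim is fraudulent, the earlier embedding vector can be fine-tuned with data from the narrower domain (cardiovascular insurance cases, demographic features, description style, the user's medical history, and so on) to obtain the fine-tuned embedding vector of cardiovascular insurance features. The embedding vector is then concatenated with the other feature vectors as the input to the fraud detection model (for example an MLP), whose output is the probability of fraud. In this embodiment, regularized cross-entropy can also be defined as the loss function, with the Adam optimizer used for training in the federated-learning-based manner, finally yielding a cardiovascular insurance fraud detection model.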
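A regularized cross-entropy loss of the kind mentioned here might look like the sketch below. The text does not state which regularizer is used, so an L2 penalty on the model weights is assumed; the function name, the `l2` coefficient, and the clipping constant `eps` are illustrative choices.

```python
import math
from typing import Sequence


def regularized_cross_entropy(fraud_probs: Sequence[float],
                              labels: Sequence[int],
                              weights: Sequence[float],
                              l2: float = 0.01,
                              eps: float = 1e-12) -> float:
    """Mean binary cross-entropy plus an (assumed) L2 penalty on the weights.

    fraud_probs are the predicted fraud probabilities; labels are 1 for
    fraud and 0 for non-fraud. Probabilities are clipped to avoid log(0).
    """
    n = len(labels)
    cross_entropy = -sum(
        y * math.log(max(p, eps)) + (1 - y) * math.log(max(1.0 - p, eps))
        for p, y in zip(fraud_probs, labels)
    ) / n
    penalty = l2 * sum(w * w for w in weights)
    return cross_entropy + penalty


loss = regularized_cross_entropy([0.5, 0.5], [1, 0], weights=[])
```

An optimizer such as Adam would then minimize this loss during each party's local training step.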
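This application also provides a computer-readable storage medium, which may be a non-volatile or a volatile computer-readable storage medium. The computer-readable storage medium stores computer instructions which, when run on a computer, cause the computer to execute the following steps:
reading medical diagnosis information and user description information from medical data submitted by a user;
looking up a corresponding fraud detection model in a preset model database according to the medical diagnosis information;
generating a model embedding vector from the medical diagnosis information and the user description information;
inputting the model embedding vector into the fraud detection model to obtain a model output result;
determining, according to the model output result, whether the user has committed fraud.
Referring to FIG. 4, FIG. 4 is a structural block diagram of the first embodiment of the user fraud detection apparatus of this application.
As shown in FIG. 4, the user fraud detection apparatus proposed in the embodiment of this application includes:
an information extraction module 401, for reading medical diagnosis information and user description information from medical data submitted by a user;
a model acquisition module 402, for looking up a corresponding fraud detection model in a preset model database according to the medical diagnosis information;
a vector generation module 403, for generating a model embedding vector from the medical diagnosis information and the user description information;
a result acquisition module 404, for inputting the model embedding vector into the fraud detection model to obtain a model output result;
a behavior judgment module 405, for determining, according to the model output result, whether the user has committed fraud.
This embodiment reads medical diagnosis information and user description information from the medical data submitted by a user; looks up the corresponding fraud detection model in a preset model database according to the medical diagnosis information; generates a model embedding vector from the medical diagnosis information and the user description information; inputs the model embedding vector into the fraud detection model to obtain the model output result; and determines from the model output result whether the user has committed fraud. Because the fraud detection model is selected according to the medical diagnosis information in the medical data, the model selection is accurate and targeted. In addition, because the model embedding vector is generated from the medical diagnosis information and the user description information, it is more accurate than feeding the entire medical data into the model indiscriminately, which also helps ensure the accuracy and reliability of the detection result.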
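Based on the first embodiment of the user fraud detection apparatus above, a second embodiment of the user fraud detection apparatus of this application is proposed.
In this embodiment, the model acquisition module 402 is further used to obtain the disease type labels contained in the medical diagnosis information; obtain the number of disease type labels and determine the model category according to that number; and look up the corresponding fraud detection model in the preset model database according to the model category.
Further, the user fraud detection apparatus in this embodiment also includes a model training module, used to obtain the initial classification model to be trained from the central server; upon receiving a sample alignment instruction, extract the model training set from the local database according to the sample identifiers contained in the instruction; obtain the medical data samples contained in the model training set and the fraud results corresponding to those samples; and perform federated-learning-based model training on the initial classification model according to the medical data samples and the fraud results to obtain the fraud detection model.
Further, the model training module is also used to read medical diagnosis samples and user description samples from the medical data samples; embed the medical diagnosis samples and the user description samples as medical feature vectors of different dimensions; determine the initial model embedding vector from the medical feature vectors of different dimensions; and perform federated-learning-based model training on the initial classification model according to the initial model embedding vector and the fraud result to obtain the fraud detection model.
Further, the model training module is also used to extract several word-granularity vectors and several text-granularity vectors from the medical feature vectors of different dimensions; define the attention score of each word-granularity vector based on the attention mechanism, and determine the vector weight of each text-granularity vector according to the attention scores; and determine the initial model embedding vector from the text-granularity vectors and the vector weights.
Further, the model training module is also used to obtain the initialization vector corresponding to the medical data and, based on the attention mechanism, define the attention score of each word-granularity vector using the initialization vector and the following formula;
and to determine the vector weight of each text-granularity vector according to the attention scores.
Further, the model training module is also used to train the acquired initial classification model according to the initial model embedding vector and the fraud result to obtain a classification model to be updated; obtain the parameter gradients corresponding to the different model parameters in the model to be updated; encrypt the parameter gradients and send the encrypted parameter gradients to the central server; and obtain the fraud detection model returned by the central server, where the fraud detection model is obtained by the central server after updating the initial classification model according to the encrypted parameter gradients.
For other embodiments or specific implementations of the user fraud detection apparatus of this application, refer to the method embodiments above; they are not repeated here.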
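It should be noted that, in this document, the terms "include", "comprise", and any other variants are intended to cover non-exclusive inclusion, so that a process, method, article, or system that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or system. Without further limitation, an element introduced by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or system that includes that element.
The sequence numbers of the embodiments above are for description only and do not indicate the relative merits of the embodiments.
From the description of the implementations above, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as read-only memory/random access memory, a magnetic disk, or an optical disk) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the methods described in the embodiments of this application.
The above are only preferred embodiments of this application and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of this application, or any direct or indirect application in other related technical fields, is likewise included in the scope of patent protection of this application.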
Claims (20)
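- A user fraud detection method, comprising: reading medical diagnosis information and user description information from medical data submitted by a user; looking up a corresponding fraud detection model in a preset model database according to the medical diagnosis information; generating a model embedding vector according to the medical diagnosis information and the user description information; inputting the model embedding vector into the fraud detection model to obtain a model output result; and determining, according to the model output result, whether the user has committed fraud.
- The user fraud detection method according to claim 1, wherein the step of looking up a corresponding fraud detection model in a preset model database according to the medical diagnosis information comprises: obtaining the disease type labels contained in the medical diagnosis information; obtaining the number of the disease type labels and determining a model category according to that number; and looking up the corresponding fraud detection model in the preset model database according to the model category.
- The user fraud detection method according to claim 1, wherein before the step of reading medical diagnosis information and user description information from medical data submitted by a user, the method further comprises: obtaining an initial classification model to be trained from a central server; upon receiving a sample alignment instruction, extracting a model training set from a local database according to the sample identifiers contained in the sample alignment instruction; obtaining the medical data samples contained in the model training set and the fraud results corresponding to the medical data samples; and performing federated-learning-based model training on the initial classification model according to the medical data samples and the fraud results to obtain the fraud detection model.
- The user fraud detection method according to claim 3, wherein the step of performing federated-learning-based model training on the initial classification model according to the medical data samples and the fraud results to obtain the fraud detection model comprises: reading medical diagnosis samples and user description samples from the medical data samples; embedding the medical diagnosis samples and the user description samples as medical feature vectors of different dimensions; determining an initial model embedding vector according to the medical feature vectors of different dimensions; and performing federated-learning-based model training on the initial classification model according to the initial model embedding vector and the fraud results to obtain the fraud detection model.
- The user fraud detection method according to claim 4, wherein the step of determining an initial model embedding vector according to the medical feature vectors of different dimensions comprises: extracting several word-granularity vectors and several text-granularity vectors from the medical feature vectors of different dimensions; defining the attention score of each word-granularity vector based on an attention mechanism, and determining the vector weight of each text-granularity vector according to the attention scores; and determining the initial model embedding vector according to the text-granularity vectors and the vector weights.
- The user fraud detection method according to claim 5, wherein the step of performing federated-learning-based model training on the initial classification model according to the initial model embedding vector and the fraud results to obtain the fraud detection model comprises: training the acquired initial classification model according to the initial model embedding vector and the fraud results to obtain a classification model to be updated; obtaining the parameter gradients corresponding to the different model parameters in the model to be updated; encrypting the parameter gradients and sending the encrypted parameter gradients to the central server; and obtaining the fraud detection model returned by the central server, the fraud detection model being obtained by the central server after updating the initial classification model according to the encrypted parameter gradients.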
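- A user fraud detection device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-readable instructions, implements the following steps: reading medical diagnosis information and user description information from medical data submitted by a user; looking up a corresponding fraud detection model in a preset model database according to the medical diagnosis information; generating a model embedding vector according to the medical diagnosis information and the user description information; inputting the model embedding vector into the fraud detection model to obtain a model output result; and determining, according to the model output result, whether the user has committed fraud.
- The user fraud detection device according to claim 8, wherein when the processor executes the computer-readable instructions to implement the step of looking up a corresponding fraud detection model in a preset model database according to the medical diagnosis information, the following steps are included: obtaining the disease type labels contained in the medical diagnosis information; obtaining the number of the disease type labels and determining a model category according to that number; and looking up the corresponding fraud detection model in the preset model database according to the model category.
- The user fraud detection device according to claim 8, wherein before the processor executes the computer-readable instructions to implement the step of reading medical diagnosis information and user description information from medical data submitted by a user, the following steps are further included: obtaining an initial classification model to be trained from a central server; upon receiving a sample alignment instruction, extracting a model training set from a local database according to the sample identifiers contained in the sample alignment instruction; obtaining the medical data samples contained in the model training set and the fraud results corresponding to the medical data samples; and performing federated-learning-based model training on the initial classification model according to the medical data samples and the fraud results to obtain the fraud detection model.
- The user fraud detection device according to claim 10, wherein when the processor executes the computer-readable instructions to implement the step of performing federated-learning-based model training on the initial classification model according to the medical data samples and the fraud results to obtain the fraud detection model, the following steps are included: reading medical diagnosis samples and user description samples from the medical data samples; embedding the medical diagnosis samples and the user description samples as medical feature vectors of different dimensions; determining an initial model embedding vector according to the medical feature vectors of different dimensions; and performing federated-learning-based model training on the initial classification model according to the initial model embedding vector and the fraud results to obtain the fraud detection model.
- The user fraud detection device according to claim 11, wherein when the processor executes the computer-readable instructions to implement the step of determining an initial model embedding vector according to the medical feature vectors of different dimensions, the following steps are included: extracting several word-granularity vectors and several text-granularity vectors from the medical feature vectors of different dimensions; defining the attention score of each word-granularity vector based on an attention mechanism, and determining the vector weight of each text-granularity vector according to the attention scores; and determining the initial model embedding vector according to the text-granularity vectors and the vector weights.
- The user fraud detection device according to claim 12, wherein when the processor executes the computer-readable instructions to implement the step of performing federated-learning-based model training on the initial classification model according to the initial model embedding vector and the fraud results to obtain the fraud detection model, the following steps are included: training the acquired initial classification model according to the initial model embedding vector and the fraud results to obtain a classification model to be updated; obtaining the parameter gradients corresponding to the different model parameters in the model to be updated; encrypting the parameter gradients and sending the encrypted parameter gradients to the central server; and obtaining the fraud detection model returned by the central server, the fraud detection model being obtained by the central server after updating the initial classification model according to the encrypted parameter gradients.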
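- A computer-readable storage medium storing computer instructions which, when run on a computer, cause the computer to execute the following steps: reading medical diagnosis information and user description information from medical data submitted by a user; looking up a corresponding fraud detection model in a preset model database according to the medical diagnosis information; generating a model embedding vector according to the medical diagnosis information and the user description information; inputting the model embedding vector into the fraud detection model to obtain a model output result; and determining, according to the model output result, whether the user has committed fraud.
- The computer-readable storage medium according to claim 15, wherein the computer instructions, when run on a computer, cause the computer to further execute the following steps: obtaining the disease type labels contained in the medical diagnosis information; obtaining the number of the disease type labels and determining a model category according to that number; and looking up the corresponding fraud detection model in the preset model database according to the model category.
- The computer-readable storage medium according to claim 15, wherein the computer instructions, when run on a computer, cause the computer to further execute the following steps: obtaining an initial classification model to be trained from a central server; upon receiving a sample alignment instruction, extracting a model training set from a local database according to the sample identifiers contained in the sample alignment instruction; obtaining the medical data samples contained in the model training set and the fraud results corresponding to the medical data samples; and performing federated-learning-based model training on the initial classification model according to the medical data samples and the fraud results to obtain the fraud detection model.
- The computer-readable storage medium according to claim 17, wherein the computer instructions, when run on a computer, cause the computer to further execute the following steps: reading medical diagnosis samples and user description samples from the medical data samples; embedding the medical diagnosis samples and the user description samples as medical feature vectors of different dimensions; determining an initial model embedding vector according to the medical feature vectors of different dimensions; and performing federated-learning-based model training on the initial classification model according to the initial model embedding vector and the fraud results to obtain the fraud detection model.
- The computer-readable storage medium according to claim 18, wherein the computer instructions, when run on a computer, cause the computer to further execute the following steps: extracting several word-granularity vectors and several text-granularity vectors from the medical feature vectors of different dimensions; defining the attention score of each word-granularity vector based on an attention mechanism, and determining the vector weight of each text-granularity vector according to the attention scores; and determining the initial model embedding vector according to the text-granularity vectors and the vector weights.
- A user fraud detection apparatus, comprising: an information extraction module, for reading medical diagnosis information and user description information from medical data submitted by a user; a model acquisition module, for looking up a corresponding fraud detection model in a preset model database according to the medical diagnosis information; a vector generation module, for generating a model embedding vector according to the medical diagnosis information and the user description information; a result acquisition module, for inputting the model embedding vector into the fraud detection model to obtain a model output result; and a behavior judgment module, for determining, according to the model output result, whether the user has committed fraud.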
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011351758.1 | 2020-11-25 | ||
CN202011351758.1A CN112463923B (zh) | 2020-11-25 | 2020-11-25 | 用户欺诈行为检测方法、装置、设备及存储介质 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021213123A1 (zh) | 2021-10-28 |
Family
ID=74808784
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/082613 WO2021213123A1 (zh) | 2020-11-25 | 2021-03-24 | 用户欺诈行为检测方法、装置、设备及存储介质 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112463923B (zh) |
WO (1) | WO2021213123A1 (zh) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114549026A (zh) * | 2022-04-26 | 2022-05-27 | 浙江鹏信信息科技股份有限公司 | 基于算法组件库分析的未知诈骗的识别方法及系统 |
CN115116594A (zh) * | 2022-06-06 | 2022-09-27 | 中国科学院自动化研究所 | 医疗装置有效性的检测方法及装置 |
CN115225575A (zh) * | 2022-06-08 | 2022-10-21 | 香港理工大学深圳研究院 | 一种基于元数据辅助和联邦学习的未知网络流量分类方法 |
CN117575596A (zh) * | 2023-09-06 | 2024-02-20 | 临沂万鼎网络科技有限公司 | 基于人工智能的欺诈行为分析方法及数字金融大数据系统 |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112463923B (zh) * | 2020-11-25 | 2023-04-28 | 平安科技(深圳)有限公司 | 用户欺诈行为检测方法、装置、设备及存储介质 |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105631049A (zh) * | 2016-02-17 | 2016-06-01 | 北京奇虎科技有限公司 | 一种识别诈骗短信的方法和系统 |
CN109389494A (zh) * | 2018-10-25 | 2019-02-26 | 北京芯盾时代科技有限公司 | 借贷欺诈检测模型训练方法、借贷欺诈检测方法及装置 |
CN109410036A (zh) * | 2018-10-09 | 2019-03-01 | 北京芯盾时代科技有限公司 | 一种欺诈检测模型训练方法和装置及欺诈检测方法和装置 |
CN110009486A (zh) * | 2019-04-09 | 2019-07-12 | 连连银通电子支付有限公司 | 一种欺诈检测的方法、系统、设备及计算机可读存储介质 |
US20200005333A1 (en) * | 2018-06-29 | 2020-01-02 | Sachcontrol Gmbh | Information security system for fraud detection |
CN112463923A (zh) * | 2020-11-25 | 2021-03-09 | 平安科技(深圳)有限公司 | 用户欺诈行为检测方法、装置、设备及存储介质 |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109858032A (zh) * | 2019-02-14 | 2019-06-07 | 程淑玉 | 融合Attention机制的多粒度句子交互自然语言推理模型 |
CN110288488A (zh) * | 2019-06-24 | 2019-09-27 | 泰康保险集团股份有限公司 | 医疗险欺诈预测方法、装置、设备和可读存储介质 |
- 2020-11-25 CN CN202011351758.1A patent/CN112463923B/zh active Active
- 2021-03-24 WO PCT/CN2021/082613 patent/WO2021213123A1/zh active Application Filing
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105631049A (zh) * | 2016-02-17 | 2016-06-01 | 北京奇虎科技有限公司 | 一种识别诈骗短信的方法和系统 |
US20200005333A1 (en) * | 2018-06-29 | 2020-01-02 | Sachcontrol Gmbh | Information security system for fraud detection |
CN109410036A (zh) * | 2018-10-09 | 2019-03-01 | 北京芯盾时代科技有限公司 | 一种欺诈检测模型训练方法和装置及欺诈检测方法和装置 |
CN109389494A (zh) * | 2018-10-25 | 2019-02-26 | 北京芯盾时代科技有限公司 | 借贷欺诈检测模型训练方法、借贷欺诈检测方法及装置 |
CN110009486A (zh) * | 2019-04-09 | 2019-07-12 | 连连银通电子支付有限公司 | 一种欺诈检测的方法、系统、设备及计算机可读存储介质 |
CN112463923A (zh) * | 2020-11-25 | 2021-03-09 | 平安科技(深圳)有限公司 | 用户欺诈行为检测方法、装置、设备及存储介质 |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114549026A (zh) * | 2022-04-26 | 2022-05-27 | 浙江鹏信信息科技股份有限公司 | 基于算法组件库分析的未知诈骗的识别方法及系统 |
CN115116594A (zh) * | 2022-06-06 | 2022-09-27 | 中国科学院自动化研究所 | 医疗装置有效性的检测方法及装置 |
CN115116594B (zh) * | 2022-06-06 | 2024-05-31 | 中国科学院自动化研究所 | 医疗装置有效性的检测方法及装置 |
CN115225575A (zh) * | 2022-06-08 | 2022-10-21 | 香港理工大学深圳研究院 | 一种基于元数据辅助和联邦学习的未知网络流量分类方法 |
CN115225575B (zh) * | 2022-06-08 | 2023-11-24 | 香港理工大学深圳研究院 | 一种基于元数据辅助和联邦学习的未知网络流量分类方法 |
CN117575596A (zh) * | 2023-09-06 | 2024-02-20 | 临沂万鼎网络科技有限公司 | 基于人工智能的欺诈行为分析方法及数字金融大数据系统 |
Also Published As
Publication number | Publication date |
---|---|
CN112463923B (zh) | 2023-04-28 |
CN112463923A (zh) | 2021-03-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021213123A1 (zh) | 用户欺诈行为检测方法、装置、设备及存储介质 | |
CN110059320B (zh) | 实体关系抽取方法、装置、计算机设备和存储介质 | |
US20210224142A1 (en) | Systems and methods for removing identifiable information | |
WO2021203581A1 (zh) | 基于精标注文本的关键信息抽取方法、装置及存储介质 | |
WO2020232861A1 (zh) | 命名实体识别方法、电子装置及存储介质 | |
US11720615B2 (en) | Self-executing protocol generation from natural language text | |
CN112863683B (zh) | 基于人工智能的病历质控方法、装置、计算机设备及存储介质 | |
WO2020147238A1 (zh) | 关键词的确定方法、自动评分方法、装置、设备及介质 | |
CN109582772B (zh) | 合同信息提取方法、装置、计算机设备和存储介质 | |
WO2022105118A1 (zh) | 基于图像的健康状态识别方法、装置、设备及存储介质 | |
CN111159770B (zh) | 文本数据脱敏方法、装置、介质及电子设备 | |
CN111831826B (zh) | 跨领域的文本分类模型的训练方法、分类方法以及装置 | |
CN112287069B (zh) | 基于语音语义的信息检索方法、装置及计算机设备 | |
CN112988963B (zh) | 基于多流程节点的用户意图预测方法、装置、设备及介质 | |
CN111710383A (zh) | 病历质控方法、装置、计算机设备和存储介质 | |
CN114972823A (zh) | 数据处理方法、装置、设备及计算机介质 | |
CN112417887B (zh) | 敏感词句识别模型处理方法、及其相关设备 | |
CN111783126B (zh) | 一种隐私数据识别方法、装置、设备和可读介质 | |
CN113158656B (zh) | 讽刺内容识别方法、装置、电子设备以及存储介质 | |
CN112132238A (zh) | 一种识别隐私数据的方法、装置、设备和可读介质 | |
CN114693192A (zh) | 风控决策方法、装置、计算机设备和存储介质 | |
CN113868419A (zh) | 基于人工智能的文本分类方法、装置、设备及介质 | |
CN113887214B (zh) | 基于人工智能的意愿推测方法、及其相关设备 | |
CN117932058A (zh) | 基于文本分析的情绪识别方法、装置及设备 | |
WO2024098282A1 (zh) | 一种几何解题方法、装置、设备及存储介质 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21792036 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 21792036 Country of ref document: EP Kind code of ref document: A1 |