CN113269179B - Data processing method, device, equipment and storage medium - Google Patents
Data processing method, device, equipment and storage medium
- Publication number
- CN113269179B CN202110704361.4A
- Authority
- CN
- China
- Prior art keywords
- information
- target
- type
- user
- sample
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/22—Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/242—Dictionaries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/08—Insurance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Abstract
The invention relates to artificial intelligence and provides a data processing method, apparatus, device and storage medium. The method acquires historical training data comprising first historical information, second historical information and processing information; calculates the information gain of the second historical information according to the historical training data; and generates an information analysis decision tree with the first historical information as the root node and target information, selected from the second historical information according to the information gain, as a sub-node. The method then acquires user information and a user report, analyzes the user information to obtain a target value, extracts a target name from the user report, and determines the target type to which the target name belongs. If the target value is a preset value and the target type is a preset type, target factors are extracted from the user report, and the target name and the target factors are input into the information analysis decision tree to obtain a target suggestion. The invention can accurately determine the underwriting suggestion. In addition, the invention relates to blockchain technology: the target suggestion may be stored in a blockchain.
Description
Technical Field
The present invention relates to the field of artificial intelligence technologies, and in particular, to a data processing method, apparatus, device, and storage medium.
Background
Insurance underwriting is the process by which an insurance company reviews an applicant's insurance application, decides whether to approve it, and selects the risks it will accept. At present, applications are usually examined manually by dedicated underwriting personnel. This approach is inefficient, and it depends heavily on the professional skill of the underwriting personnel, so the underwriting suggestion cannot be determined accurately.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a data processing method, apparatus, device, and storage medium that can accurately determine an underwriting suggestion.
In one aspect, the present invention proposes a data processing method, including:
acquiring historical training data, wherein the historical training data comprises first historical information, second historical information and processing information;
calculating the information gain of the second historical information according to the historical training data;
taking the first history information as a root node, and taking target information selected from the second history information according to the information gain as a sub-node to generate an information analysis decision tree;
receiving a processing instruction, and acquiring user information and a user report according to the processing instruction;
analyzing the user information to obtain a target value;
extracting a target name from the user report, and determining a target type to which the target name belongs;
if the target value is a preset value and the target type is a preset type, extracting a target factor from the user report;
and determining a target path from the information analysis decision tree according to the target name and the target factor, and acquiring a terminal point from the target path as a target suggestion.
According to a preferred embodiment of the present invention, the calculating the information gain of the second history information according to the history training data includes:
for any information in the second historical information, determining the data in the historical training data that contains that information as sample data, and screening from the sample data a plurality of types of samples together with the positive samples and negative samples of each type of sample;
calculating the total sample amount of each type of sample, and calculating the first number of positive samples and the second number of negative samples of each type;
determining a first information entropy corresponding to each type of sample according to the total sample amount, the first number and the second number, wherein the first information entropy is: $E = -\frac{n_1}{n}\log_2\frac{n_1}{n} - \frac{n_2}{n}\log_2\frac{n_2}{n}$, wherein E is the first information entropy, n is the total sample amount, n_1 is the first number, and n_2 is the second number, and a term whose number is 0 is taken as 0;
calculating the training total amount of the historical training data, and calculating the third number of positive samples in the historical training data and the fourth number of negative samples in the historical training data;
determining a second information entropy corresponding to the historical training data according to the training total amount, the third amount and the fourth amount;
determining the information gain of that information according to the second information entropy, the total sample amount, the training total amount and the first information entropy, wherein the information gain is: $G = E_2 - \sum_{i=1}^{j} \frac{n^{(i)}}{m} E_i$, wherein G is the information gain, E_2 is the second information entropy, j is the number of types of the plurality of types of samples, n^{(i)} is the total sample amount of the i-th type of sample, m is the training total amount, and E_i is the first information entropy of the i-th type of sample.
According to a preferred embodiment of the present invention, the generating an information analysis decision tree using the first history information as a root node and the target information selected from the second history information according to the information gain as a child node includes:
determining the second history information with the maximum information gain as the target information, and determining the second history information except the target information as characteristic information;
extracting each type of data in the target information from the historical training data to serve as samples to be tested;
calculating, for each sample to be tested, the information gain of the characteristic information as the characteristic gain;
determining the characteristic information with the maximum characteristic gain as attribute information;
and constructing the information analysis decision tree by taking the first history information as a root node, the target information as a sub-node of the root node, the attribute information as a branch node of the sub-node and the processing information as a terminal node.
According to a preferred embodiment of the present invention, the obtaining the user information and the user report according to the processing instruction includes:
analyzing the message of the processing instruction to obtain data information carried by the message;
acquiring information indicating a user from the data information as a user identification code, and determining a user corresponding to the user identification code as a target user;
acquiring information indicating a tag from the data information as an information tag;
generating an information authorization request according to the information tag, and sending the information authorization request to a user terminal of the target user;
when an authorization response sent by the user terminal based on the information authorization request is received, acquiring an information extraction key from the authorization response;
acquiring, based on the information extraction key, the information in an information base that corresponds to both the target user and the information tag as the user information;
and acquiring the information corresponding to the target user from a report library as preliminary screening reports, and selecting the preliminary screening report with the latest report time as the user report.
According to a preferred embodiment of the present invention, said analyzing said user information to obtain a target value comprises:
performing word segmentation processing on the user information to obtain target word segments;
matching the target word segments with the type vocabulary in a type mapping table, wherein the type mapping table stores a plurality of type information and the type vocabulary indicating each type information;
when any word segment among the target word segments is successfully matched with the type vocabulary, extracting the information having a mapping relation with that word segment from the user information as the information value of the corresponding type information among the plurality of type information;
acquiring the information weight of the user information;
and carrying out weighted operation on the information value according to the information weight value to obtain the target value.
According to a preferred embodiment of the present invention, the extracting the target name from the user report includes:
identifying report information from the user report based on an OCR algorithm;
segmenting the report information according to a preset dictionary to obtain a plurality of segmentation paths and path word segmentation corresponding to each segmentation path;
obtaining word segmentation weights corresponding to the path word segmentation from the preset dictionary, and calculating the sum of the word segmentation weights in each segmentation path to obtain path weights;
determining the path word corresponding to the segmentation path with the largest path weight as the information word;
traversing the information word segmentation according to a preset library, and determining the information word segmentation matched with any feature in the preset library as the target name.
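The path-weight selection in the steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the OCR step is omitted, the report text is assumed to be already available as a string, and the dictionary `PRESET_DICTIONARY`, its weights, the library `PRESET_LIBRARY`, and the sample text are all hypothetical.

```python
def segmentation_paths(text, dictionary):
    """Yield every way of splitting `text` into words found in `dictionary`."""
    if not text:
        yield []
        return
    for end in range(1, len(text) + 1):
        word = text[:end]
        if word in dictionary:
            for rest in segmentation_paths(text[end:], dictionary):
                yield [word] + rest


def best_segmentation(text, dictionary):
    """Return the segmentation path whose summed word weights is largest."""
    paths = list(segmentation_paths(text, dictionary))
    if not paths:
        return []
    return max(paths, key=lambda path: sum(dictionary[w] for w in path))


# Hypothetical preset dictionary: word -> word segmentation weight.
PRESET_DICTIONARY = {"mild": 1.0, "heart": 2.0, "disease": 2.0, "heartdisease": 5.0}
# Hypothetical preset library of known disease names.
PRESET_LIBRARY = {"heartdisease"}

information_words = best_segmentation("mildheartdisease", PRESET_DICTIONARY)
target_names = [w for w in information_words if w in PRESET_LIBRARY]
print(information_words)  # ['mild', 'heartdisease']
print(target_names)       # ['heartdisease']
```

Enumerating every path is exponential in the worst case; for long report text a dynamic-programming search over the same weights would be the usual substitute.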
According to a preferred embodiment of the present invention, the target type includes a first type and a second type, and the determining the target type to which the target name belongs includes:
acquiring a first name corresponding to the first type and acquiring a second name corresponding to the second type;
if the first name does not contain the target name and the second name does not contain the target name, acquiring a first vector of the first name and acquiring a second vector of the second name;
determining a representation mode of the first vector, and carrying out vectorization processing on the target name according to the representation mode to obtain a target vector;
calculating a first similarity of the target vector and the first vector, and calculating a second similarity of the target vector and the second vector;
if the first similarity is greater than or equal to the second similarity, determining the target type as the first type; or
if the first similarity is smaller than the second similarity, determining the target type as the second type.
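The type decision above can be sketched as follows. The text does not specify the vector representation or the similarity measure, so this sketch assumes character-count vectors and cosine similarity, and it assumes each type may list several names, taking the best match per type. The function names and the sample names are hypothetical.

```python
import math
from collections import Counter


def vectorize(name):
    """Assumed representation: a bag of character counts."""
    return Counter(name)


def cosine(a, b):
    """Cosine similarity between two count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def determine_type(target_name, first_names, second_names):
    """Return 'first' or 'second' following the containment-then-similarity rule."""
    if target_name in first_names:
        return "first"
    if target_name in second_names:
        return "second"
    target = vectorize(target_name)
    first_sim = max((cosine(target, vectorize(n)) for n in first_names), default=0.0)
    second_sim = max((cosine(target, vectorize(n)) for n in second_names), default=0.0)
    # Ties go to the first type, matching "greater than or equal to".
    return "first" if first_sim >= second_sim else "second"


print(determine_type("type 2 diabetes", ["diabetes"], ["fracture"]))  # first
print(determine_type("rib fracture", ["diabetes"], ["fracture"]))     # second
```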
In another aspect, the present invention also provides a data processing apparatus, including:
the acquisition unit is used for acquiring historical training data, wherein the historical training data comprises first historical information, second historical information and processing information;
a calculation unit for calculating an information gain of the second history information according to the history training data;
the generating unit is used for taking the first history information as a root node and generating an information analysis decision tree by taking target information selected from the second history information according to the information gain as a sub-node;
the acquisition unit is further used for receiving a processing instruction and acquiring user information and a user report according to the processing instruction;
the analysis unit is used for analyzing the user information to obtain a target value;
a determining unit, configured to extract a target name from the user report, and determine a target type to which the target name belongs;
the extraction unit is used for extracting target factors from the user report if the target value is a preset value and the target type is a preset type;
and the input unit is used for determining a target path from the information analysis decision tree according to the target name and the target factor, and acquiring a terminal point from the target path as a target suggestion.
In another aspect, the present invention also proposes an electronic device, including:
a memory storing computer readable instructions; and
a processor executing computer readable instructions stored in the memory to implement the data processing method.
In another aspect, the present invention also proposes a computer readable storage medium having stored therein computer readable instructions that are executed by a processor in an electronic device to implement the data processing method.
According to the above technical scheme, the information gain of the second historical information can be accurately determined from the historical training data, and the information analysis decision tree can be accurately generated from the first historical information, the information gain and the second historical information. Analyzing the user information and the target name extracted from the user report gives a preliminary underwriting suggestion for the processing instruction, and when the target value is a preset value and the target type is a preset type, the target suggestion can be accurately generated from the target factors and the target name extracted from the user report. In addition, because the target factors and the target name are analyzed further only after the underwriting suggestion has been preliminarily determined, the efficiency of generating the target suggestion is improved.
Drawings
FIG. 1 is a flow chart of a preferred embodiment of the data processing method of the present invention.
FIG. 2 is a functional block diagram of a preferred embodiment of the data processing apparatus of the present invention.
Fig. 3 is a schematic structural diagram of an electronic device according to a preferred embodiment of the present invention for implementing a data processing method.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.
FIG. 1 is a flow chart of a preferred embodiment of the data processing method of the present invention. The order of the steps in the flowchart may be changed and some steps may be omitted according to various needs.
The data processing method is applied to one or more electronic devices. An electronic device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored computer readable instructions, and its hardware includes, but is not limited to, microprocessors, application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field-programmable gate arrays (Field-Programmable Gate Array, FPGA), digital signal processors (Digital Signal Processor, DSP), embedded devices and the like.
The electronic device may be any electronic product that can interact with a user in a human-computer manner, such as a personal computer, tablet computer, smart phone, personal digital assistant (Personal Digital Assistant, PDA), game console, interactive internet protocol television (Internet Protocol Television, IPTV), smart wearable device, etc.
The electronic device may comprise a network device and/or a user device. The network device includes, but is not limited to, a single network electronic device, a group of electronic devices made up of multiple network electronic devices, or a cloud, based on cloud computing (Cloud Computing), made up of a large number of hosts or network electronic devices.
The network on which the electronic device is located includes, but is not limited to: the internet, wide area networks, metropolitan area networks, local area networks, virtual private networks (Virtual Private Network, VPN), etc.
S10, historical training data is acquired, wherein the historical training data comprises first historical information, second historical information and processing information.
In at least one embodiment of the present invention, the first history information refers to the disease that the user in a historical training data sample suffers from; for example, the first history information may be a disease name. The second history information refers to a factor that causes the first history information to appear; for example, the second history information may be a cause of the disease. The processing information refers to the underwriting suggestion given to the user by the underwriting personnel.
Further, the processing information may include: accepting the application, declining the application, charging an additional premium, etc.
S11, calculating the information gain of the second historical information according to the historical training data.
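In at least one embodiment of the present invention, the information gain measures how strongly a piece of second historical information is associated, in the historical training data, with the occurrence of the first historical information; it is computed as the reduction in information entropy obtained by grouping the training data by that information.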
In at least one embodiment of the present invention, the electronic device calculating the information gain of the second history information according to the history training data includes:
for any information in the second historical information, determining the data in the historical training data that contains that information as sample data, and screening from the sample data a plurality of types of samples together with the positive samples and negative samples of each type of sample;
calculating the total sample amount of each type of sample, and calculating the first number of positive samples and the second number of negative samples of each type;
determining a first information entropy corresponding to each type of sample according to the total sample amount, the first number and the second number, wherein the first information entropy is: $E = -\frac{n_1}{n}\log_2\frac{n_1}{n} - \frac{n_2}{n}\log_2\frac{n_2}{n}$, wherein E is the first information entropy, n is the total sample amount, n_1 is the first number, and n_2 is the second number, and a term whose number is 0 is taken as 0;
calculating the training total amount of the historical training data, and calculating the third number of positive samples in the historical training data and the fourth number of negative samples in the historical training data;
determining a second information entropy corresponding to the historical training data according to the training total amount, the third number and the fourth number;
determining the information gain of that information according to the second information entropy, the total sample amount, the training total amount and the first information entropy, wherein the information gain is: $G = E_2 - \sum_{i=1}^{j} \frac{n^{(i)}}{m} E_i$, wherein G is the information gain, E_2 is the second information entropy, j is the number of types of the plurality of types of samples, n^{(i)} is the total sample amount of the i-th type of sample, m is the training total amount, and E_i is the first information entropy of the i-th type of sample.
For example, suppose the second history information includes A, B and C, and the second history information A (smoking) covers three cases: {smoking daily, weekly smoking frequency greater than 4 days, weekly smoking frequency less than 2 days}. The data containing the second history information A is obtained from the historical training data and determined as the sample data, {sample 1, sample 2, …, sample 10}; the training total amount is 10, the third number is 5, and the fourth number is 5. The type samples are then screened from the sample data. The type sample "smoking daily" is {sample 1, sample 2}; its positive samples are empty and its negative samples are {sample 1, sample 2}, so its total sample amount is 2, its first number is 0, and its second number is 2. The type sample "weekly smoking frequency greater than 4 days" is {sample 3, sample 4, sample 5, sample 6, sample 7}; its positive samples are {sample 3, sample 4, sample 5} and its negative samples are {sample 6, sample 7}, so its total sample amount is 5, its first number is 3, and its second number is 2. The type sample "weekly smoking frequency less than 2 days" is {sample 8, sample 9, sample 10}; its positive samples are {sample 8, sample 9} and its negative samples are {sample 10}, so its total sample amount is 3, its first number is 2, and its second number is 1. The first information entropy of "smoking daily" is therefore 0, that of "weekly smoking frequency greater than 4 days" is approximately 0.97095, that of "weekly smoking frequency less than 2 days" is approximately 0.91830, and the second information entropy is 1. The information gain of the second history information A (smoking) is thus 1 - (2/10 × 0 + 5/10 × 0.97095 + 3/10 × 0.91830) ≈ 0.24.
By analyzing the historical training data, the information gain of any information can be accurately generated.
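The calculation in S11 can be checked with a short script. This is a minimal sketch that assumes the standard two-class entropy and information gain definitions, which reproduce the gain of about 0.24 in the smoking example; the function names and the `(positives, negatives)` tuple layout are hypothetical.

```python
import math


def binary_entropy(positives, negatives):
    """Base-2 entropy of a set with the given positive and negative counts."""
    total = positives + negatives
    entropy = 0.0
    for count in (positives, negatives):
        if count:  # a class with no samples contributes 0
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


def information_gain(type_counts, train_positives, train_negatives):
    """Gain = second entropy minus the size-weighted first entropies.

    type_counts holds one (positives, negatives) pair per type of sample.
    """
    m = train_positives + train_negatives  # training total amount
    second_entropy = binary_entropy(train_positives, train_negatives)
    weighted = sum((p + n) / m * binary_entropy(p, n) for p, n in type_counts)
    return second_entropy - weighted


# Smoking example: daily, more than 4 days a week, fewer than 2 days a week.
smoking = [(0, 2), (3, 2), (2, 1)]
gain = information_gain(smoking, train_positives=5, train_negatives=5)
print(round(gain, 2))  # 0.24
```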
S12, taking the first history information as a root node, and taking target information selected from the second history information according to the information gain as a sub-node to generate an information analysis decision tree.
In at least one embodiment of the present invention, the information analysis decision tree includes a root node, a sub-node, a branch node, and a terminal node. The root node is the first history information, the sub-node and the branch node are each taken from the second history information, and the terminal node is the processing information.
The target information is second history information with the maximum information gain.
In at least one embodiment of the present invention, the generating, by the electronic device, the information analysis decision tree using the first history information as a root node and the target information selected from the second history information according to the information gain as a child node includes:
determining the second history information with the maximum information gain as the target information, and determining the second history information except the target information as characteristic information;
extracting each type of data in the target information from the historical training data to serve as samples to be tested;
calculating, for each sample to be tested, the information gain of the characteristic information as the characteristic gain;
determining the characteristic information with the maximum characteristic gain as attribute information;
and constructing the information analysis decision tree by taking the first history information as a root node, the target information as a sub-node of the root node, the attribute information as a branch node of the sub-node and the processing information as a terminal node.
By analyzing the historical training data, the information analysis decision tree can be quickly constructed.
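The resulting four-level structure (root node, sub-node, branch node, terminal node) can be sketched as a nested dictionary. This is a simplified illustration under several assumptions: the target information and attribute information are taken as already selected by information gain, a single attribute is used for all branches, the terminal node is chosen by majority vote over matching records, and the record fields (`disease`, `smoking`, `bmi`, `advice`) and their values are hypothetical.

```python
from collections import Counter, defaultdict


def build_tree(records, root, target_key, attribute_key):
    """Nest records as root -> target value -> attribute value -> advice."""
    votes = defaultdict(lambda: defaultdict(Counter))
    for rec in records:
        if rec["disease"] == root:
            votes[rec[target_key]][rec[attribute_key]][rec["advice"]] += 1
    return {
        root: {
            target_value: {
                attr_value: counter.most_common(1)[0][0]  # majority advice
                for attr_value, counter in branches.items()
            }
            for target_value, branches in votes.items()
        }
    }


def lookup(tree, root, target_value, attr_value):
    """Follow one path to its terminal node; None if the path does not exist."""
    return tree.get(root, {}).get(target_value, {}).get(attr_value)


# Hypothetical historical training data.
RECORDS = [
    {"disease": "hypertension", "smoking": "daily", "bmi": "high", "advice": "decline"},
    {"disease": "hypertension", "smoking": "daily", "bmi": "high", "advice": "decline"},
    {"disease": "hypertension", "smoking": "daily", "bmi": "normal", "advice": "extra premium"},
    {"disease": "hypertension", "smoking": "rarely", "bmi": "normal", "advice": "accept"},
]

tree = build_tree(RECORDS, "hypertension", target_key="smoking", attribute_key="bmi")
print(lookup(tree, "hypertension", "daily", "high"))     # decline
print(lookup(tree, "hypertension", "rarely", "normal"))  # accept
```

The same `lookup` call corresponds to determining a target path from the target name and target factor and reading the terminal node as the target suggestion.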
S13, receiving a processing instruction, and acquiring user information and a user report according to the processing instruction.
In at least one embodiment of the present invention, the processing instruction includes a user identification code, an instruction number, the triggering user, and the like.
The user information refers to the basic information of the applicant named in the processing instruction, and the user report refers to the applicant's most recent examination report.
In at least one embodiment of the present invention, the electronic device obtaining user information and user reports according to the processing instruction includes:
analyzing the message of the processing instruction to obtain data information carried by the message;
acquiring information indicating a user from the data information as a user identification code, and determining a user corresponding to the user identification code as a target user;
acquiring information indicating a tag from the data information as an information tag;
generating an information authorization request according to the information tag, and sending the information authorization request to a user terminal of the target user;
when an authorization response sent by the user terminal based on the information authorization request is received, acquiring an information extraction key from the authorization response;
acquiring, based on the information extraction key, the information in an information base that corresponds to both the target user and the information tag as the user information;
and acquiring the information corresponding to the target user from a report library as preliminary screening reports, and selecting the preliminary screening report with the latest report time as the user report.
Wherein, the information tag refers to a tag indicating basic information of a user, and the basic information includes, but is not limited to: loan conditions, age, annual income, occupation, etc.
The information base stores basic information of a plurality of users.
The report library stores inspection reports of a plurality of users, inspection reports of different periods of each user, and the like.
Parsing the message makes it possible to obtain the data information quickly and therefore to determine the target user quickly. Acquiring the user information from the information base only after the authorization response is received ensures that the user information is obtained legitimately, and the report time allows the user report to be selected accurately from the report library.
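The message parsing and report selection in S13 can be sketched as follows. The authorization request and information extraction key are omitted here. The message format is not specified in the text, so JSON with `user_id` and `info_tag` fields is assumed, and the in-memory `REPORT_LIBRARY` with its field names is hypothetical.

```python
import json
from datetime import date


def parse_instruction(message):
    """Pull the user identification code and information tag from the message.

    Assumption: the message body is JSON with 'user_id' and 'info_tag' fields.
    """
    data = json.loads(message)
    return data["user_id"], data["info_tag"]


def select_user_report(report_library, user_id):
    """Return the user's report with the latest report time, or None."""
    preliminary = [r for r in report_library if r["user_id"] == user_id]
    if not preliminary:
        return None
    return max(preliminary, key=lambda r: r["report_time"])


# Hypothetical report library holding reports from different periods.
REPORT_LIBRARY = [
    {"user_id": "u1", "report_id": "r1", "report_time": date(2020, 5, 1)},
    {"user_id": "u1", "report_id": "r2", "report_time": date(2021, 3, 15)},
    {"user_id": "u2", "report_id": "r3", "report_time": date(2021, 6, 1)},
]

user_id, info_tag = parse_instruction('{"user_id": "u1", "info_tag": "basic"}')
report = select_user_report(REPORT_LIBRARY, user_id)
print(report["report_id"])  # r2
```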
S14, analyzing the user information to obtain a target value.
In at least one embodiment of the present invention, the target value refers to the reputation value of the target user in the processing instruction.
In at least one embodiment of the present invention, the electronic device analyzing the user information to obtain a target value includes:
performing word segmentation processing on the user information to obtain target word segmentation;
matching the target word segmentation against the type vocabulary in a type mapping table, wherein the type mapping table stores a plurality of type information and the type vocabulary indicating each type information, and the plurality of type information comprises loan information, annual income and occupational stability;
when any word in the target word segmentation is successfully matched with the type vocabulary, extracting, from the information word segmentation, the information having a mapping relation with that word as the information value of the corresponding type information among the plurality of type information;
acquiring the information weight of the user information;
and performing a weighted operation on the information values according to the information weight to obtain the target value.
The target value can be accurately determined by performing quantization processing on the user information.
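The weighted operation can be illustrated with the following minimal sketch. It assumes that each type information has already been quantized to a numeric information value and that one weight is configured per type information; the function name, the dictionary keys and the numbers are hypothetical.

```python
from typing import Dict


def compute_target_value(info_values: Dict[str, float],
                         info_weights: Dict[str, float]) -> float:
    """Weighted sum of the information values, one value per type information."""
    missing = set(info_values) - set(info_weights)
    if missing:
        raise KeyError(f"no weight configured for: {sorted(missing)}")
    return sum(value * info_weights[name] for name, value in info_values.items())


# Hypothetical information values (already quantized to [0, 1]) and weights.
values = {"loan_information": 0.8, "annual_income": 0.6, "occupational_stability": 0.9}
weights = {"loan_information": 0.5, "annual_income": 0.3, "occupational_stability": 0.2}
target_value = compute_target_value(values, weights)  # 0.8*0.5 + 0.6*0.3 + 0.9*0.2
```

Raising an error for an unweighted type information is a design choice of the sketch; an implementation could instead skip such values or apply a default weight.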
S15, extracting a target name from the user report, and determining the target type to which the target name belongs.
In at least one embodiment of the present invention, the target name refers to a disease existing in the user report, and the target type refers to a type corresponding to the target name.
In at least one embodiment of the invention, the electronic device extracting the target name from the user report comprises:
identifying report information from the user report based on an OCR algorithm;
segmenting the report information according to a preset dictionary to obtain a plurality of segmentation paths and the path word segmentation corresponding to each segmentation path;
obtaining the word segmentation weights corresponding to the path word segmentation from the preset dictionary, and calculating the sum of the word segmentation weights in each segmentation path to obtain a path weight;
determining the path word segmentation corresponding to the segmentation path with the largest path weight as the information word segmentation;
traversing the information word segmentation according to a preset library, and determining the information word segmentation matched with any feature in the preset library as the target name.
The preset dictionary is used for storing a plurality of custom word segmentation and weights corresponding to the custom word segmentation.
The preset library stores a plurality of different first-type diseases and second-type diseases, wherein the first-type diseases are mild diseases and the second-type diseases are severe diseases. Any feature refers to any one of these diseases.
The report information can be accurately identified from the user report through the OCR algorithm and then segmented according to the preset dictionary. The information word segmentation can be accurately determined according to the weights in the preset dictionary, and the target name can be accurately screened from the information word segmentation according to the preset library.
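The selection of the segmentation path with the largest path weight can be sketched as below. Rather than enumerating every segmentation path explicitly, the sketch uses dynamic programming, which yields the same maximum-weight path; this substitution, the function names and the toy dictionary are assumptions for illustration only, and the OCR step is omitted.

```python
from typing import Dict, List, Optional, Set, Tuple


def best_segmentation(text: str, dictionary: Dict[str, float]) -> Optional[Tuple[float, List[str]]]:
    """Return (path weight, words) of the segmentation path whose summed
    word weights are largest, or None if the text cannot be segmented
    using dictionary words only."""
    n = len(text)
    max_len = max((len(word) for word in dictionary), default=0)
    # best[i] is the best (weight, words) covering text[:i], or None if unreachable.
    best: List[Optional[Tuple[float, List[str]]]] = [None] * (n + 1)
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            prefix = best[start]
            if prefix is None or word not in dictionary:
                continue
            weight = prefix[0] + dictionary[word]
            current = best[end]
            if current is None or weight > current[0]:
                best[end] = (weight, prefix[1] + [word])
    return best[n]


def extract_target_names(words: List[str], preset_library: Set[str]) -> List[str]:
    """Keep the words that match a feature (disease name) in the preset library."""
    return [word for word in words if word in preset_library]


# Toy preset dictionary: custom word -> weight (hypothetical values).
preset_dictionary = {"chronic": 3, "gastritis": 5, "chron": 1, "ic": 1, "gas": 1, "tritis": 1}
path = best_segmentation("chronicgastritis", preset_dictionary)
names = extract_target_names(path[1], {"gastritis"})
```

Here the path "chronic" + "gastritis" has weight 8 and wins over "chron" + "ic" + "gastritis" (weight 7) and "chronic" + "gas" + "tritis" (weight 5).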
In at least one embodiment of the present invention, the target type includes a first type and a second type, and the electronic device determining the target type to which the target name belongs includes:
acquiring a first name corresponding to the first type and acquiring a second name corresponding to the second type;
if the first name does not contain the target name and the second name does not contain the target name, acquiring a first vector of the first name and acquiring a second vector of the second name;
determining a representation mode of the first vector, and carrying out vectorization processing on the target name according to the representation mode to obtain a target vector;
calculating a first similarity of the target vector and the first vector, and calculating a second similarity of the target vector and the second vector;
if the first similarity is greater than or equal to the second similarity, determining the target type as the first type; or
if the first similarity is smaller than the second similarity, determining the target type as the second type.
Wherein the first type refers to the mild disease type, and the second type refers to the severe disease type.
The first name refers to the name of a mild disease, and the second name refers to the name of a severe disease.
The representation mode refers to a vector mapping table.
By the above embodiment, when the first name does not include the target name and the second name does not include the target name, the type to which the target name belongs can be accurately determined.
In at least one embodiment of the invention, the method further comprises:
if the first name contains the target name, determining the target type as the first type; or
if the second name contains the target name, determining the target type as the second type.
The type of the target name can thus be rapidly determined through the first name and the second name.
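The two branches above (direct containment, and the similarity comparison used when neither name list contains the target name) can be sketched together as follows. The embodiment does not state which similarity measure is used or how several names of one type are combined; the sketch assumes cosine similarity and takes the highest similarity over the names of each type. The vector mapping table and all names in it are hypothetical placeholders.

```python
import math
from typing import Dict, Sequence, Set


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def determine_target_type(target_name: str,
                          first_names: Set[str],
                          second_names: Set[str],
                          vector_table: Dict[str, Sequence[float]]) -> str:
    """Return "first" (mild) or "second" (severe) for the target name."""
    # Fast path: the target name is already listed under one of the types.
    if target_name in first_names:
        return "first"
    if target_name in second_names:
        return "second"
    # Otherwise vectorize with the same mapping table and compare similarities.
    target_vector = vector_table[target_name]
    first_similarity = max(
        (cosine_similarity(target_vector, vector_table[n]) for n in first_names), default=0.0)
    second_similarity = max(
        (cosine_similarity(target_vector, vector_table[n]) for n in second_names), default=0.0)
    return "first" if first_similarity >= second_similarity else "second"


# Hypothetical vector mapping table with placeholder names.
table = {
    "mild_a": [1.0, 0.0], "mild_b": [0.9, 0.1], "severe_a": [0.0, 1.0],
    "unknown_x": [0.8, 0.2], "unknown_y": [0.1, 0.9],
}
first, second = {"mild_a", "mild_b"}, {"severe_a"}
type_x = determine_target_type("unknown_x", first, second, table)
type_y = determine_target_type("unknown_y", first, second, table)
```

A tie between the two similarities resolves to the first type, matching the "greater than or equal to" condition of the embodiment.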
S16, if the target value is a preset value and the target type is a preset type, extracting a target factor from the user report.
In at least one embodiment of the present invention, the preset value includes a reputation value corresponding to a high-quality reputation and a reputation value corresponding to a preferred reputation, and the preset type includes a first type.
The target factor refers to a factor that causes the target user to suffer from the disease indicated by the target name.
In at least one embodiment of the invention, the electronic device extracting target factors from the user report comprises:
extracting information corresponding to a preset label from the information word segmentation as report factors, wherein the preset label is used for indicating a factor;
and carrying out error correction processing on the report factors to obtain the target factors.
Through the preset label, the report factors can be accurately extracted from the information word segmentation, and further through error correction processing of the report factors, the target factors which are uniformly expressed can be obtained.
S17, determining a target path from the information analysis decision tree according to the target name and the target factor, and acquiring the terminal node of the target path as a target suggestion.
It is emphasized that the target suggestions may also be stored in nodes of a blockchain in order to further guarantee privacy and security of the target suggestions.
In at least one embodiment of the present invention, the target suggestion is the underwriting suggestion for the processing instruction, and the target suggestion may include "accept coverage", "refuse coverage", "charge an additional premium, together with the additional premium amount", and the like.
The target path refers to a branch in the information analysis decision tree.
According to the embodiment, the target path can be accurately determined through the target name and the target factors, so that the target suggestion can be rapidly and accurately obtained through the target path.
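A minimal sketch of following a target path to its terminal node is given below. The `Node` structure, the mapping from root (disease name) to its child node, the factor names and the suggestion strings are all hypothetical; the embodiment does not prescribe a particular in-memory representation of the information analysis decision tree.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass
class Node:
    factor: str                               # factor tested at this node
    branches: Dict[str, Union["Node", str]]   # factor value -> subtree or terminal suggestion


def find_target_suggestion(trees: Dict[str, Node],
                           target_name: str,
                           target_factors: Dict[str, str]) -> Optional[str]:
    """Follow the branch selected by the target name and target factors and
    return the suggestion stored at the terminal node, or None if no path matches."""
    node = trees.get(target_name)            # root node: the disease name
    while isinstance(node, Node):
        value = target_factors.get(node.factor)
        node = node.branches.get(value)      # descend along the matching branch
    return node


# Hypothetical tree for one disease name.
trees = {
    "hypertension": Node("smoking", {
        "daily": "refuse coverage",
        "more_than_4_days_per_week": Node("drinking", {
            "yes": "additional premium",
            "no": "accept coverage",
        }),
        "fewer_than_2_days_per_week": "accept coverage",
    })
}
suggestion = find_target_suggestion(
    trees, "hypertension", {"smoking": "more_than_4_days_per_week", "drinking": "no"})
```

Returning `None` when a factor is missing or a disease name has no tree leaves the fallback policy to the caller.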
In at least one embodiment of the present invention, after the target suggestion of the processing instruction is obtained, the method further includes:
acquiring an instruction number of the processing instruction;
generating prompt information according to the instruction number and the target suggestion;
encrypting the prompt information by adopting a symmetric encryption technology to obtain a ciphertext;
acquiring a trigger user from the processing instruction;
and sending the ciphertext to the binding equipment of the triggering user.
Through the above embodiment, the prompt information can be generated quickly, and encrypting the prompt information improves its security. In addition, because the trigger user is acquired from the processing instruction, the ciphertext can be accurately sent to the corresponding device.
According to the above technical scheme, the information gain of the second historical information can be accurately determined from the historical training data, and the information analysis decision tree can be accurately generated according to the first historical information, the information gain and the second historical information. The user information and the target name extracted from the user report are analyzed to preliminarily determine the underwriting suggestion for the processing instruction, and when the target value is a preset value and the target type is a preset type, the target suggestion can be accurately generated from the target factor and the target name extracted from the user report. In addition, because the target factor and the target name are further analyzed only after the underwriting suggestion has been preliminarily determined, the generation efficiency of the target suggestion can be improved.
FIG. 2 is a functional block diagram of a preferred embodiment of the data processing apparatus of the present invention. The data processing apparatus 11 includes an acquisition unit 110, a calculation unit 111, a generation unit 112, an analysis unit 113, a determination unit 114, an extraction unit 115, an input unit 116, a generation unit 117, an encryption unit 118, and a transmission unit 119. A module/unit referred to herein is a series of computer readable instructions that are stored in the memory 12 and can be retrieved and executed by the processor 13 to perform a fixed function. In the present embodiment, the functions of the respective modules/units are described in detail below.
The acquisition unit 110 acquires history training data including first history information, second history information, and processing information.
In at least one embodiment of the present invention, the first history information refers to the disease suffered by the user in a sample of the history training data; for example, the first history information may be a disease name. The second history information refers to a factor that causes the first history information to appear; for example, the second history information may be a cause of the disease. The processing information refers to the underwriting advice given to the user by the underwriting personnel.
Further, the processing information may include: accepting coverage, refusing coverage, charging an additional premium, etc.
The calculation unit 111 calculates an information gain of the second history information from the history training data.
In at least one embodiment of the present invention, the information gain refers to a probability in the historical training data that results in the generation of the first historical information.
In at least one embodiment of the present invention, the calculating unit 111 calculates the information gain of the second history information according to the history training data includes:
for any information in the second historical information, determining data containing any information in the historical training data as sample data, and screening a plurality of types of samples and positive samples and negative samples of each type of samples from the sample data;
calculating the total sample amount of each type of sample, and calculating the first number of each positive sample and the second number of each negative sample;
determining a first information entropy corresponding to each type of sample according to the total sample amount, the first number and the second number, including: $$E = -\frac{n_1}{n_{total}}\log_2\frac{n_1}{n_{total}} - \frac{n_2}{n_{total}}\log_2\frac{n_2}{n_{total}}$$ wherein $E$ is the first information entropy, $n_{total}$ is the total sample amount, $n_1$ is the first number, and $n_2$ is the second number;
calculating the training total amount of the historical training data, and calculating the third number of positive samples in the historical training data and the fourth number of negative samples in the historical training data;
determining a second information entropy corresponding to the historical training data according to the training total amount, the third amount and the fourth amount;
determining the information gain of any information according to the second information entropy, the total sample amount, the training total amount and the first information entropy, including: $$G = E_2 - \sum_{i=1}^{j}\frac{n_{total}^{(i)}}{m}E_i$$ wherein $G$ is the information gain, $E_2$ is the second information entropy, $j$ is the number of types of the plurality of types of samples, $m$ is the training total amount, $n_{total}^{(i)}$ is the total sample amount of the $i$-th type of sample, and $E_i$ is the first information entropy of the $i$-th type of sample.
For example, the second history information includes A, B and C. The second history information A (smoking) covers three cases: {smoking daily, smoking more than 4 days per week, smoking fewer than 2 days per week}. The data containing the second history information A in the history training data is determined as the sample data, which includes {sample 1, sample 2, …, sample 10}; the training total amount is 10, the third number is 5, and the fourth number is 5. The type samples are screened from the sample data as follows. The type sample "smoking daily" includes {sample 1, sample 2}; it has no positive samples, and its negative samples are {sample 1, sample 2}. The type sample "smoking more than 4 days per week" includes {sample 3, sample 4, sample 5, sample 6, sample 7}; its positive samples are {sample 3, sample 4, sample 5}, and its negative samples are {sample 6, sample 7}. The type sample "smoking fewer than 2 days per week" includes {sample 8, sample 9, sample 10}; its positive samples are {sample 8, sample 9}, and its negative sample is {sample 10}. Accordingly, for "smoking daily" the total sample amount is 2, the first number is 0 and the second number is 2; for "smoking more than 4 days per week" the total sample amount is 5, the first number is 3 and the second number is 2; and for "smoking fewer than 2 days per week" the total sample amount is 3, the first number is 2 and the second number is 1. The first information entropy of "smoking daily" is 0, that of "smoking more than 4 days per week" is about 0.971, and that of "smoking fewer than 2 days per week" is about 0.918, while the second information entropy is 1. The information gain of the second history information A (smoking) is therefore 1 - (2/10 × 0 + 5/10 × 0.971 + 3/10 × 0.918), which is about 0.24.
By analyzing the historical training data, the information gain of any information can be accurately generated.
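The smoking example can be checked with a short sketch. It assumes the usual binary entropy with base-2 logarithms and the convention that a term with zero count contributes 0; the function names are hypothetical.

```python
import math
from typing import Iterable, Tuple


def binary_entropy(n_pos: int, n_neg: int) -> float:
    """Information entropy of a set with n_pos positive and n_neg negative samples."""
    total = n_pos + n_neg
    entropy = 0.0
    for count in (n_pos, n_neg):
        if count:  # a zero count contributes nothing (0 * log2(0) is taken as 0)
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


def information_gain(type_counts: Iterable[Tuple[int, int]],
                     n_pos_total: int, n_neg_total: int) -> float:
    """Second information entropy minus the weighted first information entropies.

    type_counts holds one (first number, second number) pair per type of sample.
    """
    m = n_pos_total + n_neg_total  # training total amount
    second_entropy = binary_entropy(n_pos_total, n_neg_total)
    weighted = sum((pos + neg) / m * binary_entropy(pos, neg) for pos, neg in type_counts)
    return second_entropy - weighted


# Counts from the smoking example: (positive, negative) per type of sample.
smoking_counts = [(0, 2), (3, 2), (2, 1)]
smoking_gain = information_gain(smoking_counts, n_pos_total=5, n_neg_total=5)
```

With these counts the sketch gives a gain of roughly 0.239, consistent with the value of about 0.24 in the example.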
The generating unit 112 generates an information analysis decision tree using the first history information as a root node and the target information selected from the second history information according to the information gain as a child node.
In at least one embodiment of the present invention, the information analysis decision tree includes a root node, a child node, a branch node, and a terminal node. The root node is the first history information, the child node and the branch node are each second history information, and the terminal node is the processing information.
The target information is second history information with the maximum information gain.
In at least one embodiment of the present invention, the generating unit 112 using the first history information as a root node and using the target information selected from the second history information according to the information gain as a child node to generate an information analysis decision tree includes:
determining the second history information with the maximum information gain as the target information, and determining the second history information other than the target information as characteristic information;
extracting the data of each type in the target information from the historical training data as samples to be tested;
calculating, for each sample to be tested, the information gain of the characteristic information as a characteristic gain;
determining the characteristic information with the maximum characteristic gain as attribute information;
and constructing the information analysis decision tree by taking the first history information as the root node, the target information as the child node of the root node, the attribute information as the branch node of the child node, and the processing information as the terminal node.
By analyzing the historical training data, the information analysis decision tree can be quickly constructed.
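The construction steps can be sketched for a two-level tree as follows. Several points are assumptions made only for illustration: each training row is a dictionary with a binary `label` (positive or negative sample) and an `advice` field (the processing information), the terminal node stores the most frequent advice of the rows reaching it, and the tree is represented as nested dictionaries. None of these names or choices comes from the embodiment.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


def entropy(labels: List[int]) -> float:
    """Information entropy of a list of sample labels."""
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())


def gain(rows: List[Dict], feature: str) -> float:
    """Information gain obtained by splitting the rows on one feature."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[feature]].append(row["label"])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row["label"] for row in rows]) - remainder


def majority_advice(rows: List[Dict]) -> str:
    """Most frequent processing information among the rows (an assumption)."""
    return Counter(row["advice"] for row in rows).most_common(1)[0][0]


def build_tree(rows: List[Dict], disease: str, features: List[str]) -> Dict:
    """Root = disease, child = feature with the largest gain, branch = best
    remaining feature within each value of the child, terminal = advice."""
    target = max(features, key=lambda f: gain(rows, f))
    remaining = [f for f in features if f != target]
    children = {}
    for value in sorted({row[target] for row in rows}):
        subset = [row for row in rows if row[target] == value]  # sample to be tested
        if not remaining:
            children[value] = majority_advice(subset)
            continue
        attribute = max(remaining, key=lambda f: gain(subset, f))
        children[value] = {
            attribute: {
                v: majority_advice([row for row in subset if row[attribute] == v])
                for v in sorted({row[attribute] for row in subset})
            }
        }
    return {disease: {target: children}}


# Hypothetical training rows for one disease name.
training_rows = [
    {"smoking": "daily", "drinking": "yes", "label": 1, "advice": "refuse"},
    {"smoking": "daily", "drinking": "no", "label": 1, "advice": "refuse"},
    {"smoking": "never", "drinking": "yes", "label": 0, "advice": "accept"},
    {"smoking": "never", "drinking": "no", "label": 0, "advice": "accept"},
]
tree = build_tree(training_rows, "hypertension", ["smoking", "drinking"])
```

In this toy data "smoking" separates the labels perfectly (gain 1) while "drinking" has gain 0, so "smoking" becomes the child node and "drinking" the branch node.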
The acquiring unit 110 receives the processing instruction, and acquires the user information and the user report according to the processing instruction.
In at least one embodiment of the present invention, the processing instruction includes a user identification code, an instruction number, a trigger user, and the like.
The user information refers to the basic information of the applicant in the processing instruction, and the user report refers to the applicant's most recent examination report.
In at least one embodiment of the present invention, the obtaining unit 110 obtains the user information and the user report according to the processing instruction includes:
parsing the message of the processing instruction to obtain the data information carried by the message;
acquiring information indicating a user from the data information as a user identification code, and determining a user corresponding to the user identification code as a target user;
acquiring information indicating a tag from the data information as an information tag;
generating an information authorization request according to the information tag, and sending the information authorization request to a user terminal of the target user;
when an authorization response sent by the user terminal based on the information authorization request is received, acquiring an information extraction key from the authorization response;
acquiring, based on the information extraction key, the information corresponding to both the target user and the information tag from an information base as the user information;
and acquiring the reports corresponding to the target user from a report library as preliminary screening reports, and selecting the preliminary screening report with the latest report time as the user report.
Wherein, the information tag refers to a tag indicating basic information of a user, and the basic information includes, but is not limited to: loan conditions, age, annual income, occupation, etc.
The information base stores basic information of a plurality of users.
The report library stores inspection reports of a plurality of users, inspection reports of different periods of each user, and the like.
Parsing the message allows the data information to be acquired quickly, so that the target user can be determined quickly. The user information is acquired from the information base only after the authorization response is received, which ensures that the user information is acquired legitimately, and the report time allows the user report to be accurately selected from the report library.
The analysis unit 113 analyzes the user information to obtain a target value.
In at least one embodiment of the present invention, the target value refers to the reputation value of the target user in the processing instruction.
In at least one embodiment of the present invention, the analyzing unit 113 analyzes the user information, and the obtaining the target value includes:
performing word segmentation processing on the user information to obtain target word segmentation;
matching the target word segmentation against the type vocabulary in a type mapping table, wherein the type mapping table stores a plurality of type information and the type vocabulary indicating each type information, and the plurality of type information comprises loan information, annual income and occupational stability;
when any word in the target word segmentation is successfully matched with the type vocabulary, extracting, from the information word segmentation, the information having a mapping relation with that word as the information value of the corresponding type information among the plurality of type information;
acquiring the information weight of the user information;
and performing a weighted operation on the information values according to the information weight to obtain the target value.
The target value can be accurately determined by performing quantization processing on the user information.
The determination unit 114 extracts a target name from the user report and determines a target type to which the target name belongs.
In at least one embodiment of the present invention, the target name refers to a disease existing in the user report, and the target type refers to a type corresponding to the target name.
In at least one embodiment of the present invention, the determining unit 114 extracting a target name from the user report includes:
identifying report information from the user report based on an OCR algorithm;
segmenting the report information according to a preset dictionary to obtain a plurality of segmentation paths and the path word segmentation corresponding to each segmentation path;
obtaining the word segmentation weights corresponding to the path word segmentation from the preset dictionary, and calculating the sum of the word segmentation weights in each segmentation path to obtain a path weight;
determining the path word segmentation corresponding to the segmentation path with the largest path weight as the information word segmentation;
traversing the information word segmentation according to a preset library, and determining the information word segmentation matched with any feature in the preset library as the target name.
The preset dictionary is used for storing a plurality of custom word segmentation and weights corresponding to the custom word segmentation.
The preset library stores a plurality of different first-type diseases and second-type diseases. The first-type diseases are mild diseases, and the second-type diseases are severe diseases. Any feature refers to any one of these diseases.
The report information can be accurately identified from the user report through the OCR algorithm and then segmented according to the preset dictionary. The information word segmentation can be accurately determined according to the weights in the preset dictionary, and the target name can be accurately screened from the information word segmentation according to the preset library.
In at least one embodiment of the present invention, the target type includes a first type and a second type, and the determining unit 114 determining the target type to which the target name belongs includes:
acquiring a first name corresponding to the first type and acquiring a second name corresponding to the second type;
if the first name does not contain the target name and the second name does not contain the target name, acquiring a first vector of the first name and acquiring a second vector of the second name;
determining a representation mode of the first vector, and carrying out vectorization processing on the target name according to the representation mode to obtain a target vector;
calculating a first similarity of the target vector and the first vector, and calculating a second similarity of the target vector and the second vector;
if the first similarity is greater than or equal to the second similarity, determining the target type as the first type; or
if the first similarity is smaller than the second similarity, determining the target type as the second type.
Wherein the first type refers to the mild disease type, and the second type refers to the severe disease type.
The first name refers to the name of a mild disease, and the second name refers to the name of a severe disease.
The representation mode refers to a vector mapping table.
By the above embodiment, when the first name does not include the target name and the second name does not include the target name, the type to which the target name belongs can be accurately determined.
In at least one embodiment of the present invention, if the first name contains the target name, the determining unit 114 determines the target type as the first type; or
if the second name contains the target name, the determining unit 114 determines the target type as the second type.
The type of the target name can thus be rapidly determined through the first name and the second name.
If the target value is a preset value and the target type is a preset type, the extraction unit 115 extracts a target factor from the user report.
In at least one embodiment of the present invention, the preset value includes a reputation value corresponding to a high-quality reputation and a reputation value corresponding to a preferred reputation, and the preset type includes a first type.
The target factor refers to a factor that causes the target user to suffer from the disease indicated by the target name.
In at least one embodiment of the present invention, the extracting unit 115 extracts target factors from the user report includes:
extracting information corresponding to a preset label from the information word segmentation as report factors, wherein the preset label is used for indicating a factor;
and carrying out error correction processing on the report factors to obtain the target factors.
Through the preset label, the report factors can be accurately extracted from the information word segmentation, and further through error correction processing of the report factors, the target factors which are uniformly expressed can be obtained.
The input unit 116 determines a target path from the information analysis decision tree according to the target name and the target factor, and acquires the terminal node of the target path as a target suggestion.
It is emphasized that the target suggestions may also be stored in nodes of a blockchain in order to further guarantee privacy and security of the target suggestions.
In at least one embodiment of the present invention, the target suggestion is the underwriting suggestion for the processing instruction, and the target suggestion may include "accept coverage", "refuse coverage", "charge an additional premium, together with the additional premium amount", and the like.
The target path refers to a branch in the information analysis decision tree.
According to the embodiment, the target path can be accurately determined through the target name and the target factors, so that the target suggestion can be quickly and accurately obtained through the target path.
In at least one embodiment of the present invention, the obtaining unit 110 obtains an instruction number of the processing instruction after obtaining the target suggestion of the processing instruction;
the generating unit 117 generates prompt information according to the instruction number and the target suggestion;
the encryption unit 118 encrypts the prompt message by using a symmetric encryption technology to obtain a ciphertext;
the acquiring unit 110 acquires a trigger user from the processing instruction;
the transmitting unit 119 transmits the ciphertext to the binding apparatus of the trigger user.
Through the above embodiment, the prompt information can be generated quickly, and encrypting the prompt information improves its security. In addition, because the trigger user is acquired from the processing instruction, the ciphertext can be accurately sent to the corresponding device.
According to the above technical scheme, the information gain of the second historical information can be accurately determined from the historical training data, and the information analysis decision tree can be accurately generated according to the first historical information, the information gain and the second historical information. The user information and the target name extracted from the user report are analyzed to preliminarily determine the underwriting suggestion for the processing instruction, and when the target value is a preset value and the target type is a preset type, the target suggestion can be accurately generated from the target factor and the target name extracted from the user report. In addition, because the target factor and the target name are further analyzed only after the underwriting suggestion has been preliminarily determined, the generation efficiency of the target suggestion can be improved.
FIG. 3 is a schematic structural diagram of an electronic device that implements the data processing method according to a preferred embodiment of the present invention.
In one embodiment of the invention, the electronic device 1 includes, but is not limited to, a memory 12, a processor 13, and computer readable instructions, such as data processing programs, stored in the memory 12 and executable on the processor 13.
It will be appreciated by those skilled in the art that the schematic diagram is merely an example of the electronic device 1 and does not constitute a limitation of the electronic device 1. The electronic device 1 may include more or fewer components than illustrated, may combine certain components, or may use different components; for example, the electronic device 1 may further include input-output devices, network access devices, buses, etc.
The processor 13 may be a central processing unit (Central Processing Unit, CPU), but may also be another general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general purpose processor may be a microprocessor or any conventional processor. The processor 13 is the operation core and control center of the electronic device 1; it connects the various parts of the entire electronic device 1 using various interfaces and lines, and executes the operating system of the electronic device 1 and the various installed applications, program codes, etc.
Illustratively, the computer readable instructions may be partitioned into one or more modules/units that are stored in the memory 12 and executed by the processor 13 to complete the present invention. The one or more modules/units may be a series of computer readable instructions capable of performing a specific function, the computer readable instructions describing a process of executing the computer readable instructions in the electronic device 1. For example, the computer-readable instructions may be divided into an acquisition unit 110, a calculation unit 111, a generation unit 112, an analysis unit 113, a determination unit 114, an extraction unit 115, an input unit 116, a generation unit 117, an encryption unit 118, and a transmission unit 119.
The memory 12 may be used to store the computer readable instructions and/or modules, and the processor 13 implements various functions of the electronic device 1 by running or executing the computer readable instructions and/or modules stored in the memory 12 and invoking data stored in the memory 12. The memory 12 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data created according to the use of the electronic device, etc. The memory 12 may include non-volatile and volatile memory, such as: a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one disk storage device, a flash memory device, or another storage device.
The memory 12 may be an external memory and/or an internal memory of the electronic device 1. Further, the memory 12 may be a physical memory, such as a memory module, a TransFlash (TF) card, or the like.
The integrated modules/units of the electronic device 1 may be stored in a computer readable storage medium if implemented in the form of software functional units and sold or used as separate products. Based on such understanding, the present invention may implement all or part of the processes in the methods of the above embodiments by instructing the associated hardware through computer readable instructions, which may be stored in a computer readable storage medium; when executed by a processor, the computer readable instructions implement the steps of the respective method embodiments described above.
The computer readable instructions comprise computer readable instruction code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer readable medium may include: any entity or device capable of carrying the computer readable instruction code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), or a Random Access Memory (RAM).
Blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association with one another by cryptographic methods, each data block containing a batch of network transaction information that is used to verify the validity of the information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
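For illustration only, the way each data block is cryptographically associated with the previous one can be sketched in Python as follows. This is a minimal sketch under assumptions: the function names `block_hash`, `append_block` and `is_valid`, the use of SHA-256, and the JSON serialization are illustrative choices that are not stated in this disclosure, and consensus and point-to-point transmission are omitted entirely.

```python
import hashlib
import json

GENESIS_PREV_HASH = "0" * 64  # assumed placeholder for the first block


def block_hash(index, records, prev_hash):
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "records": records, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, records):
    """Append a block whose hash depends on the previous block's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV_HASH
    index = len(chain)
    block = {
        "index": index,
        "records": records,
        "prev_hash": prev_hash,
        "hash": block_hash(index, records, prev_hash),
    }
    chain.append(block)
    return block


def is_valid(chain):
    """Recompute every hash and check each link to the previous block."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i > 0 else GENESIS_PREV_HASH
        if block["prev_hash"] != expected_prev:
            return False
        recomputed = block_hash(block["index"], block["records"], block["prev_hash"])
        if block["hash"] != recomputed:
            return False
    return True
```

Because every hash covers the previous block's hash, altering the records of an earlier block makes `is_valid` return `False` unless all later blocks are recomputed as well.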
In connection with Fig. 1, the memory 12 in the electronic device 1 stores computer readable instructions for implementing a data processing method, and the processor 13 can execute the computer readable instructions to implement:
acquiring historical training data, wherein the historical training data comprises first historical information, second historical information and processing information;
calculating the information gain of the second historical information according to the historical training data;
taking the first history information as a root node, and taking target information selected from the second history information according to the information gain as a sub-node to generate an information analysis decision tree;
receiving a processing instruction, and acquiring user information and a user report according to the processing instruction;
analyzing the user information to obtain a target value;
extracting a target name from the user report, and determining a target type to which the target name belongs;
if the target value is a preset value and the target type is a preset type, extracting a target factor from the user report;
and determining a target path from the information analysis decision tree according to the target name and the target factor, and acquiring a terminal point from the target path as a target suggestion.
In particular, the specific implementation method of the processor 13 on the computer readable instructions may refer to the description of the relevant steps in the corresponding embodiment of fig. 1, which is not repeated herein.
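By way of a non-limiting illustration, the information-gain calculation and the selection of the target information in the steps listed above can be sketched in Python. This is a sketch under stated assumptions rather than the implementation of the embodiment: it assumes the binary entropy with base-2 logarithm that is conventional for ID3-style decision trees, represents each training record as a dictionary, and uses hypothetical names (`binary_entropy`, `information_gain`, `select_target_information`, the `"positive"` label key) that do not appear in this disclosure.

```python
import math
from collections import defaultdict


def binary_entropy(n_pos, n_neg):
    """Entropy of a set with n_pos positive and n_neg negative samples (log base 2 assumed)."""
    n = n_pos + n_neg
    entropy = 0.0
    for count in (n_pos, n_neg):
        if count:  # a zero count contributes nothing to the entropy
            p = count / n
            entropy -= p * math.log2(p)
    return entropy


def information_gain(records, feature, label="positive"):
    """Entropy of all records minus the size-weighted entropy of each feature value."""
    m = len(records)  # training total amount
    total_pos = sum(1 for r in records if r[label])
    overall_entropy = binary_entropy(total_pos, m - total_pos)

    # counts[value] = [number of positive samples, number of negative samples]
    counts = defaultdict(lambda: [0, 0])
    for r in records:
        counts[r[feature]][0 if r[label] else 1] += 1

    weighted = sum(
        (pos + neg) / m * binary_entropy(pos, neg) for pos, neg in counts.values()
    )
    return overall_entropy - weighted


def select_target_information(records, features, label="positive"):
    """Pick the feature with the largest information gain."""
    return max(features, key=lambda f: information_gain(records, f, label))
```

For example, a feature whose values separate the positive and negative samples perfectly yields a gain equal to the overall entropy, while a feature whose values are evenly mixed yields a gain of zero.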
In the several embodiments provided in the present invention, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative: the division of the modules is merely a division by logical function, and other divisions are possible in actual implementation.
The computer readable storage medium has stored thereon computer readable instructions, wherein the computer readable instructions when executed by the processor 13 are configured to implement the steps of:
acquiring historical training data, wherein the historical training data comprises first historical information, second historical information and processing information;
calculating the information gain of the second historical information according to the historical training data;
taking the first history information as a root node, and taking target information selected from the second history information according to the information gain as a sub-node to generate an information analysis decision tree;
receiving a processing instruction, and acquiring user information and a user report according to the processing instruction;
analyzing the user information to obtain a target value;
extracting a target name from the user report, and determining a target type to which the target name belongs;
if the target value is a preset value and the target type is a preset type, extracting a target factor from the user report;
and determining a target path from the information analysis decision tree according to the target name and the target factor, and acquiring a terminal point from the target path as a target suggestion.
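As a non-limiting illustration of the last step above, the lookup of the target path can be sketched in Python. The nested-dictionary layout, the function name `find_target_suggestion`, and every key and value in `EXAMPLE_TREE` are hypothetical placeholders chosen for this sketch; the check that the target value is the preset value and the target type is the preset type is assumed to have been performed before this lookup.

```python
from typing import Optional

# Hypothetical tree: root -> target name -> target factor -> terminal node.
EXAMPLE_TREE = {
    "name": "first_history_information",
    "children": {
        "disease_a": {
            "children": {
                "smoking": {"terminal": "suggestion_1"},
                "non_smoking": {"terminal": "suggestion_2"},
            }
        }
    },
}


def find_target_suggestion(tree: dict, target_name: str, target_factor: str) -> Optional[str]:
    """Walk root -> target name -> target factor and return the terminal node, if any."""
    node = tree
    for key in (target_name, target_factor):
        children = node.get("children", {})
        if key not in children:
            return None  # no path in the tree matches this name/factor pair
        node = children[key]
    return node.get("terminal")
```

Returning `None` when no path matches is a design choice of this sketch; how an unmatched name or factor is handled is not specified here.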
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in the form of hardware, or in the form of hardware plus software functional modules.
The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude a plurality. The units or means may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote a name, but not any particular order.
Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention.
Claims (8)
1. A data processing method, characterized in that the data processing method comprises:
acquiring historical training data, wherein the historical training data comprises first historical information, second historical information and processing information;
calculating an information gain of the second historical information according to the historical training data, including: for any information in the second historical information, determining data in the historical training data that contains that information as sample data, and screening, from the sample data, a plurality of types of samples and the positive samples and negative samples of each type of sample; calculating the total sample amount of each type of sample, and calculating a first number of the positive samples and a second number of the negative samples of each type of sample; determining a first information entropy corresponding to each type of sample according to the total sample amount, the first number and the second number, including:
$$E = -\frac{n_1}{n}\log_2\frac{n_1}{n} - \frac{n_2}{n}\log_2\frac{n_2}{n}$$
wherein $E$ is the first information entropy, $n$ is the total sample amount, $n_1$ is the first number, and $n_2$ is the second number; calculating the training total amount of the historical training data, and calculating a third number of positive samples in the historical training data and a fourth number of negative samples in the historical training data; determining a second information entropy corresponding to the historical training data according to the training total amount, the third number and the fourth number; determining the information gain of that information according to the second information entropy, the total sample amount, the training total amount and the first information entropy, including:
$$G = E_2 - \sum_{i=1}^{j}\frac{n_i}{m}E_i$$
wherein $G$ is the information gain, $E_2$ is the second information entropy, $j$ is the number of types of the plurality of types of samples, $m$ is the training total amount, $n_i$ is the total sample amount of the $i$-th type sample, and $E_i$ is the first information entropy of the $i$-th type sample;
taking the first history information as a root node, and taking target information selected from the second history information according to the information gain as a sub-node to generate an information analysis decision tree;
receiving a processing instruction, and acquiring user information and a user report according to the processing instruction;
analyzing the user information to obtain a target value, including: performing word segmentation on the user information to obtain target word segments; matching the target word segments against the type vocabulary in a type mapping table, wherein the type mapping table stores a plurality of pieces of type information and the type vocabulary indicating each piece of type information; when any word segment among the target word segments is successfully matched with the type vocabulary, extracting, from the information word, the information having a mapping relationship with that word segment as the information value of the corresponding type information among the plurality of pieces of type information; acquiring the information weight of the user information; and performing a weighting operation on the information value according to the information weight to obtain the target value;
extracting a target name from the user report, and determining a target type to which the target name belongs;
if the target value is a preset value and the target type is a preset type, extracting a target factor from the user report;
and determining a target path from the information analysis decision tree according to the target name and the target factor, and acquiring a terminal point from the target path as a target suggestion.
2. The data processing method of claim 1, wherein generating an information analysis decision tree using the first history information as a root node and target information selected from the second history information according to the information gain as a child node comprises:
determining the second history information with the maximum information gain as the target information, and determining the second history information except the target information as characteristic information;
extracting each type of data in the target information from the historical training data to serve as a sample to be tested;
calculating the information gain of each sample to be detected and the characteristic information as the characteristic gain;
determining the characteristic information with the maximum characteristic gain as attribute information;
and constructing the information analysis decision tree by taking the first history information as a root node, the target information as a sub-node of the root node, the attribute information as a branch node of the sub-node and the processing information as a terminal node.
3. The data processing method of claim 1, wherein the obtaining user information and user reports according to the processing instructions comprises:
analyzing the message of the processing instruction to obtain data information carried by the message;
acquiring information indicating a user from the data information as a user identification code, and determining a user corresponding to the user identification code as a target user;
acquiring information indicating a tag from the data information as an information tag;
generating an information authorization request according to the information tag, and sending the information authorization request to a user terminal of the target user;
when an authorization response sent by the user terminal based on the information authorization request is received, acquiring an information extraction key from the authorization response;
acquiring, based on the information extraction key, information in an information base that corresponds to both the target user and the information tag as the user information;
and acquiring information corresponding to the target user from a report library as preliminary screening reports, and selecting the preliminary screening report with the latest report time as the user report.
4. The data processing method of claim 1, wherein the extracting a target name from the user report comprises:
identifying reporting information from the user report based on an OCR algorithm;
segmenting the report information according to a preset dictionary to obtain a plurality of segmentation paths and path word segmentation corresponding to each segmentation path;
obtaining word segmentation weights corresponding to the path word segmentation from the preset dictionary, and calculating the sum of the word segmentation weights in each segmentation path to obtain path weights;
determining the path word corresponding to the segmentation path with the largest path weight as the information word;
traversing the information word segmentation according to a preset library, and determining the information word segmentation matched with any feature in the preset library as the target name.
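By way of illustration only, the selection of the segmentation path with the largest path weight can be sketched in Python. This is a sketch under assumptions: the preset dictionary is modeled as a plain mapping from word to weight, every segment must be a dictionary word, and the name `best_segmentation` is hypothetical. Rather than listing every segmentation path explicitly, the sketch uses memoized recursion, which selects the same maximum-weight path.

```python
from functools import lru_cache
from typing import Dict, List, Optional


def best_segmentation(text: str, dictionary: Dict[str, float]) -> Optional[List[str]]:
    """Return the segmentation of `text` into dictionary words with the largest weight sum.

    Returns None when the text cannot be fully covered by dictionary words.
    """

    @lru_cache(maxsize=None)
    def best_from(start):
        # Best (path weight, path words) for the suffix text[start:], or None.
        if start == len(text):
            return (0.0, ())
        best = None
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            if word not in dictionary:
                continue
            rest = best_from(end)
            if rest is None:
                continue
            candidate = (dictionary[word] + rest[0], (word,) + rest[1])
            if best is None or candidate[0] > best[0]:
                best = candidate
        return best

    result = best_from(0)
    return None if result is None else list(result[1])
```

With the hypothetical dictionary `{"ab": 3.0, "a": 1.0, "b": 1.0, "c": 1.0, "bc": 1.5}` and the text `"abc"`, the candidate paths `a|b|c`, `ab|c` and `a|bc` have weights 3.0, 4.0 and 2.5, so `ab|c` is selected. The recursion depth grows with the text length, so very long inputs would need an iterative variant.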
5. The data processing method of claim 1, wherein the target type comprises a first type and a second type, and wherein determining the target type to which the target name belongs comprises:
acquiring a first name corresponding to the first type and acquiring a second name corresponding to the second type;
if the first name does not contain the target name and the second name does not contain the target name, acquiring a first vector of the first name and acquiring a second vector of the second name;
determining a representation mode of the first vector, and carrying out vectorization processing on the target name according to the representation mode to obtain a target vector;
calculating a first similarity of the target vector and the first vector, and calculating a second similarity of the target vector and the second vector;
if the first similarity is greater than or equal to the second similarity, determining the target type as the first type; or
if the first similarity is smaller than the second similarity, determining the target type as the second type.
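For illustration only, the similarity-based determination of the target type can be sketched in Python. The vector representation and the similarity measure are not specified in the text above, so this sketch assumes character-count vectors and cosine similarity, and, where a type has several names, takes the largest similarity over those names; these choices and the names `char_vector`, `cosine` and `determine_type` are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable


def char_vector(name: str) -> Counter:
    """Assumed representation: a vector of character counts."""
    return Counter(name)


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def determine_type(target_name: str, first_names: Iterable[str], second_names: Iterable[str]) -> str:
    """Return "first" or "second" for the target name."""
    first_names = list(first_names)
    second_names = list(second_names)
    # A direct match decides the type without any vector comparison.
    if target_name in first_names:
        return "first"
    if target_name in second_names:
        return "second"
    target = char_vector(target_name)
    first_sim = max((cosine(target, char_vector(n)) for n in first_names), default=0.0)
    second_sim = max((cosine(target, char_vector(n)) for n in second_names), default=0.0)
    # Ties go to the first type, matching the "greater than or equal to" condition.
    return "first" if first_sim >= second_sim else "second"
```

The target name is vectorized with the same representation as the type names so that the two similarities are directly comparable.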
6. A data processing apparatus, characterized in that the data processing apparatus comprises:
the acquisition unit is used for acquiring historical training data, wherein the historical training data comprises first historical information, second historical information and processing information;
a calculation unit for calculating an information gain of the second historical information according to the historical training data, including: for any information in the second historical information, determining data in the historical training data that contains that information as sample data, and screening, from the sample data, a plurality of types of samples and the positive samples and negative samples of each type of sample; calculating the total sample amount of each type of sample, and calculating a first number of the positive samples and a second number of the negative samples of each type of sample; determining a first information entropy corresponding to each type of sample according to the total sample amount, the first number and the second number, including:
$$E = -\frac{n_1}{n}\log_2\frac{n_1}{n} - \frac{n_2}{n}\log_2\frac{n_2}{n}$$
wherein $E$ is the first information entropy, $n$ is the total sample amount, $n_1$ is the first number, and $n_2$ is the second number; calculating the training total amount of the historical training data, and calculating a third number of positive samples in the historical training data and a fourth number of negative samples in the historical training data; determining a second information entropy corresponding to the historical training data according to the training total amount, the third number and the fourth number; determining the information gain of that information according to the second information entropy, the total sample amount, the training total amount and the first information entropy, including:
$$G = E_2 - \sum_{i=1}^{j}\frac{n_i}{m}E_i$$
wherein $G$ is the information gain, $E_2$ is the second information entropy, $j$ is the number of types of the plurality of types of samples, $m$ is the training total amount, $n_i$ is the total sample amount of the $i$-th type sample, and $E_i$ is the first information entropy of the $i$-th type sample;
the generating unit is used for taking the first history information as a root node and generating an information analysis decision tree by taking target information selected from the second history information according to the information gain as a sub-node;
the acquisition unit is also used for receiving the processing instruction and acquiring the user information and the user report according to the processing instruction;
an analysis unit, configured to analyze the user information to obtain a target value, including: performing word segmentation on the user information to obtain target word segments; matching the target word segments against the type vocabulary in a type mapping table, wherein the type mapping table stores a plurality of pieces of type information and the type vocabulary indicating each piece of type information; when any word segment among the target word segments is successfully matched with the type vocabulary, extracting, from the information word, the information having a mapping relationship with that word segment as the information value of the corresponding type information among the plurality of pieces of type information; acquiring the information weight of the user information; and performing a weighting operation on the information value according to the information weight to obtain the target value;
a determining unit, configured to extract a target name from the user report, and determine a target type to which the target name belongs;
the extraction unit is used for extracting target factors from the user report if the target value is a preset value and the target type is a preset type;
and the input unit is used for determining a target path from the information analysis decision tree according to the target name and the target factor, and acquiring a terminal point from the target path as a target suggestion.
7. An electronic device, the electronic device comprising:
a memory storing computer readable instructions; and
a processor executing computer readable instructions stored in the memory to implement the data processing method of any one of claims 1 to 5.
8. A computer-readable storage medium, characterized by: stored in the computer readable storage medium are computer readable instructions that are executed by a processor in an electronic device to implement the data processing method of any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110704361.4A CN113269179B (en) | 2021-06-24 | 2021-06-24 | Data processing method, device, equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113269179A (en) | 2021-08-17 |
CN113269179B (en) | 2024-04-05 |
Family
ID=77235795
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110704361.4A Active CN113269179B (en) | 2021-06-24 | 2021-06-24 | Data processing method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113269179B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113722371B (en) * | 2021-08-31 | 2024-04-12 | 深圳平安智慧医健科技有限公司 | Medicine recommendation method, device, equipment and storage medium based on decision tree |
CN114912870A (en) * | 2022-05-10 | 2022-08-16 | 深圳壹账通智能科技有限公司 | Intelligent logistics scheduling method, device and equipment based on decision tree and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109255013A (en) * | 2018-08-14 | 2019-01-22 | 平安医疗健康管理股份有限公司 | Claims Resolution decision-making technique, device, computer equipment and storage medium |
CN111405081A (en) * | 2020-03-13 | 2020-07-10 | 北京奇艺世纪科技有限公司 | DNS (Domain name System) adjusting method and device based on decision tree, computer equipment and storage medium |
CN111581296A (en) * | 2020-04-02 | 2020-08-25 | 深圳壹账通智能科技有限公司 | Data correlation analysis method and device, computer system and readable storage medium |
CN111639487A (en) * | 2020-04-30 | 2020-09-08 | 深圳壹账通智能科技有限公司 | Classification model-based field extraction method and device, electronic equipment and medium |
WO2021115133A1 (en) * | 2020-09-30 | 2021-06-17 | 平安科技(深圳)有限公司 | Driving-behavior recognition method, apparatus, electronic device, and storage medium |
Non-Patent Citations (1)
Title |
---|
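Research on the application of the ID3 algorithm in score analysis of programming courses; Liu Minna; Electronic Design Engineering (09); pp. 48-50+53 * |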
Also Published As
Publication number | Publication date |
---|---|
CN113269179A (en) | 2021-08-17 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||