CN112182253B - Data processing method, data processing equipment and computer readable storage medium - Google Patents

Data processing method, data processing equipment and computer readable storage medium

Info

Publication number
CN112182253B
Authority
CN
China
Prior art keywords
standard
user attribute
network
training
directed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011352610.XA
Other languages
Chinese (zh)
Other versions
CN112182253A (en
Inventor
徐超
刘亚飞
张子恒
刘博�
陈曦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202011352610.XA
Publication of CN112182253A
Application granted
Publication of CN112182253B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Animal Behavior & Ethology (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The embodiment of the application discloses a data processing method, data processing equipment and a computer readable storage medium. The method comprises the following steps: acquiring a first user attribute sample text, and acquiring standard user attribute entity words for representing the first user attribute sample text; acquiring standard result entity words for representing service analysis results, taking the standard user attribute entity words and the standard result entity words as network nodes, and constructing an initial analysis graph network from the network nodes; acquiring a second user attribute sample text and a corresponding service analysis result label; and determining the directional conditional probability between network nodes in the initial analysis graph network according to the association between the second user attribute sample text and the service analysis result label, to obtain a standard analysis graph network containing the directional conditional probability. The standard analysis graph network is used for predicting a service analysis reference result for a user attribute text. With the application, the accuracy of the service analysis result can be ensured in service data analysis.

Description

Data processing method, data processing equipment and computer readable storage medium
Technical Field
The present application relates to the field of internet technologies, and in particular, to a data processing method, device, and computer-readable storage medium.
Background
With the rapid development of artificial intelligence, traditional manual data analysis is gradually being replaced by intelligent data analysis; for example, businesses such as loan companies, securities companies and insurance companies use artificial intelligence to automate data analysis.
In the existing automatic data analysis method, generally, a service person is required to preset a service analysis index corresponding to each service analysis result, so that a mapping relation between the analysis result and the analysis index can be obtained, and the mapping relation is configured in service analysis equipment. When the business analysis result of a certain user needs to be inquired, the relevant index of the user can be input into the business analysis equipment, and the business analysis equipment can automatically generate the business analysis result aiming at the user according to the configured mapping relation.
It can be seen that the existing automated data analysis method is very dependent on the mapping relationship created by the service personnel, so that if the service experience of the service personnel is insufficient, the created mapping relationship is likely to be inaccurate, and further, the service analysis result output by the service analysis equipment is not accurate enough.
Disclosure of Invention
The embodiment of the application provides a data processing method, a data processing device, data processing equipment and a computer readable storage medium, which can ensure the accuracy of a business analysis result in business data analysis.
An embodiment of the present application provides a data processing method, including:
acquiring a first user attribute sample text, and acquiring standard user attribute entity words for representing the first user attribute sample text;
acquiring standard result entity words used for representing service analysis results, taking the standard user attribute entity words and the standard result entity words as network nodes, and constructing an initial analysis graph network according to the network nodes;
acquiring a second user attribute sample text and a service analysis result label corresponding to the second user attribute sample text; the business analysis result label belongs to a standard result entity word;
determining the directional conditional probability between network nodes in the initial analysis graph network according to the incidence relation between the second user attribute sample text and the service analysis result label to obtain a standard analysis graph network containing the directional conditional probability; the standard analysis graph network is used for predicting a service analysis reference result for the user attribute text.
An embodiment of the present application provides a data processing method, including:
acquiring a user attribute text, and acquiring a standby standard user attribute entity word for representing the user attribute text;
acquiring a standard analysis graph network; the standard analysis graph network comprises network nodes and directional conditional probabilities among the network nodes; the network nodes are composed of standard user attribute entity words and standard result entity words; the standard user attribute entity words comprise the standby standard user attribute entity words; the standard result entity words are used for representing service analysis results;
and determining a service analysis reference result of the user attribute text according to the standby standard user attribute entity words and the standard analysis graph network.
An embodiment of the present application provides a data processing apparatus, including:
the first obtaining module is used for obtaining a first user attribute sample text and obtaining a standard user attribute entity word for representing the first user attribute sample text;
the second acquisition module is used for acquiring standard result entity words used for representing service analysis results, taking the standard user attribute entity words and the standard result entity words as network nodes, and constructing an initial analysis graph network according to the network nodes;
the third obtaining module is used for obtaining a second user attribute sample text and a service analysis result label corresponding to the second user attribute sample text; the business analysis result label belongs to a standard result entity word;
the probability determining module is used for determining the directional conditional probability between network nodes in the initial analysis graph network according to the incidence relation between the second user attribute sample text and the service analysis result label to obtain a standard analysis graph network containing the directional conditional probability; the standard analysis graph network is used for predicting a service analysis reference result for the user attribute text.
Wherein the initial analysis graph network comprises directed conditional edges between network nodes;
the probability determining module comprises:
the first obtaining unit is used for obtaining a directed decision path containing the second user attribute sample text and the business analysis result label according to the incidence relation between the second user attribute sample text and the business analysis result label;
the second acquisition unit is used for acquiring the directed conditional edge indicated by the directed decision path in the initial analysis graph network as a training directed conditional edge;
and the first determining unit is used for determining the directional conditional probability corresponding to the training directional conditional edge according to the directional decision path.
The standard user attribute entity words comprise standard object entity words and standard index entity words; the second user attribute sample text comprises an object sample text belonging to the standard object entity words and an index sample text belonging to the standard index entity words;
a second acquisition unit comprising:
the first determining subunit is used for determining, in the initial analysis graph network, a network node corresponding to the object sample text as a first training network node, a network node corresponding to the index sample text as a second training network node, and a network node corresponding to the service analysis result label as a third training network node;
and the second determining subunit is used for determining training directed condition edges in directed condition edges among the first training network node, the second training network node and the third training network node according to the directed decision path.
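The selection of training directed conditional edges described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a directed decision path is an ordered list of node names (object, index, result) and that the initial analysis graph network is represented as a set of (source, target) pairs; the function name `training_edges` and the node names are hypothetical and do not come from the application.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str]


def training_edges(path: List[str], graph_edges: Set[Edge]) -> List[Edge]:
    """Return the graph edges traversed by one directed decision path.

    Assumption: consecutive nodes in the path indicate one directed
    conditional edge each; pairs that are not edges of the graph are skipped.
    """
    consecutive = list(zip(path, path[1:]))
    return [edge for edge in consecutive if edge in graph_edges]


# Hypothetical initial graph: one object node, two index nodes, two result nodes.
graph_edges = {
    ("object_A", "index_1"), ("object_A", "index_2"),
    ("index_1", "result_X"), ("index_1", "result_Y"),
    ("index_2", "result_X"), ("index_2", "result_Y"),
}

# One labelled sample: object text -> index text -> result label.
path = ["object_A", "index_1", "result_X"]
selected = training_edges(path, graph_edges)
```

Here `selected` holds the edge from the first training network node to the second, followed by the edge from the second to the third.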
Wherein the training directed conditional edges comprise a first training directed conditional edge;
a first determination unit comprising:
the first generation subunit is used for generating a first probability that the first training network node points to the second training network node according to the incidence relation between the object sample text and the index sample text in the directed decision path;
a third determining subunit, configured to determine the first probability as a directional conditional probability corresponding to the first training directional conditional edge; the first training directed conditional edge refers to a directed conditional edge pointing from the first training network node to the second training network node.
Wherein the training directed conditional edges comprise a second training directed conditional edge;
a first determination unit comprising:
the second generation subunit is used for generating a second probability that the second training network node points to the third training network node according to the incidence relation between the index sample text in the directed decision path and the service analysis result label;
the fourth determining subunit is configured to determine the second probability as a directional conditional probability corresponding to the second training directional conditional edge; and the second training directed conditional edge refers to a directed conditional edge pointed to the third training network node by the second training network node.
Wherein the training directed conditional edges include a third training directed conditional edge; the directed decision path comprises at least two index sample texts;
a first determination unit comprising:
the third generation subunit is used for generating a second probability between at least two second training network nodes according to the incidence relation between at least two index sample texts in the directed decision path;
a fifth determining subunit, configured to determine the second probability as a directional conditional probability corresponding to the third training directional conditional edge; the at least two second training network nodes comprise network nodes corresponding to the at least two index sample texts respectively; and the third training directed conditional edge is obtained by connecting at least two second training network nodes according to the direction sequence between at least two index sample texts contained in the directed decision path.
The number of the object sample texts is at least two, and the number of the index sample texts is at least two; the at least two object sample texts comprise target object sample texts, and the at least two index sample texts comprise target index sample texts;
the first generation subunit is specifically configured to determine, according to the directional decision path, the number of index sample texts pointed by the target object sample text as a first number;
the first generation subunit is further specifically configured to determine, according to the directional decision path, the number of target index sample texts pointed by the target object sample texts as a second number;
the first generating subunit is further specifically configured to determine, according to the first number and the second number, a first probability that the first training network node points to the second training network node.
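As a rough illustration of how the first number and the second number might be combined, the sketch below counts transitions over a set of directed decision paths. The text only states that the first probability is determined from the two numbers; taking their ratio (second number divided by first number) is an assumption made here, and the function name `first_probability` and the sample node names are hypothetical.

```python
from typing import Iterable, List


def first_probability(
    paths: Iterable[List[str]], target_object: str, target_index: str
) -> float:
    """Estimate the probability of the edge target_object -> target_index.

    first_number:  how many index sample texts the target object points to.
    second_number: how many of those are the target index sample text.
    Assumption: the probability is second_number / first_number.
    """
    first_number = 0
    second_number = 0
    for path in paths:
        for source, destination in zip(path, path[1:]):
            if source == target_object:
                first_number += 1
                if destination == target_index:
                    second_number += 1
    if first_number == 0:
        return 0.0  # the object never appears, so nothing can be estimated
    return second_number / first_number


# Hypothetical directed decision paths (object -> index -> result).
paths = [
    ["object_A", "index_1", "result_X"],
    ["object_A", "index_1", "result_Y"],
    ["object_A", "index_2", "result_X"],
    ["object_B", "index_1", "result_X"],
]
p = first_probability(paths, "object_A", "index_1")
```

With these sample paths, `object_A` points to an index node three times, two of which are `index_1`, so the estimate is 2/3.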
The standard user attribute entity words comprise standard object entity words and standard index entity words;
a second acquisition module comprising:
the second determining unit is used for determining the standard object entity words, the standard index entity words and the standard result entity words as network nodes;
the first generation unit is used for generating a network object layer according to the network nodes corresponding to the standard object entity words, generating a network index layer according to the network nodes corresponding to the standard index entity words, and generating a network result layer according to the network nodes corresponding to the standard result entity words;
the first connection unit is used for respectively connecting each network node in the network object layer with each network node in the network index layer to obtain a first directed edge;
the second connection unit is used for respectively connecting each network node in the network index layer with each network node in the network result layer to obtain a second directed edge;
a third determining unit, configured to determine the first directed edge and the second directed edge as directed conditional edges;
and the second generation unit is used for constructing an initial analysis graph network according to the network nodes and the directed conditional edges.
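The layered construction described above can be sketched as follows. This is a minimal illustration, not the implementation of the application: the class `AnalysisGraph`, the function `build_initial_graph`, and the choice of storing `None` for edges whose probability has not yet been trained are all assumptions.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Dict, Iterable, List, Optional, Tuple

Edge = Tuple[str, str]


@dataclass
class AnalysisGraph:
    object_layer: List[str]
    index_layer: List[str]
    result_layer: List[str]
    # Maps each directed conditional edge to its probability (None until trained).
    edges: Dict[Edge, Optional[float]] = field(default_factory=dict)


def build_initial_graph(
    objects: Iterable[str], indexes: Iterable[str], results: Iterable[str]
) -> AnalysisGraph:
    graph = AnalysisGraph(list(objects), list(indexes), list(results))
    # First directed edges: every object node points to every index node.
    for edge in product(graph.object_layer, graph.index_layer):
        graph.edges[edge] = None
    # Second directed edges: every index node points to every result node.
    for edge in product(graph.index_layer, graph.result_layer):
        graph.edges[edge] = None
    return graph


graph = build_initial_graph(["o1", "o2"], ["i1", "i2", "i3"], ["r1", "r2"])
```

With two object nodes, three index nodes and two result nodes, the sketch creates 2 × 3 first directed edges and 3 × 2 second directed edges, and no edge runs directly from the object layer to the result layer.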
Wherein, first acquisition module includes:
the first input unit is used for inputting the first user attribute sample text into a text recognition model and acquiring original user attribute entity words for representing the first user attribute sample text based on the text recognition model;
and the second input unit is used for inputting the original user attribute entity words into the entity word standardization model and carrying out standardization processing on the original user attribute entity words based on the entity word standardization model to obtain the standard user attribute entity words.
The text recognition model comprises an input layer, a coding layer, a hidden layer and a recognition layer;
a first input unit comprising:
the first processing subunit is used for carrying out segmentation processing on the first user attribute sample text based on the input layer to obtain at least two word segments;
the second processing subunit is used for inputting the at least two word segments into the coding layer and respectively encoding the at least two word segments based on the coding layer to obtain at least two semantic vectors;
the third processing subunit is used for inputting the at least two semantic vectors into the hidden layer, and respectively performing hidden feature extraction processing on the at least two semantic vectors based on the hidden layer to obtain at least two hidden vectors;
and the fourth processing subunit is used for inputting the at least two hidden vectors into the recognition layer, and performing recognition processing on the at least two hidden vectors based on the recognition layer to obtain an original user attribute entity word for representing the first user attribute sample text.
Wherein, the second input unit includes:
the first obtaining subunit is used for obtaining the standard sample entity words;
the sixth determining subunit is configured to determine, based on the entity word standardization model, the edit distance between each standard sample entity word and the original user attribute entity word;
and the second obtaining subunit is configured to obtain the minimum edit distance from the edit distances, and determine the standard sample entity word corresponding to the minimum edit distance as the standard user attribute entity word of the original user attribute entity word.
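The standardization step above can be sketched with the classic Levenshtein edit distance. The text does not specify which edit distance variant is used, so Levenshtein distance is an assumption here; the function names `edit_distance` and `standardize` and the sample words are hypothetical.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def standardize(original: str, standard_words: List[str]) -> str:
    """Pick the standard sample entity word closest to the original word."""
    return min(standard_words, key=lambda word: edit_distance(original, word))


standard_words = ["hypertension", "diabetes", "asthma"]
normalized = standardize("hypertention", standard_words)
```

In this example the misspelled input is one substitution away from `hypertension`, so that standard word is selected. When several standard words tie for the minimum, `min` returns the first one in the list; the text does not say how ties should be resolved.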
An embodiment of the present application provides a data processing apparatus, including:
the first acquisition module is used for acquiring the user attribute text and acquiring standby standard user attribute entity words for representing the user attribute text;
the second acquisition module is used for acquiring a standard analysis graph network; the standard analysis graph network comprises network nodes and directional conditional probabilities among the network nodes; the network nodes are composed of standard user attribute entity words and standard result entity words; the standard user attribute entity words comprise the standby standard user attribute entity words; the standard result entity words are used for representing service analysis results;
and the result determining module is used for determining the service analysis reference result of the user attribute text according to the standby standard user attribute entity words and the standard analysis graph network.
The standby standard user attribute entity words comprise standby standard object entity words and standby standard index entity words;
the result determining module comprises:
the first acquisition unit is used for acquiring standard result entity words in the standard analysis graph network;
the path construction unit is used for constructing N standby directed decision paths aiming at the user attribute text according to the standby standard object entity words, the standby standard index entity words and the standard result entity words; wherein, a standby directed decision path comprises a standby standard object entity word, at least one standby standard index entity word and a standard result entity word; n is a positive integer;
the second obtaining unit is used for respectively obtaining the standby path probabilities of the N standby directed decision paths according to the directed conditional probabilities;
the first determining unit is used for determining the maximum standby path probability in the standby path probabilities as a target path probability and determining the standby directed decision path corresponding to the target path probability as a target directed decision path;
and the second determining unit is used for determining the standard result entity words in the target directed decision path as the target standard result entity words and determining the service analysis reference result according to the target standard result entity words.
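The path selection performed by the units above can be illustrated with the following sketch. Two assumptions are made that the text does not state: each standby directed decision path contains exactly one index entity word (the text allows at least one), and the standby path probability is the product of the directed conditional probabilities along the path. The names `path_probability`, `target_path` and the sample probabilities are hypothetical.

```python
from itertools import product
from math import prod
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[str, str]


def path_probability(path: Sequence[str], probabilities: Dict[Edge, float]) -> float:
    # Assumption: multiply the probabilities of consecutive edges;
    # an edge missing from the trained graph contributes probability 0.
    return prod(probabilities.get(edge, 0.0) for edge in zip(path, path[1:]))


def target_path(
    object_words: Sequence[str],
    index_words: Sequence[str],
    result_words: Sequence[str],
    probabilities: Dict[Edge, float],
) -> Tuple[List[str], float]:
    """Enumerate the N standby paths and return the most probable one."""
    candidates = [list(c) for c in product(object_words, index_words, result_words)]
    best = max(candidates, key=lambda path: path_probability(path, probabilities))
    return best, path_probability(best, probabilities)


# Hypothetical directed conditional probabilities of a trained graph.
probabilities = {
    ("obj", "idx_a"): 0.6, ("obj", "idx_b"): 0.4,
    ("idx_a", "res_x"): 0.9, ("idx_a", "res_y"): 0.1,
    ("idx_b", "res_x"): 0.2, ("idx_b", "res_y"): 0.8,
}
best, best_probability = target_path(
    ["obj"], ["idx_a", "idx_b"], ["res_x", "res_y"], probabilities
)
```

Of the four candidate paths in this example, `obj -> idx_a -> res_x` has the largest product (0.6 × 0.9), so `res_x` would be taken as the target standard result entity word.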
Wherein, the result determining module further comprises:
a third determining unit, configured to determine a standby standard object entity word in the target directed decision path as a target standard object entity word, and determine a standby standard indicator entity word in the target directed decision path as a target standard indicator entity word;
and the text output unit is used for outputting the service analysis reference text of the user attribute text according to the pointing sequence, the target standard object entity words, the target standard index entity words and the target standard result entity words in the target directed decision path.
One aspect of the present application provides a computer device, comprising: a processor, a memory, a network interface;
the processor is connected to the memory and the network interface, wherein the network interface is configured to provide a data communication function, the memory is configured to store a computer program, and the processor is configured to call the computer program to perform the method according to the aspect of the embodiment of the present application.
An aspect of the present invention provides a computer-readable storage medium, in which a computer program is stored, where the computer program includes program instructions, and the program instructions, when executed by a processor, perform the method in the above aspect of the present invention.
An aspect of an embodiment of the present application provides a computer program product or a computer program, where the computer program product or the computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium; the processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method in the above aspect in the embodiments of the present application.
According to the embodiment of the application, the standard user attribute entity words used for representing the first user attribute sample text can be obtained by obtaining the first user attribute sample text; further, a service analysis result associated with the first user attribute sample text is obtained, then a standard result entity word used for representing the service analysis result is obtained, the standard user attribute entity word and the standard result entity word can be used as network nodes, and an initial analysis graph network can be constructed according to the network nodes; further, in order to obtain a standard analysis graph network based on the initial analysis graph network, a sample text for training the initial analysis graph network may be obtained, where the sample text includes a second user attribute sample text and a service analysis result label corresponding to the second user attribute sample text; the business analysis result label belongs to a standard result entity word; further, the directional conditional probability between network nodes in the initial analysis graph network can be determined according to the incidence relation between the second user attribute sample text and the service analysis result label, and then the standard analysis graph network containing the directional conditional probability can be obtained; the standard analysis graph network is used for predicting a service analysis reference result for the user attribute text. 
In the embodiment of the application, the initial analysis graph network is constructed from the standard user attribute entity words representing the first user attribute sample text and the standard result entity words representing the service analysis result, which eliminates differences in expression and makes the network applicable to differently worded inputs; in addition, the embodiment of the application can intelligently predict the service analysis reference result of a user attribute text through the directed conditional probabilities in the standard analysis graph network, thereby reducing the resource cost of service data analysis; furthermore, when the standard analysis graph network is constructed, the mapping relation between analysis results and analysis indexes does not need to be preset, which overcomes the prior art's excessive dependence on the business experience of business personnel, so that the accuracy of the service analysis result can be further ensured in service data analysis.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a system architecture diagram according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of a data processing method according to an embodiment of the present application;
FIG. 3 is a schematic view of a data processing scenario provided in an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a text recognition model provided in an embodiment of the present application;
FIG. 5 is a schematic view of a data processing scenario provided in an embodiment of the present application;
FIG. 6 is a schematic flowchart of a data processing method according to an embodiment of the present application;
FIG. 7 is a schematic flowchart of a data processing method according to an embodiment of the present application;
FIG. 8 is a schematic view of a data processing scenario provided in an embodiment of the present application;
FIG. 9 is a schematic flowchart of a data processing method according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;
FIG. 11 is a schematic diagram of a computer device provided by an embodiment of the present application;
FIG. 12 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;
FIG. 13 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
For ease of understanding, some terms are first briefly explained:
Artificial intelligence (AI) is the theory, methods, techniques and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the capabilities of perception, reasoning and decision making.
Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Natural language processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers using natural language. Natural language processing is a science integrating linguistics, computer science and mathematics. Research in this field involves natural language, that is, the language people use every day, so it is closely related to linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering, knowledge graphs, and the like.
Underwriting: insurance underwriting refers to the process in which an insurer reviews an insurance application, decides whether to accept the risk, and, if the risk is accepted, determines the insurance rate. During underwriting, the underwriter can assign different rates according to the risk category of the insured subject, thereby ensuring the quality of the business and the stability of insurance operations. Underwriting is the core of the insurance acceptance business, and it is the most critical step for an insurance company in controlling risk and improving the quality of its insured assets.
The scheme provided by the embodiment of the application relates to the natural language processing technology of artificial intelligence, deep learning technology and the like, and is specifically explained by the following embodiment.
Referring to fig. 1, fig. 1 is a schematic diagram of a system architecture according to an embodiment of the present disclosure. As shown in fig. 1, the system may include an analysis server 10a, enterprise servers 10b, 10c, …, 10d, and user terminal clusters. The user terminal clusters may include a user terminal cluster 100b connected to the enterprise server 10b, a user terminal cluster 100c connected to the enterprise server 10c, …, and a user terminal cluster 100d connected to the enterprise server 10d. It is to be understood that there may be one or more enterprise servers and one or more user terminal clusters; their numbers are not limited herein.
The user terminal cluster 100b may include user terminals 101b, 102b, …, 103b; the user terminal cluster 100c may include user terminals 101c, 102c, …, 103c; and the user terminal cluster 100d may include user terminals 101d, 102d, …, 103d. Each user terminal cluster may include one or more user terminals, and the number of user terminals is not limited herein.
Communication connections may exist between user terminals, for example between the user terminal 101b and the user terminal 102b, between the user terminal 101b and the user terminal 102c, and between the user terminal 101b and the user terminal 103c. Any user terminal may have a communication connection with an enterprise server, for example between the user terminal 101b and each of the enterprise servers 10b, 10c and 10d. Communication connections may also exist between enterprise servers, for example between the enterprise server 10b and the enterprise server 10c, and between the enterprise server 10b and the enterprise server 10d.
Any user terminal may have a communication connection with the analysis server 10a, for example the user terminals 101b, 101c and 101d. Similarly, any enterprise server may have a communication connection with the analysis server 10a, for example the enterprise servers 10b, 10c and 10d.
It should be understood that the manner of the communication connection is not limited: the connection may be direct or indirect through wired communication, direct or indirect through wireless communication, or established in other ways, and the application is not limited herein.
The enterprise server 10b in fig. 1 may be the back end of insurance company B, the enterprise server 10c may be the back end of insurance company C, and the enterprise server 10d may be the back end of insurance company D. The user terminals 101b, 102b, …, 103b may be terminals of clients (e.g., insurance applicants) of insurance company B; the user terminals 101c, 102c, …, 103c may be terminals of clients of insurance company C; …; and the user terminals 101d, 102d, …, 103d may be terminals of clients of insurance company D.
When an enterprise server (which may be the enterprise server 10b, 10c or 10d; for ease of understanding, the enterprise server 10b is used as an example below) acquires an insurance application sent by a client terminal (which may be the user terminal 101b, 102b or 103b; the user terminal 101b is used as an example below) and needs to process it, for example to analyze the application and generate the corresponding underwriting conclusion, the enterprise server 10b may send the user attribute text carried in the insurance application to the analysis server 10a. After receiving the user attribute text, the analysis server 10a performs text recognition and mapping on it based on a pre-trained medical text structured model: medical entity words are recognized in the unstructured user attribute text and mapped to standard expressions, yielding standby standard user attribute entity words that characterize the user attribute text. Based on the standby standard user attribute entity words, the analysis server 10a may obtain a target directed decision path for the user attribute text and, based on that path, predict a business analysis reference text for the user attribute text using a pre-trained standard analysis graph network. The business analysis reference text includes a business analysis reference result and may further include the reasons for that result.
Subsequently, the analysis server 10a may send the generated business analysis reference text to the enterprise server 10b and, at the same time, store the user attribute text and the business analysis reference text in association with each other in a database. When the same user attribute text uploaded by the same applicant is obtained again, the analysis server 10a may directly return the stored business analysis reference text to the sender of the user attribute text (which may be the enterprise server 10b, the enterprise server 10c, or the enterprise server 10d). The database can be regarded as an electronic filing cabinet in which electronic files (here, the user attribute texts and business analysis reference texts) are stored, and the analysis server 10a can add, query, update and delete the user attribute texts and business analysis reference texts in the files. A "database" is a collection of data that is stored together, can be shared by multiple users, has as little redundancy as possible, and is independent of the application programs.
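The associated storage and direct return described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the class `ReferenceTextStore`, the function `analyze`, the applicant identifier, and the in-memory dictionary standing in for the database are all hypothetical names that do not appear in the application.

```python
from typing import Callable, Dict, Optional, Tuple


class ReferenceTextStore:
    """Hypothetical in-memory stand-in for the database of associated texts."""

    def __init__(self) -> None:
        # Key: (applicant id, user attribute text); value: reference text.
        self._records: Dict[Tuple[str, str], str] = {}

    def save(self, applicant_id: str, attribute_text: str, reference_text: str) -> None:
        self._records[(applicant_id, attribute_text)] = reference_text

    def lookup(self, applicant_id: str, attribute_text: str) -> Optional[str]:
        return self._records.get((applicant_id, attribute_text))


def analyze(
    store: ReferenceTextStore,
    applicant_id: str,
    attribute_text: str,
    predict: Callable[[str], str],
) -> str:
    """Return the stored reference text if present, otherwise predict and store it."""
    cached = store.lookup(applicant_id, attribute_text)
    if cached is not None:
        return cached  # same applicant, same text: no new prediction needed
    reference_text = predict(attribute_text)
    store.save(applicant_id, attribute_text, reference_text)
    return reference_text
```

Here `predict` is a placeholder for whatever produces the business analysis reference text; a second call with the same applicant and the same text returns the stored result without invoking it again.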
After the enterprise server 10b receives the business analysis reference text sent by the analysis server 10a, an enterprise terminal may obtain the business analysis reference text from the enterprise server 10b and display it on a screen. The business analysis reference text may include the pointing order in the target directed decision path, the target standard object entity words, the target standard index entity words, and the target standard result entity word (i.e., the standard entity word to which the business analysis reference result belongs). Later, the underwriter associated with the enterprise server 10b may perform a secondary verification of the business analysis reference result based on the target standard object entity words, the target standard index entity words, and the pointing order between them. It is noted that, in the critical illness insurance scenario of an insurance company, the business analysis reference text output by the standard analysis graph network can serve as auxiliary text provided to underwriters as reference data for analysis.
Optionally, if the trained standard analysis graph network and the trained medical text structured model are stored locally on the enterprise server 10b, the user attribute text can be input locally and processed automatically to obtain its underwriting conclusion (i.e., the business analysis reference result), and subsequent processing is then performed according to that result. Since training the medical text structured model and the standard analysis graph network involves a large amount of offline computation, the models local to the enterprise server 10b may be trained by the analysis server 10a and then sent to the enterprise server 10b.
It is understood that the methods provided by the embodiments of the present application may be performed by a computer device, including but not limited to the user terminal or the analysis server or the enterprise server mentioned in fig. 1. The analysis server or the enterprise server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, a middleware service, a domain name service, a security service, a CDN, a big data and artificial intelligence platform, and the like. The user terminal may be a smart phone, a tablet computer, a notebook computer, a palm computer, a desktop computer, a Mobile Internet Device (MID), a POS (Point Of Sales) device, a smart speaker, a smart watch, and the like, but is not limited thereto. The user terminal and the analysis server or the enterprise server may be directly or indirectly connected through wired or wireless communication, and the application is not limited herein.
It is to be understood that the automated data analysis method proposed herein may be applied to various business data analysis scenarios. For example, a loan company may use the method to analyze the credit data of a borrower and determine whether to lend to the borrower; as another example, a securities company may use the method to analyze securities trading data and predict subsequent securities trends. The following description takes the underwriting business data of insurance companies as an example; other business data analysis scenarios can refer to it.
To address the problem that insurance underwriting currently consumes considerable manpower and material resources, the present application provides an end-to-end intelligent underwriting prediction method based on graph inference over a knowledge graph, which helps an insurance company obtain an intelligent prediction of the business analysis reference result (i.e., the underwriting conclusion) from user attribute texts (such as physical examination reports, health notices, accompanying examination reports, and similar basic information). In addition, considering the insurance industry's requirement for model interpretability, the present application adopts a highly interpretable strategy that helps the insurance company perform a secondary verification of the intelligent underwriting prediction result (i.e., the business analysis reference result), ensuring the accuracy of the prediction while reducing manpower and material resources. The specific implementation process is described below.
Further, please refer to fig. 2, which is a schematic flowchart of a data processing method according to an embodiment of the present application. The data processing method may be executed by the user terminal, the analysis server, or the enterprise server described in fig. 1, or may be executed jointly by the user terminal, the analysis server, and the enterprise server. As shown in fig. 2, the data processing procedure includes the following steps:
Step S101: obtain a first user attribute sample text, and obtain standard user attribute entity words used for characterizing the first user attribute sample text.
Specifically, a first user attribute sample text is input into a text recognition model, and original user attribute entity words used for representing the first user attribute sample text are obtained based on the text recognition model; and inputting the original user attribute entity words into the entity word standardization model, and carrying out standardization processing on the original user attribute entity words based on the entity word standardization model to obtain standard user attribute entity words.
The text recognition model comprises an input layer, an encoding layer, a hidden layer and a recognition layer. The specific process of obtaining the original user attribute entity words for characterizing the first user attribute sample text based on the text recognition model may include: segmenting the first user attribute sample text based on the input layer to obtain at least two tokens (word segments); inputting the at least two tokens into the encoding layer, and encoding each of them based on the encoding layer to obtain at least two semantic vectors; inputting the at least two semantic vectors into the hidden layer, and performing hidden feature extraction on each of them based on the hidden layer to obtain at least two hidden vectors; and inputting the at least two hidden vectors into the recognition layer, and recognizing them based on the recognition layer to obtain the original user attribute entity words for characterizing the first user attribute sample text.
The specific process of standardizing the original user attribute entity words based on the entity word standardization model to obtain the standard user attribute entity words may include: acquiring standard sample entity words; determining, based on the entity word standardization model, the edit distance between each standard sample entity word and the original user attribute entity word; and obtaining the minimum edit distance among these edit distances, and determining the standard sample entity word corresponding to the minimum edit distance as the standard user attribute entity word of the original user attribute entity word.
Referring to fig. 3, fig. 3 is a schematic view of a data processing scenario according to an embodiment of the present disclosure. As shown in fig. 3, the analysis server 30a obtains a first user attribute sample text 30b. It is understood that the first user attribute sample text 30b may include a physical examination sample report, a health sample report, an examination-accompanied sample report, and the like, which represent the user attribute information of the applicant; the user attribute information may include the applicant's disease information (i.e., objects), symptom information (i.e., indices), and the like. The number of texts in the first user attribute sample text 30b is not limited in the embodiment of the present application; in practical applications, the first user attribute sample text 30b may include at least one user attribute sample text.
The analysis server 30a inputs the first user attribute sample text 30b into a trained medical text structured model 30c, which may include a text recognition model and an entity word standardization model. Based on these two models, the analysis server 30a may recognize medical entity words in the unstructured first user attribute sample text 30b and map the recognized medical entity words to standard expressions, obtaining the standard user attribute entity words 30d that characterize the first user attribute sample text 30b. The text recognition model may be a model constructed based on Named Entity Recognition (NER), and the entity word standardization model may be a model constructed based on a medical term standardization system, as described in the examples below.
Please refer to fig. 4 together with fig. 3; fig. 4 is a schematic structural diagram of a text recognition model according to an embodiment of the present application. As shown in fig. 4, the text recognition model may include an input layer, an encoding layer, a hidden layer, and a recognition layer. The analysis server 30a inputs the first user attribute sample text 30b to the input layer of the text recognition model; assume that the first user attribute sample text 30b is "患者发烧" ("the patient has a fever"). The input layer segments this text into at least two tokens, here the 4 single-character tokens of fig. 4: "患", "者", "发" and "烧" (rendered literally as "patient", "person", "hair" and "fever"). These 4 tokens are input into the encoding layer, which first obtains the initial vector of each token (E_患, E_者, E_发 and E_烧 in fig. 4) and then encodes the 4 initial vectors to obtain the semantic vector of each token (T_患, T_者, T_发 and T_烧 in fig. 4). It should be understood that, in practical applications, the encoding layer may be an independent deep neural network or at least one deep convolutional layer (the number of convolutional layers is not limited in this application). For example, the encoding layer may be a pre-trained bidirectional language model, including but not limited to an ELMo network (Embeddings from Language Models) or a BERT network (Bidirectional Encoder Representations from Transformers); in that case, in this embodiment, the encoding layer needs to be pre-trained on a dictionary database to generate a deep bidirectional language model.
Referring back to fig. 4, the at least two semantic vectors (i.e., the semantic vectors T of the 4 tokens in fig. 4) are input into the hidden layer, which in the embodiment of the present application includes a forward hidden layer and a backward hidden layer. The forward hidden layer performs hidden feature extraction on each semantic vector and propagates the extracted hidden features forward to obtain the forward hidden vector of each token; the backward hidden layer performs hidden feature extraction on the same semantic vectors and propagates the extracted hidden features backward to obtain the backward hidden vector of each token. Finally, the hidden vector of each token is obtained from its forward hidden features and backward hidden features. It should be understood that, in practical applications, the hidden layer may be an independent deep neural network or at least one deep convolutional layer (the number of convolutional layers is not limited in the present application). For example, the hidden layer may be a pre-trained bidirectional Long Short-Term Memory (Bi-LSTM) network; in that case, in the present embodiment, the hidden layer needs to be pre-trained on a dictionary database to generate a Bi-LSTM model.
Referring to fig. 4 again, the at least two hidden vectors are input into the recognition layer, which performs recognition on them to obtain, for each token of the example text "患者发烧" (the tokens "患", "者", "发" and "烧" shown in fig. 4), its position in the first user attribute sample text 30b and its corresponding entity type. In fig. 4, the label O assigned to the two tokens "患" and "者" indicates that they are neither entity words specified in the text recognition model nor the start or end position of a specified entity word; "B-dis" indicates that "发" is the start position of a disease entity word, and "E-dis" indicates that "烧" is the end position of that disease entity word. Based on the position of each token in the first user attribute sample text 30b and its entity type, the original user attribute entity words that characterize the first user attribute sample text 30b are obtained. It should be understood that, in practical applications, the recognition layer may be an independent deep neural network or at least one deep convolutional layer (the number of convolutional layers is not limited in the present application). For example, the recognition layer may be a pre-trained Conditional Random Field (CRF); in that case, in the present embodiment, the recognition layer needs to be pre-trained on a dictionary database to generate a CRF model.
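The last step described above, turning per-token labels such as O, "B-dis" and "E-dis" into entity words with positions and types, can be sketched in Python as follows. This is only an illustrative sketch: the function name `decode_entities` is hypothetical, the inside label "I-" is an assumption added for entities longer than two tokens (only O, B- and E- labels appear in the description), and the four tokens are written as the single characters of the presumed example text "患者发烧" of fig. 4.

```python
from typing import List, Optional, Tuple

# (surface text, entity type, start token index, end token index)
Entity = Tuple[str, str, int, int]


def decode_entities(tokens: List[str], tags: List[str]) -> List[Entity]:
    """Collect spans that open with a B-<type> tag and close with E-<type>."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    entities: List[Entity] = []
    start: Optional[int] = None      # index where the open span began
    open_type: Optional[str] = None  # entity type of the open span
    for i, tag in enumerate(tags):
        prefix, _, entity_type = tag.partition("-")
        if prefix == "B":
            start, open_type = i, entity_type
        elif prefix == "E" and start is not None and entity_type == open_type:
            entities.append(("".join(tokens[start:i + 1]), entity_type, start, i))
            start, open_type = None, None
        elif prefix == "I" and start is not None and entity_type == open_type:
            continue  # assumed "inside" label: keep the span open
        else:
            # "O" or an inconsistent tag drops any span that is still open
            start, open_type = None, None
    return entities


# Tokens and labels of the fig. 4 example, written as single characters.
example = decode_entities(["患", "者", "发", "烧"], ["O", "O", "B-dis", "E-dis"])
```

For the example labels, `example` contains a single disease entity spanning the third and fourth tokens, which corresponds to the original user attribute entity word passed on to standardization.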
In the above, the text recognition model is obtained by training on a large-scale medical text corpus, and the trained text recognition model can recognize the following categories of medical entity words: diseases, symptoms, drugs, surgery, examinations, sites, treatments.
In a real scenario, the same medical entity word is expressed differently in different electronic examination reports or health notices, for example "family history of breast cancer", "family history: breast cancer", "familial breast cancer", and so on. These different expressions increase the difficulty of subsequent processing. For this reason, in the embodiment of the present application, the medical text structured model includes an entity word standardization model constructed based on a medical term standardization system. The analysis server 30a obtains the standard sample entity words and, based on the entity word standardization model, determines the edit distance between each standard sample entity word and the original user attribute entity word. It then takes the minimum of these edit distances and determines the standard sample entity word corresponding to the minimum edit distance as the standard user attribute entity word of the original user attribute entity word, which improves general applicability.
The medical term standardization system may use different text distance measures to determine the text distance between the original user attribute entity words and the standard sample entity words, such as the term normalization distance, the Levenshtein distance, and the Jaro distance (a string edit distance); or it may use text similarity measures, such as SimHash (a text similarity algorithm) or a neural network language model. The embodiment of the present application does not limit how the distance between the original user attribute entity word and the standard sample entity word is determined. The following briefly describes, as an example, a weighted edit distance method that fuses different kinds of information. The method may include: 1) an edit distance fused with a synonym dictionary; and 2) an edit distance fused with hypernyms and hyponyms.
For convenience of description, the original user attribute entity word is regarded as an original character string A of length m, and the standard user attribute entity word is regarded as a standard character string B of length n, where m and n are positive integers. The edit distance between the two is defined as the minimum number of edit operations required to convert the original string A into the standard string B. The edit operations are character insertion, deletion and replacement; assuming that each operation has a cost of 1, an (m + 1) × (n + 1) relation matrix D is constructed, and each element of D is calculated from left to right and from top to bottom by dynamic programming.
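The unweighted case just described, an (m + 1) × (n + 1) matrix filled from left to right and top to bottom with unit costs, is the classic Levenshtein recurrence. The Python sketch below shows it together with the selection of the standard term at minimum distance; the function names `edit_distance` and `normalize` and the two candidate terms are illustrative assumptions, not names or data from the application.

```python
from typing import Iterable, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and replacements turning a into b."""
    m, n = len(a), len(b)
    # d[i][j] is the edit distance between the prefixes a[:i] and b[:j].
    d: List[List[int]] = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all i characters
    for j in range(n + 1):
        d[0][j] = j  # insert all j characters
    for i in range(1, m + 1):          # top to bottom
        for j in range(1, n + 1):      # left to right
            replace_cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,                 # deletion
                d[i][j - 1] + 1,                 # insertion
                d[i - 1][j - 1] + replace_cost,  # replacement or match
            )
    return d[m][n]


def normalize(original: str, standard_terms: Iterable[str]) -> Tuple[str, int]:
    """Return the standard term closest to the original word and its distance."""
    scored = ((term, edit_distance(original, term)) for term in standard_terms)
    return min(scored, key=lambda pair: pair[1])


# Hypothetical candidates; a real system would draw them from a standard vocabulary.
best_term, best_distance = normalize(
    "family history: breast cancer",
    ["family history of breast cancer", "family history of lung cancer"],
)
```

With these candidates, `best_term` is the breast cancer expression. `normalize` raises `ValueError` when the candidate list is empty, and ties are resolved in favor of the first candidate.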
1) The edit distance fused with synonyms is calculated as shown in formula (1):
[Formula (1): image not reproduced in this text]
where d_SYN(i, j) denotes the minimum number of synonym-based edit operations between the prefix A_{1,…,i} of the original string and the prefix B_{1,…,j} of the standard string, i is less than or equal to the length m of the original string A, and j is less than or equal to the length n of the standard string B. The minimum edit distance min{S(i, j)} is calculated as shown in formula (2):
[Formula (2): image not reproduced in this text]
where l1 and l2 are the length of the longest character string and the length of the shortest character string in the synonym dictionary, respectively, and the cost f_S(w, w′) is calculated as shown in formula (3):
[Formula (3): image not reproduced in this text]
where a denotes the synonym weight and generally takes the value 0.1. A synonym cluster is Syn = {w1, w2, …, wn}, in which each element (for example w1) is a character string and any two character strings are synonyms of each other. The synonym dictionary is SYN = {Syn1, Syn2, …}, in which each element is a synonym cluster. w can be regarded as the original string A, and w′ can be regarded as the standard string B.
2) The edit distance fused with hypernyms and hyponyms is calculated as shown in formula (4):
[Formula (4): image not reproduced in this text]
where d_HYP(i, j) denotes the minimum number of edit operations, based on hypernyms and hyponyms, between the prefix A_{1,…,i} of the original string and the prefix B_{1,…,j} of the standard string. The minimum edit distance min{H(i, j)} is calculated as shown in formula (5):
[Formula (5): image not reproduced in this text]
where the cost f_H(w, w′) is calculated as shown in formula (6):
[Formula (6): image not reproduced in this text]
where b denotes the hypernym weight and generally takes the value 0.13.
In practical application scenarios, synonyms and hypernyms/hyponyms do not occur in isolation, so the present application combines the two methods above to obtain the final edit distance. The combined calculation can be as shown in formula (7):
[Formula (7): image not reproduced in this text]
where d(i, j) denotes the edit distance between the prefix A_{1,…,i} of the original string and the prefix B_{1,…,j} of the standard string, and D(i, j) can be determined according to formula (8):
[Formula (8): image not reproduced in this text]
The minimum edit distance min{D(i, j)} is calculated as shown in formula (9):
[Formula (9): image not reproduced in this text]
where the cost f_D(w, w′) is calculated as shown in formula (10):
[Formula (10): image not reproduced in this text]
The above description takes the edit distance fused with synonyms and the edit distance fused with hypernyms and hyponyms as examples; in practical applications, the medical term standardization system may use different text distance calculation methods.
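Because the formulas for the synonym-fused distance are not reproduced in this text, the following Python sketch shows only one plausible reading of the idea, not the exact recurrence of the application. It assumes that, in addition to the three unit-cost operations, a substring of A may be swapped for a synonymous substring of B at a reduced cost a (default 0.1, the value mentioned above). The function name, the cluster representation, and the assumption that each string belongs to at most one cluster are all illustrative.

```python
from typing import Dict, FrozenSet, List, Sequence


def synonym_edit_distance(
    a: str,
    b: str,
    clusters: Sequence[FrozenSet[str]],
    weight: float = 0.1,
) -> float:
    """Edit distance in which swapping synonymous substrings costs `weight`."""
    # Assumption: every string belongs to at most one synonym cluster.
    cluster_of: Dict[str, int] = {w: k for k, c in enumerate(clusters) for w in c}
    lengths: List[int] = sorted({len(w) for w in cluster_of})  # candidate span lengths
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = float(i)
    for j in range(n + 1):
        d[0][j] = float(j)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            replace_cost = 0.0 if a[i - 1] == b[j - 1] else 1.0
            best = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + replace_cost)
            # Try to end a synonym pair at (i, j): a[i-p:i] versus b[j-q:j].
            for p in lengths:
                if p > i:
                    break
                cluster_a = cluster_of.get(a[i - p:i])
                if cluster_a is None:
                    continue
                for q in lengths:
                    if q > j:
                        break
                    if a[i - p:i] != b[j - q:j] and cluster_of.get(b[j - q:j]) == cluster_a:
                        best = min(best, d[i - p][j - q] + weight)
            d[i][j] = best
    return d[m][n]
```

With an empty synonym dictionary the function reduces to the plain edit distance. A hypernym/hyponym variant could be sketched in the same way by replacing the cluster lookup with a parent-child lookup and using the weight b.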
The standard sample entity words are described by taking the International Classification of Diseases, 10th Revision (ICD-10) standard as an example; this standard contains about 30,000 standard user attribute expressions in total. Using the medical term standardization system and the ICD-10 standard, for a non-standard disease input text (e.g., the original user attribute entity word described in this application), the highest-scoring standard user attribute expression, i.e., the standard user attribute entity word, is selected from the 30,000 candidates based on the above scoring function.
Step S102: obtain standard result entity words used for characterizing business analysis results, take the standard user attribute entity words and the standard result entity words as network nodes, and construct an initial analysis graph network from the network nodes.
Specifically, the standard user attribute entity words include standard object entity words and standard index entity words; determining the standard object entity words, the standard index entity words and the standard result entity words as network nodes; generating a network object layer according to the network nodes corresponding to the standard object entity words, generating a network index layer according to the network nodes corresponding to the standard index entity words, and generating a network result layer according to the network nodes corresponding to the standard result entity words; connecting each network node in the network object layer with each network node in the network index layer respectively to obtain a first directed edge; connecting each network node in the network index layer with each network node in the network result layer respectively to obtain a second directed edge; determining the first directed edge and the second directed edge as directed conditional edges; and constructing an initial analysis graph network according to the network nodes and the directed conditional edges.
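The construction just described, three layers of nodes with every object node connected to every index node and every index node connected to every result node, can be sketched in Python as follows. The names `AnalysisGraph` and `build_initial_graph` are hypothetical, and the node labels D1, D2, C1 to C4, H1 and H2 are borrowed from fig. 3 purely as sample data.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Iterable, List, Tuple


@dataclass
class AnalysisGraph:
    object_layer: List[str]   # nodes for standard object entity words
    index_layer: List[str]    # nodes for standard index entity words
    result_layer: List[str]   # nodes for standard result entity words
    edges: List[Tuple[str, str]] = field(default_factory=list)  # directed (source, target)


def build_initial_graph(
    objects: Iterable[str],
    indices: Iterable[str],
    results: Iterable[str],
) -> AnalysisGraph:
    """Fully connect object -> index and index -> result with directed edges."""
    graph = AnalysisGraph(list(objects), list(indices), list(results))
    # First directed edges: each object node points to each index node.
    graph.edges.extend(product(graph.object_layer, graph.index_layer))
    # Second directed edges: each index node points to each result node.
    graph.edges.extend(product(graph.index_layer, graph.result_layer))
    return graph


initial_graph = build_initial_graph(
    ["D1", "D2"], ["C1", "C2", "C3", "C4"], ["H1", "H2"]
)
```

For 2 object nodes, 4 index nodes and 2 result nodes this yields 2 × 4 + 4 × 2 = 16 directed conditional edges, and no edge links the object layer directly to the result layer.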
In conjunction with step S101 and fig. 3, the trained medical text structured model 30c may identify medical entity words from the unstructured first user attribute sample text 30b and map the medical entity words to the canonical standard expression, so as to obtain a standard user attribute entity word 30d for characterizing the first user attribute sample text 30 b. It is to be understood that the standard user attribute entity words 30d may include standard object entity words and standard target entity words, for example, if the first user attribute sample text 30b is an electronic examination report, then the first user attribute sample text 30b may include disease (i.e., object) and symptoms (which may be considered targets) of the applicant. As shown in fig. 3, the standard object entity words may include standard object entity words 301d, …, and 303d, and the standard index entity words may include standard index entity word 304d, standard index entity words 305d, …, standard index entity word 306d, and standard index entity word 307 d.
Referring to fig. 3 again, the analysis server 30a may obtain the standard result body words used for characterizing the service analysis result and sent by the enterprise server 30f, where the standard result body words may include the standard result body words 301e and … and the standard result body word 303 e. The business analysis result may include underwriting, charging underwriting, except underwriting, postponing, refusing to be guaranteed, and the standard result entity word is a standard expression for the business analysis result. The number of the enterprise servers 30f is not limited in the embodiment of the present application, and may be a server corresponding to one insurance company or a server corresponding to at least one insurance company.
Optionally, the analysis server 30a may obtain, from the enterprise server 30f, original result entity words that characterize the business analysis results. In this case, the original result entity words may be input into the medical term standardization system (i.e., the entity word standardization model) of step S101 to obtain the corresponding standard result entity words. For the specific implementation, refer to the description in step S101 of how the entity word standardization model standardizes the original user attribute entity words to obtain the standard user attribute entity words; details are not repeated here.
Optionally, the analysis server 30a may obtain either the original result entity words or the standard result entity words that characterize the business analysis results from a dedicated underwriting database; the source of the standard result entity words is not limited in the embodiment of the present application.
Assume that, according to the first user attribute sample text 30b, the analysis server 30a extracts 2 standard object entity words, such as the standard object entity word 301d and the standard object entity word 303d in fig. 3, and 4 standard index entity words, such as the standard index entity word 304d, the standard index entity word 305d, the standard index entity word 306d, and the standard index entity word 307d in fig. 3, and that the analysis server 30a obtains 2 standard result entity words, such as the standard result entity word 301e and the standard result entity word 303e in fig. 3.
Referring back to fig. 3, the analysis server 30a determines the standard object entity word 301d, the standard object entity word 303d, the standard index entity word 304d, the standard index entity word 305d, the standard index entity word 306d, the standard index entity word 307d, the standard result entity word 301e, and the standard result entity word 303e as network nodes. The analysis server 30a generates a network object layer according to the network nodes (such as the network node D1 and the network node D2 shown in fig. 3) corresponding to the standard object entity word 301d and the standard object entity word 303d, respectively; generates a network index layer according to the network nodes (such as the network node C1, the network node C2, the network node C3 and the network node C4 shown in fig. 3) corresponding to the standard index entity word 304d, the standard index entity word 305d, the standard index entity word 306d and the standard index entity word 307d, respectively; and generates a network result layer according to the network nodes (such as the network node H1 and the network node H2 shown in fig. 3) corresponding to the standard result entity word 301e and the standard result entity word 303e, respectively.
The analysis server 30a connects the network node D1 in the network object layer with each network node in the network index layer, respectively, to obtain a first directed edge, for example, connects the network node D1 with the network node C1, to obtain a first directed edge (D1, C1); the network node D2 in the network object layer is connected to each network node in the network index layer to obtain a first directed edge, for example, the network node D2 is connected to the network node C3 to obtain a first directed edge (D2, C3).
Further, the analysis server 30a connects the network node C1 in the network index layer with each network node in the network result layer, respectively, to obtain a second directed edge, for example, connects the network node C1 with the network node H1, to obtain a second directed edge (C1, H1); respectively connecting a network node C2 in the network index layer with each network node in the network result layer to obtain a second directed edge, for example, connecting a network node C2 with a network node H1 to obtain a second directed edge (C2, H1); the connections of other network nodes in the network index layer are as described above, and are not described in detail herein.
The analysis server 30a may determine the first directed edges and the second directed edges as directed conditional edges. For example, the second directed edge (C1, H1) indicates an edge pointing from the network node C1 to the network node H1; note that a directed conditional edge cannot point in the reverse direction. The analysis server 30a constructs the initial analysis graph network 30g from the network nodes and the directed conditional edges.
It should be noted that, as shown in fig. 3, the network nodes within the network index layer also need to be connected to one another, because in an index judgment rule some attribute indexes are often judged before others. For example, in the index judgment rule for breast nodules, gender is judged first and age is judged afterwards, so the judgment indexes must be connected. How they are connected depends on a large amount of existing training data (including the second user attribute sample text described below), which makes the precedence order explicit. Therefore, the connecting edges between network nodes of the network index layer are not defined as directed conditional edges at this stage: the direction between two network index nodes is determined from the training data.
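The layered construction described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of the embodiment: the function name `build_initial_graph` and the plain list-of-tuples representation are assumptions, and the node names follow fig. 3.

```python
from itertools import product

def build_initial_graph(objects, indexes, results):
    """Hypothetical helper: build the three layers and the directed conditional edges."""
    nodes = {"object": list(objects), "index": list(indexes), "result": list(results)}
    # First directed edges: every object-layer node -> every index-layer node.
    first_edges = list(product(objects, indexes))
    # Second directed edges: every index-layer node -> every result-layer node.
    second_edges = list(product(indexes, results))
    # Edges are ordered (source, target) pairs, so the reverse direction is never present.
    return nodes, first_edges + second_edges

nodes, edges = build_initial_graph(["D1", "D2"], ["C1", "C2", "C3", "C4"], ["H1", "H2"])
print(len(edges))  # 2*4 + 4*2 = 16
```

Edges inside the index layer are deliberately left out of this sketch, since their direction is only fixed later from the training data.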
Step S103, acquiring a second user attribute sample text and a business analysis result label corresponding to the second user attribute sample text; the business analysis result label belongs to the standard result entity word.
Specifically, in an actual underwriting business scenario, the rule judgments (i.e., indexes) often have a precedence order. For example, for the family history of breast cancer in table 1, the underwriting system needs to judge age after judging gender, whereas for other medical histories it may judge gender after judging age. Therefore, it should be noted that, when there are a plurality of judgment indexes, there are theoretically also a plurality of judgment chains. For example, for the underwriting conclusion probability P(H1|C1, C2, D1), expansion based on Bayesian theorem yields the following formula (11) and formula (12):
P(H1|C1, C2, D1) = P(D1) * P(C2|D1) * P(C1|C2) * P(H1|C1) (11)
P(H1|C1, C2, D1) = P(D1) * P(C1|D1) * P(C2|C1) * P(H1|C2) (12)
It can be seen that the difference between formula (11) and formula (12) is whether network node C1 or network node C2 is judged first. Precisely because underwriting rules are this complex, training data is needed when training the initial analysis graph network; the training data can also be understood as an explanation chain, that is, the derivation logic by which the underwriting conclusion of a user attribute sample text is obtained. For the structured training data, refer to table 1, which is an example table of training data provided in an embodiment of the present application. Table 1 includes second user attribute sample texts and business analysis result labels, where a second user attribute sample text includes an object sample text and an index sample text.
TABLE 1
Object sample text | Index sample text | Business analysis result label
Family history of breast cancer | Male sex | Do not comment on
Family history of breast cancer | Female, age 45 | Severe illness except for
Family history of breast cancer | Female, age 40, no history of related diseases, and no abnormality of mammary gland ultrasound | Underwriting and protecting
Table 1 illustrates 3 second user attribute sample texts and their corresponding business analysis result labels. The 3 second user attribute sample texts contain different numbers of index sample texts, which shows that the number of network-index-layer nodes on an underwriting decision path is not fixed. For a breast-related condition such as breast nodules, if the applicant is male, the scheme provided by the present application can directly output an "exception" conclusion (1 judgment index); if the applicant is female, more indexes need to be judged.
It should be noted that the second user attribute sample text and the business analysis result label are assumed by default to be standard medical entity words (including the standard object entity words, the standard index entity words, and the standard result entity words). If the training data is not expressed in standard medical entity words, then when the initial analysis graph network is trained, the entity words corresponding to the network nodes in the initial analysis graph network may differ from the entity words contained in the training data, which affects training efficiency and training precision. In this case, the training data may be input into the entity word standardization model of step S101 to obtain standard medical entity words. The specific implementation process may be: acquiring standard sample entity words; inputting the training data and the standard sample entity words into the entity word standardization model, and determining the edit distance between each standard sample entity word and the training data based on the entity word standardization model; and acquiring the minimum edit distance from the edit distances, and determining the standard sample entity word corresponding to the minimum edit distance as the standard user attribute entity word of the training data.
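The minimum-edit-distance mapping described above can be illustrated as follows. The source does not state which edit distance the entity word standardization model uses; this sketch assumes the classic Levenshtein distance, and the names `edit_distance` and `standardize` as well as the sample strings are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (an assumed choice of edit distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def standardize(term: str, standard_terms: list[str]) -> str:
    """Map a raw term to the standard sample entity word with the minimum edit distance."""
    return min(standard_terms, key=lambda s: edit_distance(term, s))

# Hypothetical vocabulary and a misspelled input term.
vocabulary = ["breast cancer family history", "breast nodule", "hypertension"]
print(standardize("brest cancer family history", vocabulary))
```

Ties are resolved here by list order, which is a simplification; a production system would likely need an explicit tie-breaking rule or a distance threshold.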
Step S104, determining directed conditional probability among network nodes in the initial analysis graph network according to the incidence relation between the second user attribute sample text and the service analysis result label to obtain a standard analysis graph network containing the directed conditional probability; the standard analysis graph network is used for predicting a service analysis reference result for the user attribute text.
Specifically, the initial analysis graph network includes directed conditional edges between network nodes; acquiring a directed decision path containing the second user attribute sample text and the business analysis result label according to the incidence relation between the second user attribute sample text and the business analysis result label; obtaining a directed conditional edge indicated by a directed decision path in an initial analysis graph network as a training directed conditional edge; and determining the directional conditional probability corresponding to the training directional conditional edge according to the directional decision path.
The standard user attribute entity words comprise standard object entity words and standard index entity words. Obtaining the directed conditional edge indicated by the directed decision path in the initial analysis graph network as the training directed conditional edge may specifically include: in the initial analysis graph network, determining the network node corresponding to the object sample text as a first training network node, determining the network node corresponding to the index sample text as a second training network node, and determining the network node corresponding to the business analysis result label as a third training network node; and determining the training directed conditional edges among the first training network node, the second training network node and the third training network node according to the directed decision path.
Wherein the training directed conditional edges comprise a first training directed conditional edge; according to the directional decision path, a specific process of determining the directional conditional probability corresponding to the training directional conditional edge may include: generating a first probability that the first training network node points to the second training network node according to the incidence relation between the object sample text and the index sample text in the directed decision path; determining the first probability as a directional conditional probability corresponding to the first training directional conditional edge; the first training directed conditional edge refers to a directed conditional edge pointing from the first training network node to the second training network node.
Wherein the training directed conditional edges comprise a second training directed conditional edge; according to the directional decision path, a specific process of determining the directional conditional probability corresponding to the training directional conditional edge may include: generating a second probability that the second training network node points to a third training network node according to the incidence relation between the index sample text in the directed decision path and the service analysis result label; determining the second probability as the directional conditional probability corresponding to the second training directional conditional edge; and the second training directed conditional edge refers to a directed conditional edge pointed to the third training network node by the second training network node.
Wherein the training directed conditional edges include a third training directed conditional edge; the directed decision path comprises at least two index sample texts; according to the directed decision path, a specific process of determining the directed conditional probability corresponding to the training directed conditional edge may include: generating a third probability between at least two second training network nodes according to the incidence relation between the at least two index sample texts in the directed decision path; determining the third probability as the directed conditional probability corresponding to the third training directed conditional edge; the at least two second training network nodes comprise the network nodes corresponding to the at least two index sample texts respectively; and the third training directed conditional edge is obtained by connecting the at least two second training network nodes according to the direction sequence between the at least two index sample texts contained in the directed decision path.
Referring to fig. 5, fig. 5 is a schematic view of a data processing scenario according to an embodiment of the present disclosure. As shown in fig. 5, the analysis server 50a obtains training data 50b, and the training data 50b may include second user attribute sample texts and the business analysis result labels corresponding to them. In the embodiment of the present application, it is assumed that the training data 50b includes 3 pieces of training data, i.e., the training data d1, [c1], <h1>, the training data d1, [c2], [c3], <h2>, and the training data d1, [c2], [c3], [c4], <h2> shown in fig. 5. The object sample text d1 belongs to the standard object entity words, such as the family history of breast cancer in table 1; the index sample text c1, the index sample text c2, the index sample text c3 and the index sample text c4 all belong to the standard index entity words, such as female, age 40, no related medical history, and no abnormality found by breast ultrasound in table 1; the business analysis result label h1 and the business analysis result label h2 both belong to the standard result entity words, such as inedited, underwriting, etc. in table 1.
It should be understood that the elements of each piece of training data in the training data 50b follow a pointing order. For example, the training data d1, [c2], [c3], <h2> may indicate that the applicant has the disease corresponding to the object sample text d1 and, while meeting the index corresponding to the index sample text c2, also meets the index corresponding to the index sample text c3, so that the corresponding underwriting conclusion is h2.
Based on the above, the analysis server 50a obtains the directed decision paths 50c containing the second user attribute sample texts and the business analysis result labels. As shown in fig. 5, for the training data d1, [c1], <h1>, the directed decision path d1 -> c1 -> h1 is obtained; for the training data d1, [c2], [c3], <h2>, the directed decision path d1 -> c2 -> c3 -> h2 is obtained; for the training data d1, [c2], [c3], [c4], <h2>, the directed decision path d1 -> c2 -> c3 -> c4 -> h2 is obtained.
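As a minimal sketch of how the three pieces of training data in fig. 5 map to directed decision paths, each record can be flattened into an ordered list. The tuple layout `(object, [indexes], label)` and the helper name `to_decision_path` are assumptions made for illustration only.

```python
def to_decision_path(obj, indexes, label):
    """Hypothetical helper: object text, then index texts in order, then the result label."""
    return [obj, *indexes, label]

# The three pieces of training data 50b, in an assumed tuple layout.
training_data = [
    ("d1", ["c1"], "h1"),
    ("d1", ["c2", "c3"], "h2"),
    ("d1", ["c2", "c3", "c4"], "h2"),
]

paths = [to_decision_path(*record) for record in training_data]
for path in paths:
    print(" -> ".join(path))
```

The order of the index sample texts inside each record is preserved, because that order is what later fixes the direction of the edges inside the network index layer.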
Referring again to fig. 5, the initial analysis graph network 50d includes directed conditional edges between network nodes, e.g., the first directed edge (D2, C3). Notably, the training data 50b includes the object sample text d1; thus, in the initial analysis graph network 50d, the network node D1 corresponding to the object sample text d1 is determined as the first training network node. The training data 50b includes the index sample text c1, the index sample text c2, the index sample text c3, and the index sample text c4; thus, in the initial analysis graph network 50d, the network node C1 corresponding to the index sample text c1, the network node C2 corresponding to the index sample text c2, the network node C3 corresponding to the index sample text c3, and the network node C4 corresponding to the index sample text c4 are determined as second training network nodes. The training data 50b includes the business analysis result label h1 and the business analysis result label h2; thus, in the initial analysis graph network 50d, the network node H1 corresponding to the business analysis result label h1 and the network node H2 corresponding to the business analysis result label h2 are determined as third training network nodes.
Accordingly, according to the directed decision path D1 -> C1 -> H1, the analysis server 50a determines the directed conditional edge between the network node D1 and the network node C1 as a training directed conditional edge, and determines the directed conditional edge between the network node C1 and the network node H1 as a training directed conditional edge. According to the directed decision path D1 -> C2 -> C3 -> H2, the analysis server 50a determines the directed conditional edge between the network node D1 and the network node C2, the directed conditional edge between the network node C2 and the network node C3, and the directed conditional edge between the network node C3 and the network node H2 as training directed conditional edges. According to the directed decision path D1 -> C2 -> C3 -> C4 -> H2, the analysis server 50a determines the directed conditional edge between the network node D1 and the network node C2, the directed conditional edge between the network node C2 and the network node C3, the directed conditional edge between the network node C3 and the network node C4, and the directed conditional edge between the network node C4 and the network node H2 as training directed conditional edges.
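The selection of training directed conditional edges from a directed decision path amounts to taking consecutive node pairs along the path. The sketch below assumes, purely for illustration, that the first letter of a node name encodes its layer (D for object, C for index, H for result); the helper names are hypothetical.

```python
def training_edges(path):
    """Consecutive (source, target) pairs along a directed decision path."""
    return list(zip(path, path[1:]))

def edge_kind(src, dst):
    """Classify an edge as a first, second or third training directed conditional edge.

    Assumption: the first letter of a node name encodes its layer.
    """
    kinds = {("D", "C"): "first", ("C", "H"): "second", ("C", "C"): "third"}
    return kinds[(src[0], dst[0])]

path = ["D1", "C2", "C3", "C4", "H2"]
for src, dst in training_edges(path):
    print(f"({src}, {dst}): {edge_kind(src, dst)} training directed conditional edge")
```

An edge whose layers do not match one of the three listed combinations raises a `KeyError` in this sketch, which is a convenient way to surface malformed paths.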
For a specific process of determining the directional conditional probability corresponding to the training directional conditional edge according to the directional decision path 50c, please refer to the description in the embodiment corresponding to fig. 6 below.
In the embodiment of the application, the initial analysis graph network is constructed from the standard user attribute entity words that characterize the first user attribute sample text and the standard result entity words that characterize the business analysis result, which eliminates differences in expression and makes the network broadly applicable to differently worded inputs. In addition, the embodiment of the application can intelligently predict the business analysis reference result of a user attribute text through the directed conditional probabilities in the standard analysis graph network, thereby reducing the resource cost of business data analysis. Furthermore, when the standard analysis graph network is constructed, the mapping relation between analysis results and analysis indexes does not need to be preset, which overcomes the prior-art defect of excessive dependence on the business experience of business personnel and further helps ensure the accuracy of the business analysis result.
Further, please refer to fig. 6, where fig. 6 is a schematic flowchart of a data processing method according to an embodiment of the present application. As shown in fig. 6, the data processing procedure may include the following steps S1041 to S1043, and the steps S1041 to S1043 are a specific embodiment of the step S104 in the embodiment corresponding to fig. 2.
Step S1041, determining, as a first quantity, a quantity of index sample texts pointed by the target object sample text according to the directed decision path.
Specifically, the number of the object sample texts is at least two, and the number of the index sample texts is at least two; the at least two object sample texts include a target object sample text, and the at least two index sample texts include a target index sample text.
The probability of obtaining each underwriting result (i.e., the business analysis reference result) is $P(H_s \mid C_g, D_k)$, where the network node $H_s$ in the network result layer and the business analysis result label $h_s$ represent the same standard result entity word, the network node $C_g$ in the network index layer and the index sample text $c_g$ represent the same standard index entity word, and the network node $D_k$ in the network object layer and the object sample text $d_k$ represent the same standard object entity word. In other words, $P(H_s \mid C_g, D_k)$ indicates the probability that, when the applicant has the disease associated with the network node $D_k$ and satisfies the index associated with the network node $C_g$, the underwriting result is the result associated with the network node $H_s$. The embodiment of the application expands this probability based on Bayesian theorem to obtain derivation formula (13):
$$P(H_s \mid C_g, D_k) = P(D_k) \cdot P(C_g \mid D_k) \cdot P(H_s \mid C_g) \tag{13}$$
where s, g and k are positive integers, s is less than or equal to the number of network nodes in the network result layer, g is less than or equal to the number of network nodes in the network index layer, and k is less than or equal to the number of network nodes in the network object layer.
Here $P(X \mid Y)$ represents the probability that X occurs given that Y occurs. For example, if X equals $C_g$ and Y equals $D_k$, then $P(X \mid Y)$ indicates the probability that the applicant, having the disease associated with the network node $D_k$, also meets the index associated with the network node $C_g$.
Referring again to fig. 5, after obtaining the training data 50b, the analysis server 50a may begin training the initial analysis graph network 50 d. As illustrated in fig. 5, when training data d1, [ c2 ], [ c3 ], < h2> is taken (note the precedence order), the analysis server 50a may obtain the following formula (14):
P(H2|C3, C2, D1) = P(D1) * P(C2|D1) * P(C3|C2) * P(H2|C3) (14)
wherein, P (H2| C3, C2, D1) may represent the probability of obtaining the traffic analysis result corresponding to the network node H2 when the applicant conforms to the object sample text D1 corresponding to the network node D1, the index sample text C2 corresponding to the network node C2, and the index sample text C3 corresponding to the network node C3.
During the training process, the system may set P(D1) equal to 1. It is understood that in practical applications, if the user attribute text provided by the applicant does not include a disease associated with a standard object entity word, the system will not predict an underwriting conclusion for that disease, so the probability of the disease (i.e., the object sample text) may default to 1. Therefore, the analysis server 50a needs to calculate P(C2|D1) × P(C3|C2) × P(H2|C3) according to the training data 50b. The embodiment corresponding to fig. 6 is described by taking as an example the determination of the first probability that the first training network node points to the second training network node.
First, the number of index sample texts pointed to by the target object sample text is determined. This number can be determined according to the directed decision path 50c or according to the training data 50b; the latter is taken as the example here. For the training data d1, [c2], [c3], <h2>, the target object sample text is the object sample text d1. According to the training data 50b, all 3 pieces of training data include the object sample text d1, which points to the index sample text c1, the index sample text c2, and the index sample text c2, respectively, so the first number is 3.
For the training data d1, [c2], [c3], <h2>, the target index sample texts include the index sample text c2 and the index sample text c3. The number of index sample texts pointed to by the index sample text c2 is determined first. As can be seen from the training data 50b, the training data d1, [c2], [c3], <h2> and the training data d1, [c2], [c3], [c4], <h2> both include the index sample text c2, and in both it points to the index sample text c3, so the number of index sample texts pointed to by the index sample text c2 is 2. Then, the number of index sample texts pointed to by the index sample text c3 is determined. According to the training data 50b, the training data d1, [c2], [c3], <h2> and the training data d1, [c2], [c3], [c4], <h2> both include the index sample text c3; in one of them the index sample text c3 points to the business analysis result label h2, and in the other it points to the index sample text c4. Therefore, the number of index sample texts pointed to by the index sample text c3 is 1, and the number of business analysis result labels pointed to by the index sample text c3 is 1.
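The product in formula (14) can be written as a short helper that multiplies the directed conditional probabilities along a path, with the object probability defaulting to 1 as described above. The function name `path_probability` and the dictionary layout are assumptions; the edge values 2/3, 1 and 1 are the ones this embodiment derives for the training data 50b.

```python
from math import prod

def path_probability(path, edge_prob, p_object=1.0):
    """Multiply P(object) by the conditional probability of each edge on the path."""
    return p_object * prod(edge_prob[(src, dst)] for src, dst in zip(path, path[1:]))

# Directed conditional probabilities for the worked example (see the values derived for fig. 5).
edge_prob = {
    ("D1", "C2"): 2 / 3,  # P(C2|D1)
    ("C2", "C3"): 1.0,    # P(C3|C2)
    ("C3", "H2"): 1.0,    # P(H2|C3)
}

p = path_probability(["D1", "C2", "C3", "H2"], edge_prob)
print(round(p, 4))  # 0.6667
```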
Step S1042, determining the number of target index sample texts pointed by the target object sample text as a second number according to the directed decision path.
Specifically, after the number of index sample texts pointed to by the target object sample text is determined according to the directed decision path 50c or the training data 50b, the analysis server 50a determines the number of target index sample texts pointed to by the target object sample text as the second number. For the training data d1, [c2], [c3], <h2>, the target index sample text here is the index sample text c2. Note that it does not include the index sample text c3, because in this training data the object sample text d1 does not point directly to the index sample text c3.
As can be seen from the training data 50b or the directed decision path 50c, all 3 pieces of training data include the object sample text d1, which points to the index sample text c1, the index sample text c2, and the index sample text c2, respectively. Since the index sample text c1 is not the target index sample text, the second number is 2.
For the training data d1, [c2], [c3], <h2>, when the number of target index sample texts pointed to by the index sample text c2 is calculated, the target index sample text is the index sample text c3; in the training data 50b, the number of target index sample texts pointed to by the index sample text c2 is 2. Likewise, when the number of target business analysis result labels pointed to by the index sample text c3 is calculated, the target business analysis result label is the business analysis result label h2; in the training data 50b, the number of target business analysis result labels pointed to by the index sample text c3 is 1.
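The first number and the second number for the worked example can be obtained by counting transitions over the directed decision paths. This is a sketch under the assumption that each path is stored as an ordered list; `transition_counts` is a hypothetical helper name.

```python
from collections import Counter

def transition_counts(paths):
    """Count how often each (source, target) pair occurs consecutively across all paths."""
    counts = Counter()
    for path in paths:
        counts.update(zip(path, path[1:]))
    return counts

paths = [
    ["d1", "c1", "h1"],
    ["d1", "c2", "c3", "h2"],
    ["d1", "c2", "c3", "c4", "h2"],
]
counts = transition_counts(paths)

# First number: all index sample texts pointed to by d1 (c1, c2, c2).
first_number = sum(n for (src, _), n in counts.items() if src == "d1")
# Second number: only the target index sample text c2 pointed to by d1.
second_number = counts[("d1", "c2")]
print(first_number, second_number)  # 3 2
```

The same counter also yields the other quantities mentioned above, for example `counts[("c2", "c3")]` for the transitions from c2 to c3.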
Step S1043, determining a first probability that the first training network node points to the second training network node according to the first number and the second number.
Specifically, in the embodiment of the present application, the initial analysis graph network 50d is trained by calculating the probability of each conditional edge in the initial analysis graph network 50d, and the formula for calculating the probability is as shown in the following formula (15), formula (16), and formula (17).
$$P(C_g \mid D_k) = \frac{\mathrm{count}(C_g, D_k)}{\sum_{g'} \mathrm{count}(C_{g'}, D_k)} \tag{15}$$
$$P(C_{g'} \mid C_g) = \frac{\mathrm{count}(C_{g'}, C_g)}{\sum_{g''} \mathrm{count}(C_{g''}, C_g)} \tag{16}$$
$$P(H_s \mid C_g) = \frac{\mathrm{count}(H_s, C_g)}{\sum_{s'} \mathrm{count}(H_{s'}, C_g)} \tag{17}$$
Wherein $\mathrm{count}(C_g, D_k)$ represents the number of index sample texts $c_g$ pointed to by the object sample text $d_k$, i.e., the number of times the network node $D_k$ in the network object layer points to the network node $C_g$ in the network index layer, and is equivalent to the second number above. The denominator $\sum_{g'} \mathrm{count}(C_{g'}, D_k)$ represents the number of index sample texts pointed to by the object sample text $d_k$ (note that this covers not only the index sample text $c_g$ but all index sample texts it points to), i.e., the number of network nodes in the network index layer pointed to by the network node $D_k$ in the network object layer, and is equivalent to the first number above.
$\mathrm{count}(C_{g'}, C_g)$ represents the number of times the network node $C_g$ in the network index layer points to the network node $C_{g'}$ in the network index layer. The denominator $\sum_{g''} \mathrm{count}(C_{g''}, C_g)$ represents the number of network nodes in the network index layer pointed to by the network node $C_g$ (note that this covers not only the network node $C_{g'}$ but all network nodes in the network index layer that it points to).
$\mathrm{count}(H_s, C_g)$ represents the number of times the network node $C_g$ in the network index layer points to the network node $H_s$ in the network result layer. The denominator $\sum_{s'} \mathrm{count}(H_{s'}, C_g)$ represents the number of network nodes in the network result layer pointed to by the network node $C_g$ (note that this covers not only the network node $H_s$ but all network nodes in the network result layer that it points to).
Combining the above steps S1041 to S1042 and the training data 50b in fig. 5, P (C2| D1) =2/3, P (C3| C2) =1, and P (H2| C3) =1 can be obtained. Referring back to fig. 5, the intermediate analysis graph network 50e may be generated by training the initial analysis graph network 50D according to the training data D1, [ C2 ], [ C3 ], < H2>, and obviously, the directional condition edge between the network node D1 and the network node C2 in the intermediate analysis graph network 50e has a directional arrow, the directional condition edge between the network node C2 and the network node C3 has a directional arrow, and the directional condition edge between the network node C3 and the network node H2 has a directional arrow.
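The values P(C2|D1) = 2/3, P(C3|C2) = 1 and P(H2|C3) = 1 can be reproduced with a small counting routine. This is a sketch rather than the embodiment's implementation: it assumes that each edge count is normalized over the edges leaving the same source node toward the same target layer (which is what the stated values imply), and that the first letter of a node name encodes its layer. The names `layer` and `train` are hypothetical.

```python
from collections import Counter, defaultdict
from fractions import Fraction

def layer(node):
    """Assumption: the first letter of a node name encodes its layer (D, C or H)."""
    return node[0]

def train(paths):
    """Estimate a directed conditional probability for every edge seen in the paths."""
    counts = Counter()
    for path in paths:
        counts.update(zip(path, path[1:]))
    # Denominator: transitions from the same source into the same target layer.
    totals = defaultdict(int)
    for (src, dst), n in counts.items():
        totals[(src, layer(dst))] += n
    return {(src, dst): Fraction(n, totals[(src, layer(dst))])
            for (src, dst), n in counts.items()}

paths = [
    ["D1", "C1", "H1"],
    ["D1", "C2", "C3", "H2"],
    ["D1", "C2", "C3", "C4", "H2"],
]
probs = train(paths)
print(probs[("D1", "C2")], probs[("C2", "C3")], probs[("C3", "H2")])  # 2/3 1 1
```

Exact fractions are used so that the result can be compared directly with the hand-computed values; floating-point division would work equally well in practice.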
It can be understood that, in the embodiment of the present application, training data d1, [ c2 ], [ c3 ], < h2> are taken as an example for explanation, and the calculation of the directional conditional probability between two training network nodes (i.e., training the directional conditional edge) according to the remaining training data may refer to the description of the embodiment corresponding to step S1041 to step S1043, and therefore, the description is not repeated. The initial analysis graph network 50D is trained according to the training data 50b, and a trained kernel-preserving graph network (i.e., a standard analysis graph network) is obtained, wherein the standard analysis graph network is a directed graph, for example, D1 — > C2 — > C3 — > H2 in the above graph example, and the directed conditional edges in the standard analysis graph network all carry the directed conditional probabilities.
The present application designs a set of experiments to verify the accuracy of the proposed underwriting system. In the experiment, 1000 underwriting records (i.e., training data comprising second user attribute sample texts and the corresponding business analysis result labels) form the training set, and 500 underwriting records form the test set. The training data cover 100 disease categories. In the experiment, the accuracy of the underwriting system reaches 99.6%; in addition, a manual spot check of the reasoning paths of 20 underwriting conclusions (drawn from the 500 test records) finds them 100% accurate. The underwriting system can therefore effectively predict underwriting conclusions and give underwriting explanations.
In the embodiment of the application, the initial analysis graph network is constructed from the standard user attribute entity words used for representing the first user attribute sample text and the standard result entity words used for representing the business analysis result, so that differences in expression can be eliminated and applicability across different expressions is improved. In addition, the embodiment of the application can intelligently predict the business analysis reference result of a user attribute text through the directional conditional probabilities in the standard analysis graph network, thereby reducing the resource cost of business data analysis. Furthermore, when the standard analysis graph network is constructed, the mapping relation between analysis results and analysis indexes does not need to be preset, which overcomes the prior art's excessive dependence on the business experience of business personnel and further helps ensure the accuracy of the business analysis result.
Further, please refer to fig. 7, which is a schematic flowchart of a data processing method according to an embodiment of the present application. The data processing method may be executed by the user terminal, the enterprise server, or the analysis server described in fig. 1, or may be executed jointly by the user terminal, the analysis server, and the enterprise server. As shown in fig. 7, the data processing procedure includes the following steps:
Step S201, acquiring a user attribute text, and acquiring a standby standard user attribute entity word for representing the user attribute text.
Specifically, a user attribute text is input into a text recognition model, and standby original user attribute entity words used for representing the user attribute text are obtained based on the text recognition model; and inputting the standby original user attribute entity words into the entity word standardization model, and carrying out standardization processing on the standby original user attribute entity words based on the entity word standardization model to obtain standby standard user attribute entity words.
The text recognition model comprises an input layer, a coding layer, a hidden layer and a recognition layer. The specific process of obtaining the standby original user attribute entity words for representing the user attribute text based on the text recognition model may include: segmenting the user attribute text based on the input layer to obtain at least two word segments; inputting the at least two word segments into the coding layer, and encoding each of them based on the coding layer to obtain at least two semantic vectors; inputting the at least two semantic vectors into the hidden layer, and performing hidden feature extraction on each of them based on the hidden layer to obtain at least two hidden vectors; and inputting the at least two hidden vectors into the recognition layer, and recognizing them based on the recognition layer to obtain the standby original user attribute entity words for representing the user attribute text.
The specific process of obtaining the standby standard user attribute entity words by performing standardization processing on the standby original user attribute entity words based on the entity word standardization model may include: acquiring a standard sample entity word; determining an editing distance between a standard sample entity word and a standby original user attribute entity word based on an entity word standardization model; and acquiring the minimum editing distance from the editing distance, and determining the standard sample entity word corresponding to the minimum editing distance as a standby standard user attribute entity word of the standby original user attribute entity word.
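The minimum-edit-distance standardization described above can be sketched as follows. This is an illustrative sketch only: the text does not say which edit distance is used, so the Levenshtein distance is an assumption, and the function names `edit_distance` and `standardize` as well as the sample vocabulary are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute ca with cb (free if equal)
            ))
        prev = cur
    return prev[-1]


def standardize(raw_word, standard_words):
    """Return the standard sample entity word closest to raw_word."""
    return min(standard_words, key=lambda w: edit_distance(raw_word, w))


# Hypothetical standard sample entity words.
standard_words = ["thyroid nodule", "breast nodule", "hypertension"]
print(standardize("thyroid nodules", standard_words))
```

When several standard words are tied at the minimum distance, `min` returns the first one in the list; a production system would likely need an explicit tie-breaking rule.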
The specific implementation process of step S201 can refer to step S101 in the embodiment corresponding to fig. 2, and a description thereof is omitted here.
Step S202, acquiring a standard analysis graph network; the standard analysis graph network comprises network nodes and directional conditional probabilities among the network nodes; the network nodes are composed of standard user attribute entity words and standard result entity words; the standard user attribute entity words comprise the standby standard user attribute entity words; and the standard result entity words are used for representing the business analysis result.
Specifically, the standard user attribute entity words include standard object entity words and standard index entity words; the standard analysis graph network comprises a network object layer, a network index layer and a network result layer, wherein the network object layer is generated by network nodes corresponding to standard object entity words, the network index layer is generated by network nodes corresponding to standard index entity words, and the network result layer is generated by network nodes corresponding to standard result entity words.
The standard analysis graph network described in the present application is an application of the graph network (GN). A graph network is a set of functions organized according to a graph structure in a topological space to perform relational reasoning. In deep learning theory, the graph network is a generalization of the graph neural network (GNN) and the probabilistic graphical model (PGM).
As shown in fig. 5, the standard analysis graph network in the present application is divided into three layers, which are a network object layer, a network index layer and a network result layer, wherein the network object layer is composed of medical entity words associated with diseases in the first user attribute sample text, the network index layer is composed of medical entity words associated with symptoms, indicators and the like in the first user attribute sample text, and the network result layer is composed of entity words associated with business analysis results.
The disease may be a disease category that insurance companies examine closely during underwriting, such as thyroid nodules and breast nodules, which are frequently examined in critical illness insurance. The index may refer to information about the patient (e.g., gender, age) or a specific attribute of the disease (e.g., nodule size). Note that in the embodiment of the present application every possible index value is converted into a network node: for example, gender male and gender female are two separate network nodes, and age less than 40 and age greater than or equal to 40 are also two separate network nodes. The business analysis result can be any of the possible underwriting conclusions predefined by the insurance company, such as acceptance, extra premium, exclusion, and refusal.
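The conversion of index values into separate network nodes can be illustrated with a short sketch. The function name `index_nodes` and the node labels are hypothetical; only the two examples from the text (gender, and age split at 40) are covered.

```python
def index_nodes(gender, age):
    """Map raw patient information to discrete index-layer node labels."""
    if gender not in ("male", "female"):
        raise ValueError(f"unsupported gender value: {gender!r}")
    nodes = [f"gender:{gender}"]
    # Age is split into two separate nodes at the threshold of 40.
    nodes.append("age:<40" if age < 40 else "age:>=40")
    return nodes


print(index_nodes("male", 35))
```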
Step S203, determining a service analysis reference result of the user attribute text according to the standby standard user attribute entity words and the standard analysis graph network.
Specifically, the standby standard user attribute entity words comprise standby standard object entity words and standby standard index entity words; and acquiring standard result entity words in the standard analysis graph network.
Constructing N standby directed decision paths aiming at the user attribute text according to the standby standard object entity words, the standby standard index entity words and the standard result entity words; wherein, a standby directed decision path comprises a standby standard object entity word, at least one standby standard index entity word and a standard result entity word; n is a positive integer; respectively obtaining standby path probabilities of the N standby directed decision paths according to the directed conditional probabilities; determining the maximum standby path probability in the standby path probabilities as a target path probability, and determining a standby directed decision path corresponding to the target path probability as a target directed decision path; and determining the standard result entity words in the target directed decision path as target standard result entity words, and determining a business analysis reference result according to the target standard result entity words.
Determining the standby standard object entity words in the target directed decision path as target standard object entity words, and determining the standby standard index entity words in the target directed decision path as target standard index entity words; and outputting a business analysis reference text of the user attribute text according to the pointing sequence, the target standard object entity words, the target standard index entity words and the target standard result entity words in the target directed decision path.
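One way to build the N standby directed decision paths is sketched below. The text only requires that each path contain one standby standard object entity word, at least one standby standard index entity word, and one standard result entity word; enumerating every ordered arrangement of the index words is an assumption, chosen because it yields eight paths for one object word, two index words and two result words, as in the fig. 8 example. The function name `enumerate_paths` is hypothetical.

```python
from itertools import permutations


def enumerate_paths(obj, indexes, results):
    """List every path obj -> (one or more index words, ordered) -> result."""
    paths = []
    for k in range(1, len(indexes) + 1):
        for index_seq in permutations(indexes, k):
            for result in results:
                paths.append((obj, *index_seq, result))
    return paths


paths = enumerate_paths("d10", ["c10", "c30"], ["h10", "h20"])
for p in paths:
    print(" -> ".join(p))
```

Because the number of ordered arrangements grows factorially with the number of index words, a practical system would probably prune paths whose edges have zero probability instead of enumerating all of them.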
Referring to fig. 8, fig. 8 is a schematic view of a data processing scenario according to an embodiment of the present disclosure. As shown in FIG. 8, the enterprise server 80a obtains the user attribute text 80c sent by the user terminal 80b. The user attribute text 80c indicates the basic information of the applicant applying for insurance, and may include electronic physical examination reports, electronic health advice notes, and other texts that can prove the physical condition of the applicant. After obtaining the user attribute text 80c, the enterprise server 80a inputs it into the trained medical text structuring module 80d to obtain the standby standard user attribute entity words 80e that can represent the user attribute text 80c. As shown in fig. 8, the standby standard user attribute entity words 80e may include a standard object entity word d10, a standard index entity word c10, and a standard index entity word c30.
The enterprise server 80a obtains the standard result entity words 80f associated with the business analysis results locally or from the analysis server; assume that the standard result entity words 80f include the standard result entity word h10 and the standard result entity word h20. As shown in fig. 8, the enterprise server 80a constructs standby directed decision paths for the user attribute text 80c according to the standard object entity word d10, the standard index entity word c10, the standard index entity word c30, the standard result entity word h10 and the standard result entity word h20. The present application combines the standby standard user attribute entity words 80e and the standard result entity words 80f to exhaust all possible directed decision paths. As shown in fig. 8, these are the standby directed decision path d10 -> c10 -> h10, the standby directed decision path d10 -> c10 -> h20, the standby directed decision path d10 -> c30 -> h10, the standby directed decision path d10 -> c30 -> h20, the standby directed decision path d10 -> c10 -> c30 -> h10, the standby directed decision path d10 -> c10 -> c30 -> h20, the standby directed decision path d10 -> c30 -> c10 -> h10, and the standby directed decision path d10 -> c30 -> c10 -> h20.
Referring to fig. 8 again, the standard analysis graph network 80h includes directed conditional edges and the directional conditional probability corresponding to each directed conditional edge, where a directed conditional edge (D, C) indicates that the network node D points to the network node C. As shown in fig. 8, the directional conditional probability P (C | D) of the directed conditional edge (D, C) is 0.2, the directional conditional probability P (C | D) of the directed conditional edge (D, C) is 0.9, the directional conditional probability P (C | C) of the directed conditional edge (C, C) is 0, the directional conditional probability P (H | C) of the directed conditional edge (C, H) is 0.35, the directional conditional probability P (H | C) of the directed conditional edge (C, H) is 0.8, the directional conditional probability P (H | C) of the directed conditional edge (C, H) is 1, and the directional conditional probability P (H2 | C3) of the directed conditional edge (C3, H2) is 0.1. It is to be understood that, for clarity, the standard analysis graph network 80h illustrated in fig. 8 does not show all of the directed conditional edges and directional conditional probabilities; in actual application, the standard analysis graph network 80h includes all of them.
Assume that, in the standard analysis graph network 80h, the standard object entity word corresponding to the network node D1 is the standard object entity word d10, the standard index entity word corresponding to the network node C1 is the standard index entity word c10, the standard index entity word corresponding to the network node C3 is the standard index entity word c30, the standard result entity word corresponding to the network node H1 is the standard result entity word h10, and the standard result entity word corresponding to the network node H2 is the standard result entity word h20.
According to the directional conditional probabilities in the standard analysis graph network 80h, the enterprise server 80a may obtain the standby path probability of each of the 8 standby directed decision paths. In this embodiment, the standby path probabilities are calculated based on Bayes' theorem. The 8 standby path probabilities are shown in table 2, which is a schematic table of the standby path probabilities provided in this embodiment.
TABLE 2
Figure DEST_PATH_IMAGE017
Referring to fig. 8 again, the maximum standby path probability (i.e., P3) among the standby path probabilities 80i shown in fig. 8 (i.e., the 8 standby path probabilities shown in table 2 above) is determined as the target path probability, and the standby directed decision path corresponding to the target path probability is determined as the target directed decision path 80j; that is, the standby directed decision path d10 -> c30 -> h10 is the target directed decision path 80j. The enterprise server 80a may determine the standard result entity word in the target directed decision path 80j (i.e., the standard result entity word h10) as the target standard result entity word, and then determine the business analysis reference result according to the target standard result entity word.
Further, the enterprise server 80a determines the standby standard object entity word in the target directed decision path 80j (i.e., the standard object entity word d10) as the target standard object entity word, and determines the standby standard index entity word in the target directed decision path 80j (i.e., the standard index entity word c30) as the target standard index entity word. According to the pointing order in the target directed decision path 80j, the standard object entity word d10, the standard index entity word c30, and the standard result entity word h10, the enterprise server 80a may output the business analysis reference text <d10, [c30], h10> of the user attribute text 80c to the user terminal 80b. After the user terminal 80b obtains the business analysis reference text <d10, [c30], h10>, the applicant can accurately learn the underwriting result (i.e., the standard result entity word h10) as well as the reason why the underwriting conclusion is the standard result entity word h10.
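The selection of the target directed decision path can be sketched as below. Several points are assumptions rather than facts from the text: the path probability is taken to be the product of the directional conditional probabilities along the path (the text only says it is calculated based on Bayes' theorem), all numeric values in `edge_prob` are invented for illustration and are not the values of fig. 8 or table 2, and the output format of `explain` is hypothetical.

```python
from math import prod

# Hypothetical directional conditional probabilities (not taken from fig. 8).
edge_prob = {
    ("d10", "c10"): 0.2, ("d10", "c30"): 0.8,
    ("c10", "c30"): 1.0,
    ("c10", "h10"): 0.4, ("c10", "h20"): 0.6,
    ("c30", "h10"): 0.9, ("c30", "h20"): 0.1,
}

paths = [
    ("d10", "c10", "h10"), ("d10", "c10", "h20"),
    ("d10", "c30", "h10"), ("d10", "c30", "h20"),
    ("d10", "c10", "c30", "h10"), ("d10", "c10", "c30", "h20"),
    ("d10", "c30", "c10", "h10"), ("d10", "c30", "c10", "h20"),
]


def path_probability(path, edge_prob):
    """Multiply edge probabilities along the path; unseen edges count as 0."""
    return prod(edge_prob.get(edge, 0.0) for edge in zip(path, path[1:]))


def explain(path):
    """Render a path as an <object, [indexes], result> reference text."""
    obj, *indexes, result = path
    return f"<{obj}, [{', '.join(indexes)}], {result}>"


target = max(paths, key=lambda p: path_probability(p, edge_prob))
print(explain(target), path_probability(target, edge_prob))
```

With these invented values the selected path is d10 -> c30 -> h10, which matches the target directed decision path named in the example, but that agreement is by construction of the sample data.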
The method makes full use of the interpretability of the graph network structure: when the model outputs a prediction, the network nodes involved are connected in series into a logical chain (i.e., a directed decision path) that is output together with the prediction, providing an important reference for manual underwriting by the enterprise. This greatly improves the interpretability of the intelligent underwriting system and makes the underwriting conclusion more credible.
In the embodiment of the application, the initial analysis graph network is constructed from the standard user attribute entity words used for representing the first user attribute sample text and the standard result entity words used for representing the business analysis result, so that differences in expression can be eliminated and applicability across different expressions is improved. In addition, the embodiment of the application can intelligently predict the business analysis reference result of a user attribute text through the directional conditional probabilities in the standard analysis graph network, thereby reducing the resource cost of business data analysis. Furthermore, when the standard analysis graph network is constructed, the mapping relation between analysis results and analysis indexes does not need to be preset, which overcomes the prior art's excessive dependence on the business experience of business personnel and further helps ensure the accuracy of the business analysis result.
The application provides a novel end-to-end, graph-inference-based underwriting system that goes from user attribute texts (e.g., electronic examination reports) to business analysis reference results (i.e., underwriting conclusions). The underwriting system adopts the latest AI models and performs, in turn, medical text structuring, underwriting inference graph construction, and underwriting graph network prediction and inference to finally obtain the underwriting conclusion. With reference to the embodiments corresponding to fig. 2, fig. 6, and fig. 7, please refer to fig. 9, which is a flowchart illustrating a data processing method according to an embodiment of the present application. The data processing method may be executed by the user terminal, the analysis server, or the enterprise server described in fig. 1, or may be executed jointly by the user terminal, the analysis server, and the enterprise server. As shown in fig. 9, the data processing procedure includes the following steps:
Step S901, electronic examination report/electronic health advice.
Specifically, the analysis server obtains an electronic examination report/electronic health advice sent by the enterprise server or the user terminal. It can be understood that, if the user attribute text acquired by the analysis server is in paper form, the electronic user attribute text corresponding to the paper user attribute text can be obtained through the scanning function of the analysis server.
Step S902, a medical text structuring module.
Specifically, the medical text structuring module addresses the problem of poor general applicability in actual underwriting scenarios. Through a structuring algorithm and a standardization algorithm, it obtains standard medical entity words (i.e., standard user attribute entity words) from semi-structured or even unstructured user attribute texts, so that colloquial or differing expressions are mapped to the same standard expression. Specifically, the structuring unifies and standardizes similar expressions of different types, such as diseases, symptoms, medicines, operations, tests, examinations, body parts and treatments. This ensures that diseases, symptoms and conclusions are uniquely represented in the graph, so that the underwriting model based on graph reasoning can be adapted to more scenarios and extended more widely. The corresponding specific process may refer to the description of step S101 in the embodiment corresponding to fig. 2, which is not described herein again.
Step S903, underwriting graph network construction module.
Specifically, the underwriting graph network construction module addresses the problem of poor extensibility in actual underwriting scenarios. The module makes full use of the medical entity words mentioned in the first user attribute sample text and uses them to construct an initial analysis graph network, which comprises three layers (a network object layer, a network index layer and a network result layer). According to the training data (i.e., the second user attribute sample texts and the corresponding business analysis result labels), the directional probability values of transitions between network nodes in the initial analysis graph network are determined using Bayes' theorem. In this process, the initial analysis graph network is trained automatically from the training data to obtain the standard analysis graph network, without manually adding judgment rules.
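The three-layer structure of the initial analysis graph network can be sketched as a small data structure. Connecting every object-layer node to every index-layer node, and every index-layer node to every result-layer node, follows the description of the first and second directed edges in this application; adding edges between distinct index-layer nodes is an assumption, motivated by the training directed conditional edges between index nodes. The function name `build_initial_graph` and the dictionary layout are hypothetical.

```python
from itertools import product


def build_initial_graph(objects, indexes, results):
    """Build an untrained three-layer graph with candidate directed edges."""
    edges = set()
    edges.update(product(objects, indexes))   # object layer -> index layer
    edges.update(product(indexes, results))   # index layer -> result layer
    # Assumption: index nodes may also point to other index nodes.
    edges.update((a, b) for a, b in product(indexes, indexes) if a != b)
    return {
        "object_layer": list(objects),
        "index_layer": list(indexes),
        "result_layer": list(results),
        "edges": edges,  # probabilities are attached later, during training
    }


graph = build_initial_graph(["d1"], ["c1", "c2"], ["h1", "h2"])
print(len(graph["edges"]))
```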
Step S904, underwriting graph network prediction module.
Step S905, underwriting conclusion.
Specifically, the underwriting graph network prediction module addresses the problem of poor interpretability in actual underwriting scenarios. Using the standard analysis graph network trained in the previous step, it performs graph inference based on Bayes' theorem on a newly input user attribute text, finally obtains the underwriting conclusion for that text, and outputs the inferred path (i.e., the target directed decision path) as the explanation of the underwriting conclusion.
The application provides a novel end-to-end intelligent underwriting solution that goes from an electronic report (i.e., user attribute text) to an underwriting conclusion (i.e., business analysis reference result). Combined with advanced artificial intelligence models (such as an NER model), it automatically generates disease texts and judgment rule texts, and performs reasoning on the standard analysis graph network to obtain an underwriting conclusion together with interpretable logic. In addition, the graph construction approach and the graph-inference-based underwriting method make good use of existing training data and automatically build the judgment patterns for different diseases, which greatly reduces the heavy work of developing disease-specific rules and realizes an end-to-end connection from the electronic report to the underwriting conclusion. For subsequent electronic reports, the underwriting conclusion can be given directly without manual intervention. Unlike existing underwriting models, the scheme uses the standard analysis graph network to infer the underwriting conclusion; while giving the conclusion, it can also output the decision path from the electronic report to the underwriting conclusion and thus quickly feed back the logic by which the system made its decision, so that an underwriter can quickly perform a secondary review of the underwriting conclusion.
Further, please refer to fig. 10, which is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application. The data processing apparatus may be a computer program (including program code) running on a computer device, for example, application software; the apparatus may be used to perform the corresponding steps in the methods provided by the embodiments of the present application. As shown in fig. 10, the data processing apparatus 1 may include: a first obtaining module 11, a second obtaining module 12, a third obtaining module 13 and a probability determining module 14.
The first obtaining module 11 is configured to obtain a first user attribute sample text, and obtain a standard user attribute entity word used for representing the first user attribute sample text;
a second obtaining module 12, configured to obtain a standard result entity word used for representing a service analysis result, use the standard user attribute entity word and the standard result entity word as a network node, and construct an initial analysis graph network according to the network node;
a third obtaining module 13, configured to obtain a second user attribute sample text and a service analysis result tag corresponding to the second user attribute sample text; the business analysis result label belongs to a standard result entity word;
a probability determining module 14, configured to determine directional conditional probabilities between network nodes in the initial analysis graph network according to an association relationship between the second user attribute sample text and the service analysis result tag, to obtain a standard analysis graph network including the directional conditional probabilities; the standard analysis graph network is used for predicting a service analysis reference result for the user attribute text.
For specific functional implementation manners of the first obtaining module 11, the second obtaining module 12, the third obtaining module 13, and the probability determining module 14, reference may be made to steps S101 to S104 in the corresponding embodiment of fig. 2, which is not described herein again.
Referring again to fig. 10, the initial analysis graph network includes directed conditional edges between network nodes;
the probability determining module 14 may include: a first obtaining unit 141, a second obtaining unit 142, and a first determining unit 143.
The first obtaining unit 141 is configured to obtain a directed decision path including the second user attribute sample text and the business analysis result tag according to an association relationship between the second user attribute sample text and the business analysis result tag;
a second obtaining unit 142, configured to obtain a directed conditional edge indicated by the directed decision path in the initial analysis graph network, as a training directed conditional edge;
the first determining unit 143 is configured to determine a directional conditional probability corresponding to a training directional conditional edge according to the directional decision path.
For specific functional implementation manners of the first obtaining unit 141, the second obtaining unit 142, and the first determining unit 143, reference may be made to step S104 in the corresponding embodiment of fig. 2, which is not described herein again.
Referring to fig. 10 again, the standard user attribute entity words include standard object entity words and standard index entity words; the second user attribute sample text comprises an object sample text belonging to the standard object entity words and an index sample text belonging to the standard index entity words;
the second obtaining unit 142 may include: a first determining subunit 1421 and a second determining subunit 1422.
A first determining subunit 1421, configured to determine, in the initial analysis graph network, the network node corresponding to the object sample text as a first training network node, determine the network node corresponding to the index sample text as a second training network node, and determine the network node corresponding to the business analysis result label as a third training network node;
the second determining subunit 1422 is configured to determine, according to the directed decision path, a training directed condition edge among the directed condition edges among the first training network node, the second training network node, and the third training network node.
For specific functional implementation manners of the first determining subunit 1421 and the second determining subunit 1422, refer to step S104 in the embodiment corresponding to fig. 2, which is not described herein again.
Referring again to fig. 10, the training directed conditional edges include a first training directed conditional edge;
the first determination unit 143 may include: a first generation subunit 1431, and a third determination subunit 1432.
A first generating subunit 1431, configured to generate a first probability that the first training network node points to the second training network node according to an association relationship between the object sample text and the index sample text in the directional decision path;
a third determining subunit 1432, configured to determine the first probability as a directional conditional probability corresponding to the first training directional conditional edge; the first training directed conditional edge refers to a directed conditional edge pointing from the first training network node to the second training network node.
The specific functional implementation manners of the first generating subunit 1431 and the third determining subunit 1432 may refer to step S104 in the embodiment corresponding to fig. 2, which is not described herein again.
Referring again to fig. 10, the training directed conditional edges include a second training directed conditional edge;
the first determination unit 143 may include: a second generation subunit 1433, and a fourth determination subunit 1434.
A second generating subunit 1433, configured to generate a second probability that the second training network node points to the third training network node according to an association relationship between the index sample text in the directional decision path and the service analysis result label;
a fourth determining subunit 1434, configured to determine the second probability as a directional conditional probability corresponding to the second training directional conditional edge; and the second training directed conditional edge refers to a directed conditional edge pointed to the third training network node by the second training network node.
The specific functional implementation manners of the second generating subunit 1433 and the fourth determining subunit 1434 may refer to step S104 in the embodiment corresponding to fig. 2, which is not described herein again.
Referring again to fig. 10, the training directed conditional edges include a third training directed conditional edge; the directed decision path comprises at least two index sample texts;
the first determination unit 143 may include: a third generation subunit 1435 and a fifth determination subunit 1436.
A third generating subunit 1435, configured to generate a third probability between at least two second training network nodes according to an association relationship between at least two index sample texts in the directed decision path;
a fifth determining subunit 1436, configured to determine the third probability as the directional conditional probability corresponding to the third training directed conditional edge; the at least two second training network nodes comprise the network nodes respectively corresponding to the at least two index sample texts; and the third training directed conditional edge is obtained by connecting the at least two second training network nodes according to the pointing order between the at least two index sample texts contained in the directed decision path.
The specific functional implementation manners of the third generating subunit 1435 and the fifth determining subunit 1436 may refer to step S104 in the embodiment corresponding to fig. 2, which is not described herein again.
Referring to fig. 10 again, the number of the object sample texts is at least two, and the number of the index sample texts is at least two; the at least two object sample texts comprise target object sample texts, and the at least two index sample texts comprise target index sample texts;
the first generating subunit 1431 is specifically configured to determine, according to the directed decision path, the number of index sample texts pointed to by the target object sample text as a first number;
the first generating subunit 1431 is further specifically configured to determine, according to the directional decision path, the number of target index sample texts pointed by the target object sample text as the second number;
the first generating subunit 1431 is further specifically configured to determine, according to the first number and the second number, a first probability that the first training network node points to the second training network node.
For a specific function implementation manner of the first generating subunit 1431, refer to steps S1041 to S1043 in the embodiment corresponding to fig. 6, which is not described herein again.
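The counting performed by the first generating subunit 1431 can be illustrated with a minimal sketch. The text only states that the first probability is determined according to the first number and the second number; taking it as the ratio of the second number to the first number is an assumption made here for illustration, as are the function name `first_probability`, the tuple layout of a path, and the sample values.

```python
def first_probability(paths, target_object, target_index):
    """Estimate how likely target_object points to target_index.

    Each path is a hypothetical (object_text, index_text, result_label) tuple
    standing in for one directed decision path.
    """
    # First number: index sample texts pointed to by the target object sample text.
    first_number = sum(1 for obj, _, _ in paths if obj == target_object)
    # Second number: how many of those are the target index sample text.
    second_number = sum(
        1 for obj, idx, _ in paths if obj == target_object and idx == target_index
    )
    if first_number == 0:
        return 0.0  # the object never appears, so no estimate is available
    # Assumption: the first probability is the ratio second_number / first_number.
    return second_number / first_number


# Hypothetical directed decision paths.
paths = [
    ("object_a", "index_x", "result_1"),
    ("object_a", "index_x", "result_2"),
    ("object_a", "index_y", "result_1"),
    ("object_b", "index_y", "result_2"),
]
p = first_probability(paths, "object_a", "index_x")
```

With these sample paths, `object_a` points to three index sample texts, two of which are `index_x`, so the ratio under this assumption is 2/3.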
Referring to fig. 10 again, the standard user attribute entity words include standard object entity words and standard index entity words;
the second obtaining module 12 may include: a second determining unit 121, a first generating unit 122, a first connecting unit 123, a second connecting unit 124, a third determining unit 125, and a second generating unit 126.
A second determining unit 121, configured to determine the standard object entity words, the standard index entity words, and the standard result entity words as network nodes;
a first generating unit 122, configured to generate a network object layer according to the network node corresponding to the standard object entity word, generate a network index layer according to the network node corresponding to the standard index entity word, and generate a network result layer according to the network node corresponding to the standard result entity word;
the first connection unit 123 is configured to connect each network node in the network object layer with each network node in the network index layer, respectively, to obtain a first directed edge;
a second connection unit 124, configured to connect each network node in the network index layer with each network node in the network result layer, respectively, to obtain a second directed edge;
a third determining unit 125, configured to determine the first directed edge and the second directed edge as directed conditional edges;
and a second generating unit 126, configured to construct an initial analysis graph network according to the network nodes and the directed conditional edges.
For specific functional implementation manners of the second determining unit 121, the first generating unit 122, the first connecting unit 123, the second connecting unit 124, the third determining unit 125, and the second generating unit 126, reference may be made to step S102 in the corresponding embodiment of fig. 2, which is not described herein again.
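The layered construction described for units 121 to 126 can be sketched as follows. This is an illustrative sketch only: the dictionary layout, the function name `build_initial_graph`, and the sample entity words are assumptions, and the directed conditional probabilities are left unset because they are determined later during training.

```python
from itertools import product


def build_initial_graph(object_words, index_words, result_words):
    """Build a three-layer graph with fully connected adjacent layers."""
    layers = {
        "object": list(object_words),   # network object layer
        "index": list(index_words),     # network index layer
        "result": list(result_words),   # network result layer
    }
    # First directed edges: every object node to every index node.
    first_edges = list(product(layers["object"], layers["index"]))
    # Second directed edges: every index node to every result node.
    second_edges = list(product(layers["index"], layers["result"]))
    # Directed conditional edges; the probability is unknown before training.
    edges = {edge: None for edge in first_edges + second_edges}
    return {"layers": layers, "edges": edges}


# Hypothetical standard entity words.
graph = build_initial_graph(
    ["object_a", "object_b"],
    ["index_x", "index_y", "index_z"],
    ["result_1", "result_2"],
)
```

Storing edges as a mapping from (source, target) pairs to a probability keeps later lookup of a directed conditional probability to a single dictionary access.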
Referring to fig. 10 again, the first obtaining module 11 may include: a first input unit 111 and a second input unit 112.
The first input unit 111 is configured to input the first user attribute sample text into a text recognition model, and obtain an original user attribute entity word used for representing the first user attribute sample text based on the text recognition model;
the second input unit 112 is configured to input the original user attribute entity words into the entity word standardization model, and perform standardization processing on the original user attribute entity words based on the entity word standardization model to obtain standard user attribute entity words.
For specific functional implementation of the first input unit 111 and the second input unit 112, reference may be made to step S101 in the corresponding embodiment of fig. 2, which is not described herein again.
Referring to fig. 10 again, the text recognition model includes an input layer, a coding layer, a hidden layer, and a recognition layer;
the first input unit 111 may include: a first processing sub-unit 1111, a second processing sub-unit 1112, a third processing sub-unit 1113 and a fourth processing sub-unit 1114.
The first processing subunit 1111 is configured to perform segmentation processing on the first user attribute sample text based on the input layer to obtain at least two word segments;
a second processing subunit 1112, configured to input the at least two word segments into the coding layer, and perform coding processing on the at least two word segments based on the coding layer, respectively, to obtain at least two semantic vectors;
a third processing subunit 1113, configured to input the at least two semantic vectors into the hidden layer, and perform hidden feature extraction processing on the at least two semantic vectors based on the hidden layer, respectively, to obtain at least two hidden vectors;
the fourth processing subunit 1114 is configured to input the at least two hidden vectors into the recognition layer, and perform recognition processing on the at least two hidden vectors based on the recognition layer to obtain an original user attribute entity word used for characterizing the first user attribute sample text.
For specific functional implementation manners of the first processing sub-unit 1111, the second processing sub-unit 1112, the third processing sub-unit 1113, and the fourth processing sub-unit 1114, reference may be made to step S101 in the corresponding embodiment of fig. 2, which is not described herein again.
Referring again to fig. 10, the second input unit 112 may include: a first acquisition sub-unit 1121, a sixth determination sub-unit 1122, and a second acquisition sub-unit 1123.
A first obtaining sub-unit 1121, configured to obtain a standard sample entity word;
a sixth determining subunit 1122, configured to determine, based on the entity word normalization model, an edit distance between each standard sample entity word and the original user attribute entity word;
the second obtaining subunit 1123 is configured to obtain a minimum edit distance from the edit distances, and determine the standard sample entity word corresponding to the minimum edit distance as the standard user attribute entity word of the original user attribute entity word.
For specific functional implementation manners of the first obtaining sub-unit 1121, the sixth determining sub-unit 1122, and the second obtaining sub-unit 1123, reference may be made to step S101 in the embodiment corresponding to fig. 2, which is not described herein again.
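The minimum-edit-distance selection can be sketched as below. The text does not say which edit distance the entity word normalization model uses; the Levenshtein distance is assumed here, and the function names and the sample entity words are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (assumed metric)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


def normalize(original_word, standard_words):
    """Return the standard sample entity word with the minimum edit distance."""
    return min(standard_words, key=lambda w: edit_distance(original_word, w))


# Hypothetical standard sample entity words and a misspelled original word.
standard_words = ["account balance", "account age", "login count"]
normalized = normalize("acount balence", standard_words)
```

Here the misspelled input is two edits away from "account balance" and further from the other candidates, so that word is selected. When two candidates tie, `min` returns the first one in the list; a real system may need an explicit tie-breaking rule.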
According to the embodiments of the present application, a first user attribute sample text is obtained, and standard user attribute entity words used for representing the first user attribute sample text are obtained from it. A service analysis result associated with the first user attribute sample text is then obtained, together with standard result entity words used for representing that service analysis result; the standard user attribute entity words and the standard result entity words serve as network nodes, from which an initial analysis graph network is constructed. To obtain a standard analysis graph network from the initial analysis graph network, sample texts for training the initial analysis graph network are obtained; they comprise a second user attribute sample text and a service analysis result label corresponding to the second user attribute sample text, and the service analysis result label belongs to the standard result entity words. The directed conditional probability between network nodes in the initial analysis graph network is then determined according to the association relationship between the second user attribute sample text and the service analysis result label, yielding a standard analysis graph network that contains the directed conditional probability; the standard analysis graph network is used for predicting a service analysis reference result for a user attribute text.
In the embodiments of the present application, the initial analysis graph network is constructed from the standard user attribute entity words representing the first user attribute sample text and the standard result entity words representing the service analysis result, so that differences in expression are eliminated and the network remains applicable across different expressions. In addition, the directed conditional probability in the standard analysis graph network allows the service analysis reference result of a user attribute text to be predicted automatically, which reduces the resource cost of service data analysis. Furthermore, the mapping relationship between analysis results and analysis indexes does not need to be preset when the standard analysis graph network is constructed, which avoids the prior-art drawback of relying excessively on the business experience of service personnel and helps ensure the accuracy of the service analysis result.
Further, please refer to fig. 11, where fig. 11 is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in fig. 11, the computer device 1000 may be the analysis server in the embodiment corresponding to fig. 2, and the computer device 1000 may include: at least one processor 1001 (such as a CPU), at least one network interface 1004, a user interface 1003, a memory 1005, and at least one communication bus 1002. The communication bus 1002 is used to implement connection and communication between these components. The user interface 1003 may include a display and a keyboard, and the network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory. The memory 1005 may optionally also be at least one storage device located remotely from the aforementioned processor 1001. As shown in fig. 11, the memory 1005, which is a kind of computer storage medium, may include an operating system, a network communication module, a user interface module, and a device control application program.
In the computer device 1000 shown in fig. 11, the network interface 1004 may provide a network communication function; the user interface 1003 is an interface for providing a user with input; and the processor 1001 may be used to invoke a device control application stored in the memory 1005 to implement:
acquiring a first user attribute sample text, and acquiring standard user attribute entity words for representing the first user attribute sample text;
acquiring standard result entity words used for representing service analysis results, taking the standard user attribute entity words and the standard result entity words as network nodes, and constructing an initial analysis graph network according to the network nodes;
acquiring a second user attribute sample text and a service analysis result label corresponding to the second user attribute sample text; the service analysis result label belongs to the standard result entity words;
determining the directed conditional probability between network nodes in the initial analysis graph network according to the association relationship between the second user attribute sample text and the service analysis result label to obtain a standard analysis graph network containing the directed conditional probability; the standard analysis graph network is used for predicting a service analysis reference result for a user attribute text.
It should be understood that the computer device 1000 described in this embodiment of the present application may perform the data processing method described in the embodiments corresponding to fig. 2, fig. 6, fig. 7, and fig. 9, and may also implement the data processing apparatus 1 described in the embodiment corresponding to fig. 10, which is not described herein again. Likewise, the beneficial effects of the same method are not described again.
An embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, where the computer program includes program instructions, and when the program instructions are executed by a processor, the data processing method provided in each step in fig. 2, fig. 6, fig. 7, and fig. 9 is implemented, which may specifically refer to the implementation manner provided in each step in fig. 2, fig. 6, fig. 7, and fig. 9, and is not described herein again.
The computer-readable storage medium may be an internal storage unit of the data processing apparatus provided in any of the foregoing embodiments or of the computer device, such as a hard disk or a memory of the computer device. The computer-readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Memory Card (SMC), a Secure Digital (SD) card, a flash card, and the like, provided on the computer device. Further, the computer-readable storage medium may also include both an internal storage unit and an external storage device of the computer device. The computer-readable storage medium is used for storing the computer program and other programs and data required by the computer device. The computer-readable storage medium may also be used to temporarily store data that has been output or is to be output.
An aspect of the application provides a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the data processing method described in the embodiments corresponding to fig. 2, fig. 6, fig. 7, and fig. 9, which is not described herein again. Likewise, the beneficial effects of the same method are not described again.
Further, please refer to fig. 12, where fig. 12 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application. The data processing apparatus 2 may be a computer program (comprising program code) running on a computer device; for example, the data processing apparatus 2 may be application software. The data processing apparatus 2 may be configured to perform the corresponding steps in the method provided by the embodiments of the present application. As shown in fig. 12, the data processing apparatus 2 may include: a first obtaining module 21, a second obtaining module 22 and a result determining module 23.
The first obtaining module 21 is configured to obtain a user attribute text, and obtain a standby standard user attribute entity word for representing the user attribute text;
a second obtaining module 22, configured to obtain a standard analysis graph network; the standard analysis graph network comprises network nodes and directed conditional probabilities among the network nodes; the network nodes are composed of standard user attribute entity words and standard result entity words; the standard user attribute entity words comprise the standby standard user attribute entity words; the standard result entity words are used for representing service analysis results;
and a result determining module 23, configured to determine a service analysis reference result of the user attribute text according to the standby standard user attribute entity word and the standard analysis graph network.
For specific functional implementation manners of the first obtaining module 21, the second obtaining module 22 and the result determining module 23, reference may be made to steps S201 to S203 in the embodiment corresponding to fig. 7, which is not described herein again.
Referring to fig. 12 again, the standby standard user attribute entity words include standby standard object entity words and standby standard index entity words;
the result determining module 23 may include: a first obtaining unit 231, a path constructing unit 232, a second obtaining unit 233, a first determining unit 234, and a second determining unit 235.
A first obtaining unit 231, configured to obtain the standard result entity words in the standard analysis graph network;
a path constructing unit 232, configured to construct N standby directed decision paths for the user attribute text according to the standby standard object entity words, the standby standard index entity words, and the standard result entity words; wherein each standby directed decision path comprises one standby standard object entity word, at least one standby standard index entity word and one standard result entity word; N is a positive integer;
a second obtaining unit 233, configured to obtain standby path probabilities of the N standby directed decision paths, respectively, according to the directed conditional probabilities;
a first determining unit 234, configured to determine the maximum standby path probability among the standby path probabilities as a target path probability, and determine the standby directed decision path corresponding to the target path probability as a target directed decision path;
and a second determining unit 235, configured to determine the standard result entity word in the target directed decision path as a target standard result entity word, and determine a service analysis reference result according to the target standard result entity word.
For specific functional implementation manners of the first obtaining unit 231, the path constructing unit 232, the second obtaining unit 233, the first determining unit 234, and the second determining unit 235, reference may be made to step S203 in the embodiment corresponding to fig. 7, which is not described herein again.
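The path selection performed by units 231 to 235 can be sketched as follows. The text says only that the standby path probabilities are obtained according to the directed conditional probabilities; multiplying the probabilities of the edges along a path is an assumption made here. For brevity the sketch also restricts each path to exactly one index entity word, whereas the embodiment allows at least one. The function name `best_path` and all sample values are hypothetical.

```python
from itertools import product


def best_path(object_words, index_words, result_words, cond_prob):
    """Return the (object, index, result) path with the largest path probability."""
    best, best_p = None, -1.0
    # Enumerate the N standby directed decision paths.
    for obj, idx, res in product(object_words, index_words, result_words):
        # Assumption: path probability is the product of its edge probabilities.
        p = cond_prob.get((obj, idx), 0.0) * cond_prob.get((idx, res), 0.0)
        if p > best_p:
            best, best_p = (obj, idx, res), p
    return best, best_p


# Hypothetical directed conditional probabilities from a trained graph.
cond_prob = {
    ("object_a", "index_x"): 0.6,
    ("object_a", "index_y"): 0.4,
    ("index_x", "result_1"): 0.7,
    ("index_x", "result_2"): 0.3,
    ("index_y", "result_1"): 0.2,
    ("index_y", "result_2"): 0.8,
}
target_path, target_prob = best_path(
    ["object_a"], ["index_x", "index_y"], ["result_1", "result_2"], cond_prob
)
```

The last element of `target_path` plays the role of the target standard result entity word, and the order of the tuple gives the pointing order that a text output step could follow.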
Referring again to fig. 12, the result determining module 23 may further include: a third determining unit 236 and a text output unit 237.
A third determining unit 236, configured to determine the standby standard object entity word in the target directed decision path as a target standard object entity word, and determine the standby standard index entity word in the target directed decision path as a target standard index entity word;
and a text output unit 237, configured to output a service analysis reference text of the user attribute text according to the pointing order in the target directed decision path, the target standard object entity word, the target standard index entity word, and the target standard result entity word.
For specific functional implementation manners of the third determining unit 236 and the text output unit 237, reference may be made to step S203 in the embodiment corresponding to fig. 7, which is not described herein again.
Further, please refer to fig. 13, where fig. 13 is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in fig. 13, the computer device 2000 may include a processor 2001, a network interface 2004 and a memory 2005, and may further include a user interface 2003 and at least one communication bus 2002. The communication bus 2002 is used to implement connection and communication between these components. The user interface 2003 may include a display and a keyboard, and may optionally further include a standard wired interface and a standard wireless interface. The network interface 2004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 2005 may be a high-speed RAM memory, or may be a non-volatile memory (e.g., at least one disk memory). The memory 2005 may optionally also be at least one storage device located remotely from the aforementioned processor 2001. As shown in fig. 13, the memory 2005, which is a kind of computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a device control application program.
In the computer device 2000 shown in fig. 13, the network interface 2004 may provide a network communication function; and the user interface 2003 is primarily used to provide an interface for user input; and processor 2001 may be used to invoke the device control application stored in memory 2005 to implement:
acquiring a user attribute text, and acquiring a standby standard user attribute entity word for representing the user attribute text;
acquiring a standard analysis graph network; the standard analysis graph network comprises network nodes and directed conditional probabilities among the network nodes; the network nodes are composed of standard user attribute entity words and standard result entity words; the standard user attribute entity words comprise the standby standard user attribute entity words; the standard result entity words are used for representing service analysis results;
and determining a service analysis reference result of the user attribute text according to the standby standard user attribute entity words and the standard analysis graph network.
It should be understood that the computer device 2000 described in this embodiment may perform the data processing method described in the embodiments corresponding to fig. 2, fig. 6, fig. 7, and fig. 9, and may also implement the data processing apparatus 2 described in the embodiment corresponding to fig. 12, which is not described herein again. Likewise, the beneficial effects of the same method are not described again.
The terms "first," "second," and the like in the description, claims and drawings of the embodiments of the present application are used for distinguishing between different objects and not for describing a particular order. Furthermore, the term "comprises" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, apparatus, product, or device that comprises a list of steps or modules is not limited to the listed steps or modules, but may optionally further include other steps or modules that are not listed or that are inherent to such process, method, apparatus, product, or device.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of the examples have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The method and the related apparatus provided by the embodiments of the present application are described with reference to the flowchart and/or the structural diagram of the method provided by the embodiments of the present application, and each flow and/or block of the flowchart and/or the structural diagram of the method, and the combination of the flow and/or block in the flowchart and/or the block diagram can be specifically implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block or blocks of the block diagram. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block or blocks of the block diagram. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block or blocks.
The above disclosure describes only preferred embodiments of the present application and is not to be construed as limiting the scope of the present application; equivalent variations and modifications made in accordance with the claims of the present application therefore still fall within the scope of the present application.

Claims (15)

1. A data processing method, comprising:
acquiring a first user attribute sample text, and acquiring standard user attribute entity words for representing the first user attribute sample text;
acquiring standard result entity words used for representing service analysis results, taking the standard user attribute entity words and the standard result entity words as network nodes, and constructing an initial analysis graph network according to the network nodes;
acquiring a second user attribute sample text and a service analysis result label corresponding to the second user attribute sample text; the service analysis result label belongs to the standard result entity word;
determining directed conditional probability between the network nodes in the initial analysis graph network according to the association relationship between the second user attribute sample text and the service analysis result label to obtain a standard analysis graph network containing the directed conditional probability; the standard analysis graph network is used for predicting a service analysis reference result for the user attribute text.
2. The method of claim 1, wherein the initial analysis graph network comprises directed conditional edges between the network nodes;
the determining directed conditional probability between the network nodes in the initial analysis graph network according to the association relationship between the second user attribute sample text and the service analysis result label includes:
acquiring a directed decision path containing the second user attribute sample text and the service analysis result label according to the association relationship between the second user attribute sample text and the service analysis result label;
obtaining a directed conditional edge indicated by the directed decision path in the initial analysis graph network as a training directed conditional edge;
and determining the directed conditional probability corresponding to the training directed conditional edge according to the directed decision path.
3. The method of claim 2, wherein the standard user attribute entity words comprise standard object entity words and standard index entity words; the second user attribute sample text comprises an object sample text belonging to the standard object entity word and an index sample text belonging to the standard index entity word;
the obtaining, in the initial analysis graph network, a directed conditional edge indicated by the directed decision path as a training directed conditional edge includes:
in the initial analysis graph network, determining a network node corresponding to the object sample text as a first training network node, determining a network node corresponding to the index sample text as a second training network node, and determining a network node corresponding to the service analysis result label as a third training network node;
and determining the training directed condition edges in directed condition edges among the first training network node, the second training network node and the third training network node according to the directed decision path.
4. The method of claim 3, wherein the training directed conditional edge comprises a first training directed conditional edge;
the determining the directed conditional probability corresponding to the training directed conditional edge according to the directed decision path comprises:
generating a first probability that the first training network node points to the second training network node according to an incidence relation between the object sample text and the index sample text in the directed decision path;
determining the first probability as the directed conditional probability corresponding to the first training directed conditional edge; wherein the first training directed conditional edge is a directed conditional edge directed from the first training network node to the second training network node.
5. The method of claim 3, wherein the training directed conditional edge comprises a second training directed conditional edge;
the determining the directed conditional probability corresponding to the training directed conditional edge according to the directed decision path comprises:
generating a second probability that the second training network node points to the third training network node according to the incidence relation between the index sample text and the service analysis result label in the directed decision path;
determining the second probability as the directed conditional probability corresponding to the second training directed conditional edge; wherein the second training directed conditional edge is a directed conditional edge directed from the second training network node to the third training network node.
6. The method of claim 3, wherein the training directed conditional edge comprises a third training directed conditional edge; the directed decision path comprises at least two index sample texts;
the determining the directed conditional probability corresponding to the training directed conditional edge according to the directed decision path comprises:
generating a second probability between at least two second training network nodes according to the incidence relation between the at least two index sample texts in the directed decision path;
determining the second probability as the directed conditional probability corresponding to the third training directed conditional edge; the at least two second training network nodes comprise network nodes corresponding to the at least two index sample texts respectively; and the third training directed conditional edge is obtained by connecting the at least two second training network nodes according to the directed order of the at least two index sample texts in the directed decision path.
7. The method according to claim 4, wherein the number of the object sample texts is at least two, and the number of the index sample texts is at least two; the at least two object sample texts comprise target object sample texts, and the at least two index sample texts comprise target index sample texts;
the generating a first probability that the first training network node points to the second training network node according to the incidence relation between the object sample text and the index sample text in the directed decision path comprises:
determining the number of index sample texts pointed by the target object sample texts as a first number according to the directed decision path;
determining the number of target index sample texts pointed by the target object sample texts as a second number according to the directed decision path;
determining the first probability that the first training network node points to the second training network node based on the first number and the second number.
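The counting step in claim 7 can be illustrated with a minimal sketch. It assumes, as one plausible reading that the claim does not state explicitly, that the first probability is the ratio of the second number to the first number. The function name `estimate_first_probabilities` and the representation of each directed decision path as an (object text, index text, result label) triple are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_first_probabilities(decision_paths):
    """Estimate the probability that an object node points to an index node.

    Each path is a hypothetical (object_text, index_text, result_label) triple.
    Assumption: probability = second number / first number.
    """
    # For every object text, count how often it points to each index text.
    pointed = defaultdict(Counter)
    for object_text, index_text, _result_label in decision_paths:
        pointed[object_text][index_text] += 1

    probabilities = {}
    for object_text, index_counts in pointed.items():
        # First number: all index sample texts pointed to by this object text.
        first_number = sum(index_counts.values())
        for index_text, second_number in index_counts.items():
            # Second number: occurrences of this particular target index text.
            probabilities[(object_text, index_text)] = second_number / first_number
    return probabilities
```

Under this assumption, the probabilities on the edges leaving a given object node sum to one, which is what a conditional distribution over index nodes requires.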
8. The method of claim 1, wherein the standard user attribute entity words comprise standard object entity words and standard index entity words;
the taking the standard user attribute entity words and the standard result entity words as network nodes, and constructing an initial analysis graph network according to the network nodes comprises:
determining the standard object entity words, the standard index entity words and the standard result entity words as the network nodes;
generating a network object layer according to the network nodes corresponding to the standard object entity words, generating a network index layer according to the network nodes corresponding to the standard index entity words, and generating a network result layer according to the network nodes corresponding to the standard result entity words;
connecting each network node in the network object layer with each network node in the network index layer respectively to obtain a first directed edge;
connecting each network node in the network index layer with each network node in the network result layer respectively to obtain a second directed edge;
determining the first directed edge and the second directed edge as directed conditional edges;
and constructing the initial analysis graph network according to the network nodes and the directed conditional edges.
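The layered construction in claim 8 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the function name `build_initial_graph`, the dictionary layout, and the use of `None` as a placeholder for a not-yet-determined directed conditional probability are all hypothetical.

```python
from itertools import product


def build_initial_graph(object_words, index_words, result_words):
    """Build a three-layer graph whose adjacent layers are fully connected."""
    layers = {
        "object": list(object_words),   # network object layer
        "index": list(index_words),     # network index layer
        "result": list(result_words),   # network result layer
    }
    # First directed edges: every object node to every index node.
    first_edges = list(product(layers["object"], layers["index"]))
    # Second directed edges: every index node to every result node.
    second_edges = list(product(layers["index"], layers["result"]))
    # Directed conditional edges; probabilities are unknown until training.
    edges = {edge: None for edge in first_edges + second_edges}
    return layers, edges
```

With two object words, three index words, and one result word, this yields 2 × 3 + 3 × 1 = 9 directed conditional edges, and no edge connects the object layer directly to the result layer.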
9. The method of claim 1, wherein obtaining standard user attribute entity words for characterizing the first user attribute sample text comprises:
inputting the first user attribute sample text into a text recognition model, and acquiring original user attribute entity words for representing the first user attribute sample text based on the text recognition model;
and inputting the original user attribute entity words into an entity word standardization model, and carrying out standardization processing on the original user attribute entity words based on the entity word standardization model to obtain the standard user attribute entity words.
10. The method of claim 9, wherein the text recognition model comprises an input layer, a coding layer, a hidden layer, and a recognition layer;
the obtaining of original user attribute entity words for characterizing the first user attribute sample text based on the text recognition model includes:
segmenting the first user attribute sample text based on the input layer to obtain at least two word segments;
inputting the at least two word segments into the coding layer, and respectively encoding the at least two word segments based on the coding layer to obtain at least two semantic vectors;
inputting the at least two semantic vectors into the hidden layer, and respectively performing hidden feature extraction processing on the at least two semantic vectors based on the hidden layer to obtain at least two hidden vectors;
and inputting the at least two hidden vectors into the recognition layer, and recognizing the at least two hidden vectors based on the recognition layer to obtain the original user attribute entity words for representing the first user attribute sample text.
11. The method of claim 9, wherein the normalizing the original user attribute entity words based on the entity word normalization model to obtain the standard user attribute entity words comprises:
acquiring standard sample entity words;
determining edit distances between the standard sample entity words and the original user attribute entity word based on the entity word standardization model;
and acquiring a minimum edit distance from the edit distances, and determining the standard sample entity word corresponding to the minimum edit distance as the standard user attribute entity word of the original user attribute entity word.
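The standardization step in claim 11 can be sketched as a nearest-match lookup. The claim does not say which edit distance is used; the sketch below assumes the Levenshtein distance, and the function names `edit_distance` and `normalize_entity_word` as well as the sample words are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (assumed metric)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        curr = [i]
        for j, char_b in enumerate(b, 1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalize_entity_word(original_word, standard_words):
    """Return the standard sample entity word with the minimum edit distance."""
    return min(standard_words, key=lambda word: edit_distance(original_word, word))
```

For example, `normalize_entity_word("blod pressure", ["blood pressure", "blood sugar", "heart rate"])` selects `"blood pressure"`, which is one insertion away from the input. Ties are resolved in favor of the first candidate in the list, a detail the claim leaves open.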
12. A data processing method, comprising:
acquiring a user attribute text, and acquiring a standby standard user attribute entity word for representing the user attribute text;
acquiring a standard analysis graph network; the standard analysis graph network comprises network nodes and directed conditional probabilities between the network nodes; the network nodes are composed of standard user attribute entity words and standard result entity words; the standard user attribute entity words comprise the standby standard user attribute entity words; the standard result entity words are used for representing a service analysis result;
and determining a service analysis reference result of the user attribute text according to the standby standard user attribute entity words and the standard analysis graph network.
13. The method of claim 12, wherein the standby standard user attribute entity words comprise standby standard object entity words and standby standard index entity words;
the determining a service analysis reference result of the user attribute text according to the standby standard user attribute entity word and the standard analysis graph network comprises:
acquiring the standard result entity words in the standard analysis graph network;
constructing N standby directed decision paths for the user attribute text according to the standby standard object entity words, the standby standard index entity words and the standard result entity words; wherein each standby directed decision path comprises one standby standard object entity word, at least one standby standard index entity word and one standard result entity word; N is a positive integer;
respectively obtaining standby path probabilities of the N standby directed decision paths according to the directed conditional probabilities;
determining the maximum standby path probability in the standby path probabilities as a target path probability, and determining a standby directed decision path corresponding to the target path probability as a target directed decision path;
and determining the standard result entity words in the target directed decision path as target standard result entity words, and determining the service analysis reference result according to the target standard result entity words.
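The path selection in claim 13 can be sketched as an enumeration over candidate paths. The claim only says the path probabilities are obtained "according to the directed conditional probabilities"; the sketch assumes a path probability is the product of the probabilities on its edges, restricts each path to a single index word for brevity, and uses the hypothetical names `best_decision_path` and `edge_prob`.

```python
from itertools import product


def best_decision_path(object_word, index_words, result_words, edge_prob):
    """Return the (path, probability) pair with the largest path probability.

    edge_prob maps a (source, target) node pair to its directed conditional
    probability; missing edges are treated as probability 0.0.
    Assumption: path probability = product of its edge probabilities.
    """
    best_path, best_prob = None, 0.0
    # Enumerate the standby directed decision paths object -> index -> result.
    for index_word, result_word in product(index_words, result_words):
        path_prob = (edge_prob.get((object_word, index_word), 0.0)
                     * edge_prob.get((index_word, result_word), 0.0))
        if path_prob > best_prob:
            best_path = (object_word, index_word, result_word)
            best_prob = path_prob
    return best_path, best_prob
```

The last element of the returned path plays the role of the target standard result entity word. If every candidate path has probability zero, the function returns `None` as the path, so a caller would need to handle that case.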
14. A computer device, comprising: a processor, a memory, and a network interface;
the processor is connected to the memory and the network interface, wherein the network interface is configured to provide data communication functions, the memory is configured to store program code, and the processor is configured to call the program code to perform the steps of the method according to any one of claims 1 to 13.
15. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions which, when executed by a processor, perform the steps of the method of any one of claims 1 to 13.
CN202011352610.XA 2020-11-26 2020-11-26 Data processing method, data processing equipment and computer readable storage medium Active CN112182253B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011352610.XA CN112182253B (en) 2020-11-26 2020-11-26 Data processing method, data processing equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011352610.XA CN112182253B (en) 2020-11-26 2020-11-26 Data processing method, data processing equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN112182253A CN112182253A (en) 2021-01-05
CN112182253B CN112182253B (en) 2021-02-26

Family

ID=73918115

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011352610.XA Active CN112182253B (en) 2020-11-26 2020-11-26 Data processing method, data processing equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN112182253B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113392228B (en) * 2021-08-03 2023-07-21 广域铭岛数字科技有限公司 Anomaly prediction and tracing method, system, equipment and medium based on automobile production
CN118551755A (en) * 2023-02-27 2024-08-27 腾讯科技(深圳)有限公司 Data processing method, device and computer readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106649696A (en) * 2016-12-19 2017-05-10 北京云知声信息技术有限公司 Information classification method and device
CN109960728A (en) * 2019-03-11 2019-07-02 北京市科学技术情报研究所(北京市科学技术信息中心) A kind of open field conferencing information name entity recognition method and system
CN110880044A (en) * 2019-10-23 2020-03-13 广东电网有限责任公司 Markov chain-based load prediction method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110320228A1 (en) * 2010-06-24 2011-12-29 Bmc Software, Inc. Automated Generation of Markov Chains for Use in Information Technology

Also Published As

Publication number Publication date
CN112182253A (en) 2021-01-05

Similar Documents

Publication Publication Date Title
CN109564589B (en) Entity identification and linking system and method using manual user feedback
CN109992664B (en) Dispute focus label classification method and device, computer equipment and storage medium
CN112015917A (en) Data processing method and device based on knowledge graph and computer equipment
CN112100406B (en) Data processing method, device, equipment and medium
CN111680159A (en) Data processing method and device and electronic equipment
CA2973138A1 (en) Systems, devices, and methods for automatic detection of feelings in text
CN111914562B (en) Electronic information analysis method, device, equipment and readable storage medium
WO2022068160A1 (en) Artificial intelligence-based critical illness inquiry data identification method and apparatus, device, and medium
CN112182253B (en) Data processing method, data processing equipment and computer readable storage medium
US20160063636A1 (en) Predictive insurance transaction error system
CN112035595A (en) Construction method and device of audit rule engine in medical field and computer equipment
WO2023088278A1 (en) Method and apparatus for verifying authenticity of expression, and device and medium
CN113657105A (en) Medical entity extraction method, device, equipment and medium based on vocabulary enhancement
CN115359799A (en) Speech recognition method, training method, device, electronic equipment and storage medium
CN112463989A (en) Knowledge graph-based information acquisition method and system
CN116719520A (en) Code generation method and device
CN116821373A (en) Map-based prompt recommendation method, device, equipment and medium
CN114298314A (en) Multi-granularity causal relationship reasoning method based on electronic medical record
CN117473057A (en) Question-answering processing method, system, equipment and storage medium
CN110287270B (en) Entity relationship mining method and equipment
CN114117082B (en) Method, apparatus, and medium for correcting data to be corrected
CN116431827A (en) Information processing method, information processing device, storage medium and computer equipment
CN116956934A (en) Task processing method, device, equipment and storage medium
CN116719920A (en) Dynamic sampling dialogue generation model training method, device, equipment and medium
CN115620886A (en) Data auditing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant