CN111143505B - Document processing method, device, medium and electronic equipment - Google Patents

Publication number
CN111143505B
Authority
CN
China
Prior art keywords
product
document
clause
content
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911192868.5A
Other languages
Chinese (zh)
Other versions
CN111143505A (en
Inventor
赵丽
赵文鹏
李永峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Taikang Life Insurance Co ltd
Taikang Insurance Group Co Ltd
Original Assignee
Taikang Life Insurance Co ltd
Taikang Insurance Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Taikang Life Insurance Co ltd, Taikang Insurance Group Co Ltd
Priority to CN201911192868.5A
Publication of CN111143505A
Application granted
Publication of CN111143505B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/313 Selection or weighting of terms for indexing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3334 Selection or weighting of terms from queries, including natural language queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08 Insurance

Abstract

The invention provides a document processing method comprising: parsing a product development document to obtain its content, and generating a product risk correspondence table based on that content; parsing a product clause document to obtain its content, and generating a text document based on that content; extracting field information from the text document, and generating a product clause information table based on the field information; and matching and merging the product risk correspondence table and the product clause information table to obtain a risk responsibility configuration table. Based on the risk responsibility configuration table, the customer's policy information data can be queried and processed using Structured Query Language (SQL) and displayed to the user through a visual interface on a device. For example, when the user is an insurance agent, the visual interface lets the agent see at a glance where the customer's coverage is lacking, which improves the success rate of adding coverage for the customer.

Description

Document processing method, device, medium and electronic equipment
Technical Field
The present invention relates to the technical field of document processing, and in particular, to a document processing method, a device, a medium, and an electronic apparatus.
Background
It is well known that the insurance industry differs from traditional industries mainly in that insurance itself is a service. The so-called insurance product is in fact a paper contract that records the service commitments the insurer must fulfill for the customer; before sale it is called the terms of the insurance product, and after sale the insurance contract. For a customer who holds several insurance products with complex combinations of responsibilities, it is difficult to work out what coverage the customer currently has. It is also difficult for insurers to identify which customers in the existing customer base are not fully covered, which hinders further development of those customers. For example, at the start of new product and core system development, only basic product information such as risk codes, risk names, and risk categories is usually stored in the system, because the product terms themselves are complex and business takes priority over the system. To sort out the responsibilities of each product clearly and match them to the risk codes of the core system, the product management documents produced when the system was developed can only be read through manually. This approach consumes considerable manpower and material resources and has a high error rate.
It should be noted that the information disclosed in the above background section is only for enhancing understanding of the background of the invention and thus may include information that does not form the prior art that is already known to those of ordinary skill in the art.
Disclosure of Invention
The embodiments of the invention aim to provide a document processing method, a document processing device, a medium, and electronic equipment. The method parses the product development document and the product clause document and matches their contents, which improves work efficiency and reduces the error rate to a certain extent. Based on the risk responsibility configuration table, the customer's policy information data can be queried and processed using Structured Query Language (SQL) and displayed to the user through a visual interface on a device. For example, when the user is an insurance agent, the visual interface lets the agent see at a glance where the customer's coverage is lacking, which improves the success rate of adding coverage for the customer. As another example, when the user is the customer, the visual interface lets the customer clearly see all current coverage, which improves the customer's experience of the insurance service.
Other features and advantages of the invention will be apparent from the following detailed description, or may be learned by the practice of the invention.
According to a first aspect of an embodiment of the present invention, there is provided a document processing method including: analyzing a product development document to obtain the content of the product development document, and generating a product risk corresponding relation table based on the content of the product development document; analyzing a product clause document to obtain the content of the product clause document, and generating a text document based on the content of the product clause document; extracting field information from the text document, and generating a product clause information table based on the field information; and matching and combining the product risk corresponding relation table and the product clause information table to obtain a risk responsibility configuration table.
In some embodiments of the invention, parsing the product development document to obtain the content of the product development document includes: reading, splitting, and/or assigning values from the product development document using VBA to obtain the content of the product development document.
In some embodiments of the present invention, the product clause document is in PDF format, and parsing the product clause document to obtain the content of the product clause document includes: parsing the PDF-format product clause document using Python to obtain the content of the product clause document.
In some embodiments of the invention, extracting field information from the text document includes: processing the text document through regular expression rules to obtain product clause names; extracting field information from the text document according to the product clause name.
In some embodiments of the invention, extracting field information from the text document according to the product clause name comprises: classifying the risk class to which the product clause belongs in the text document according to the product clause name; determining the field names required to be extracted by the product clauses according to the risk class to which the product clauses belong; extracting field information from the text document according to the field names required to be extracted by the product clauses.
In some embodiments of the invention, extracting field information from the text document according to a field name required to be extracted by the product clause includes: extracting field information from the text document using regular expression rules, doc2vec, and/or location information of text based on field names required to be extracted by the product clauses.
In some embodiments of the present invention, matching and merging the product risk correspondence table and the product clause information table to obtain the risk responsibility configuration table includes: matching and merging the content of the product risk correspondence table and the content of the product clause information table according to the similarity between the risk in the product risk correspondence table and the risk class in the product clause information table, to obtain the risk responsibility configuration table.
According to a second aspect of an embodiment of the present invention, there is provided a document processing apparatus including: the first analysis module is used for analyzing the product development document to obtain the content of the product development document and generating a product risk corresponding relation table based on the content of the product development document; the second analysis module is used for analyzing the product clause document to obtain the content of the product clause document and generating a text document based on the content of the product clause document; the extraction module is used for extracting field information from the text document and generating a product clause information table based on the field information; and the matching and combining module is used for matching and combining the product risk corresponding relation table and the product clause information table to obtain a risk responsibility configuration table.
In some embodiments of the present invention, the first parsing module is configured to: read, split, and/or assign values from the product development document using VBA to obtain the content of the product development document.
In some embodiments of the present invention, the product clause document is in PDF format, and the second parsing module is configured to: parse the PDF-format product clause document using Python to obtain the content of the product clause document.
In some embodiments of the present invention, the extracting module includes: the acquisition module is used for processing the text document through regular expression rules so as to acquire product clause names; and the first extraction module is used for extracting field information from the text document according to the product clause name.
In some embodiments of the present invention, the first extraction module includes: the classification module is used for classifying the risk class to which the product clause belongs in the text document according to the product clause name; the determining module is used for determining field names required to be extracted by the product clauses according to the risk class to which the product clauses belong; and the sub-module of the first extraction module is used for extracting field information from the text document according to the field names required to be extracted by the product clauses.
In some embodiments of the present invention, the submodule of the first extraction module is configured to: extracting field information from the text document using regular expression rules, doc2vec, and/or location information of text based on field names required to be extracted by the product clauses.
In some embodiments of the present invention, the matching and merging module is configured to: match and merge the content of the product risk correspondence table and the content of the product clause information table according to the similarity between the risk in the product risk correspondence table and the risk class in the product clause information table, to obtain the risk responsibility configuration table.
According to a third aspect of an embodiment of the present invention, there is provided an electronic apparatus including: one or more processors; and storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the document processing method as described in the first aspect of the embodiments above.
According to a fourth aspect of embodiments of the present invention, there is provided a computer-readable medium having stored thereon a computer program which, when executed by a processor, implements the document processing method according to the first aspect of the above embodiments.
The technical scheme provided by the embodiment of the invention can have the following beneficial effects:
In the technical scheme provided by the embodiments of the invention, the product development document and the product clause document are parsed and their contents are matched, which reduces the error rate to a certain extent. Based on the risk responsibility configuration table, the customer's policy information data can be queried and processed using Structured Query Language (SQL) and displayed to the user through a visual interface on a device. For example, when the user is an insurance agent, the visual interface lets the agent see at a glance where the customer's coverage is lacking, which improves the success rate of adding coverage for the customer. As another example, when the user is the customer, the visual interface lets the customer clearly see all current coverage, which improves the customer's experience of the insurance service.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention. It is evident that the drawings in the following description are only some embodiments of the present invention and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art. In the drawings:
FIG. 1 shows a schematic diagram of an exemplary system architecture of a document processing method or document processing apparatus to which embodiments of the present invention may be applied;
FIG. 2 schematically illustrates a flow chart of a document processing method according to an embodiment of the invention;
FIG. 3 schematically illustrates a flow chart of a document processing method according to another embodiment of the invention;
FIG. 4 schematically illustrates a flow chart of a document processing method according to another embodiment of the invention;
FIG. 5 schematically illustrates a schematic diagram of text location information according to another embodiment of the invention;
FIG. 6 schematically shows a block diagram of a document processing apparatus according to an embodiment of the present invention;
FIG. 7 schematically shows a block diagram of a document processing apparatus according to another embodiment of the present invention;
FIG. 8 schematically shows a block diagram of a document processing apparatus according to another embodiment of the present invention;
FIG. 9 schematically shows a schematic diagram of a computer system suitable for implementing an embodiment of the invention.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the invention.
The block diagrams depicted in the figures are merely functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow diagrams depicted in the figures are exemplary only, and do not necessarily include all of the elements and operations/steps, nor must they be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.
FIG. 1 shows a schematic diagram of an exemplary system architecture to which a document processing method or document processing apparatus of an embodiment of the present invention may be applied.
As shown in fig. 1, the system architecture 100 may include one or more of terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. For example, the server 105 may be a server cluster formed by a plurality of servers.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. The terminal devices 101, 102, 103 may be various electronic devices with display screens including, but not limited to, smartphones, tablet computers, portable computers, desktop computers, and the like.
The server 105 may be a server providing various services. For example, the user uploads the product development document and the product clause document to the server 105 using the terminal device 103 (or the terminal device 101 or 102). The server 105 parses the two documents and matches their contents, which improves work efficiency and reduces the error rate to a certain extent. Based on the risk responsibility configuration table, the customer's policy information data can be queried and processed using Structured Query Language (SQL) and displayed to the user through a visual interface on a device. For example, when the user is an insurance agent, the visual interface lets the agent see at a glance where the customer's coverage is lacking, which improves the success rate of adding coverage for the customer. As another example, when the user is the customer, the visual interface lets the customer clearly see all current coverage, which improves the customer's experience of the insurance service.
In some embodiments, the document processing method provided by the embodiments of the present invention is generally performed by the server 105, and accordingly, the document processing apparatus is generally disposed in the server 105. In other embodiments, some terminals may have similar functions as servers to perform the method. Therefore, the document processing method provided by the embodiment of the invention is not limited to be executed at the server side.
Fig. 2 schematically shows a flow chart of a document processing method according to an embodiment of the invention.
As shown in fig. 2, the document processing method may include steps S110 to S140.
In step S110, the product development document is parsed to obtain the content of the product development document, and a product risk correspondence table is generated based on the content of the product development document.
In step S120, the product clause document is parsed to obtain the content of the product clause document, and a text document is generated based on the content of the product clause document.
In step S130, field information is extracted from the text document, and a product clause information table is generated based on the field information.
In step S140, the product risk corresponding relationship table and the product clause information table are matched and combined to obtain a risk responsibility configuration table.
The method parses the product development document and the product clause document and matches their contents, which improves work efficiency and reduces the error rate to a certain extent. Based on the risk responsibility configuration table, the customer's policy information data can be queried and processed using Structured Query Language (SQL) and displayed to the user through a visual interface on a device. For example, when the user is an insurance agent, the visual interface lets the agent see at a glance where the customer's coverage is lacking, which improves the success rate of adding coverage for the customer. As another example, when the user is the customer, the visual interface lets the customer clearly see all current coverage, which improves the customer's experience of the insurance service.
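As a rough illustration of the query step, the sketch below uses Python's built-in sqlite3 module to find responsibilities that a customer does not yet hold. The table names, column names, customer IDs, and responsibility names are all hypothetical; the source does not specify a schema or a database engine.

```python
import sqlite3

# In-memory database with a hypothetical schema (not taken from the patent).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE risk_responsibility_config (
    risk_code TEXT, responsibility_code TEXT, responsibility_name TEXT
);
CREATE TABLE policy (customer_id TEXT, risk_code TEXT);

INSERT INTO risk_responsibility_config VALUES
    ('0001', '01', 'Accidental injury'),
    ('0002', '02', 'Serious diseases');
INSERT INTO policy VALUES
    ('C001', '0001'),
    ('C002', '0001'),
    ('C002', '0002');
""")

# For every customer, list each known responsibility that none of the
# customer's policies covers.
MISSING_COVERAGE_SQL = """
SELECT cu.customer_id, r.responsibility_name
FROM (SELECT DISTINCT customer_id FROM policy) AS cu
CROSS JOIN (
    SELECT DISTINCT responsibility_code, responsibility_name
    FROM risk_responsibility_config
) AS r
WHERE NOT EXISTS (
    SELECT 1
    FROM policy AS p
    JOIN risk_responsibility_config AS c ON c.risk_code = p.risk_code
    WHERE p.customer_id = cu.customer_id
      AND c.responsibility_code = r.responsibility_code
)
ORDER BY cu.customer_id, r.responsibility_code
"""

missing = conn.execute(MISSING_COVERAGE_SQL).fetchall()
for customer_id, responsibility_name in missing:
    print(customer_id, "is missing:", responsibility_name)
```

A result set of this kind is what a visual interface could render as the customer's coverage gaps.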
In one embodiment of the invention, the product development document may be a development document for an insurance product. Typically, the product development document is a Word document that contains data in different formats, such as computer-language code, tables, pictures, and text. In this instance, the text, i.e., the content of the product development document, can be extracted from the Word document through step S110.
In one embodiment of the present invention, the content of the product development document may be text information in the product development document, for example, the risk code, the name corresponding to the risk code, and the guarantee period category.
In one embodiment of the present invention, parsing a product development document to obtain the content of the product development document includes: reading, splitting, and/or assigning values from the product development document using VBA to obtain the content of the product development document. VBA, short for Visual Basic for Applications, is a macro language of Visual Basic.
In one embodiment of the present invention, the product risk correspondence table may include a risk code, a product name corresponding to the risk code, a guarantee period, and a category of insurance. Specifically as shown in table 1:
TABLE 1
In one embodiment of the invention, the product clause document may be a clause document of an insurance product. Typically, the clause document of the insurance product is a PDF document. The clause document includes various insurance responsibilities, such as, but not limited to, major disease medical insurance benefits, major disease hospitalization benefits, physical insurance benefits, and high residual insurance benefits.
In one embodiment of the invention, the content of the product clause document may be the text information in the product clause document, based on which the text document may be generated. The text information may be stored as two files in TXT format. One stores all the text of the clauses, for example, "health percentage type D major disease insurance clause of A Insurance Limited Liability Company". The other stores the text together with its location information, e.g., <LTTextBoxHorizontal(0) 159.120,729.744,441.225,782.250 health percentage type D major disease insurance clause of A Insurance Limited Liability Company>. In the TXT file with location identifiers, LTTextBoxHorizontal(0) indicates that the text is the first item read horizontally, and the four numbers 159.120, 729.744, 441.225, 782.250 are the two coordinate points that bound the text. pdfminer defines a coordinate system for each page whose origin is the lower-left corner of the page, as shown in fig. 5.
In one embodiment of the present invention, parsing the product clause document to obtain the content of the product clause document includes: parsing the PDF-format product clause document using Python to obtain the content of the product clause document. For example, the PDF-format product clause document is parsed using the pdfminer module of Python. Because pdfminer parses the document according to its page layout, the parsing yields not only the text information in the product clause document but also the position information of that text.
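To make the position-tagged TXT format concrete, here is a minimal sketch that reads one such line back into an index, a bounding box, and the text. It uses only the standard library. The exact line layout, the `TextBox` type, and the function name are assumptions modeled on the example above, not something the source specifies.

```python
import re
from typing import NamedTuple, Optional


class TextBox(NamedTuple):
    index: int   # reading-order index of the text box on the page
    x0: float    # left edge
    y0: float    # bottom edge (origin is the lower-left corner of the page)
    x1: float    # right edge
    y1: float    # top edge
    text: str


# Assumed layout: <LTTextBoxHorizontal(i) x0,y0,x1,y1 text>
BOX_PATTERN = re.compile(
    r"<LTTextBoxHorizontal\((?P<index>\d+)\)\s+"
    r"(?P<x0>[\d.]+),(?P<y0>[\d.]+),(?P<x1>[\d.]+),(?P<y1>[\d.]+)\s+"
    r"(?P<text>.*?)>?\s*$"
)


def parse_box_line(line: str) -> Optional[TextBox]:
    """Parse one position-tagged line; return None if it has another shape."""
    m = BOX_PATTERN.match(line.strip())
    if m is None:
        return None
    return TextBox(
        index=int(m["index"]),
        x0=float(m["x0"]),
        y0=float(m["y0"]),
        x1=float(m["x1"]),
        y1=float(m["y1"]),
        text=m["text"].strip(),
    )


sample_line = (
    "<LTTextBoxHorizontal(0) 159.120,729.744,441.225,782.250 "
    "health percentage type D major disease insurance clause>"
)
box = parse_box_line(sample_line)
print(box)
```

Because the origin is the lower-left corner, a larger `y1` means the text sits closer to the top of the page, which is one way position information can help locate a title line.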
In one embodiment of the present invention, the text document may be an editable text document. Field information may be extracted from the text document, and a product clause information table may be generated based on the field information. For example, the text document is imported into a text mining module. The module first matches and marks the risk class to which the product clause belongs, then automatically selects an information extraction scheme according to how the required fields under that risk class are structured in the text, extracts the values of the required fields, and forms a clause information table. The table includes: the risk name, the risk class name, the responsibility name, the waiting period, the claim-free amount, the beneficiary, the payment condition, and the guarantee amount calculation factors. The calculation factors are split into three fields, namely guarantee amount calculation factors a, b, and c, and the guarantee amount is calculated as a × b - c. Here a is usually the base of the calculation, such as the guarantee amount or the current premium; b is usually the multiple applied to a, for example b = 2 if the clause specifies 2 times the basic guarantee amount; and c is usually an amount to be subtracted from the total payable amount, such as amounts already paid under other responsibilities or survival benefits already paid. In this example, the contents of the product clause information table are as shown in table 2:
TABLE 2
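The three calculation factors can be illustrated with a short sketch. It assumes the guarantee amount is a multiplied by b, minus c, which is what the described roles of b (a multiple of a) and c (an amount to subtract) suggest; the function name and the figures are hypothetical.

```python
def guarantee_amount(a: float, b: float, c: float) -> float:
    """Guarantee amount under the assumed formula a * b - c.

    a: calculation base (e.g. basic guarantee amount or premium)
    b: multiple applied to the base
    c: amount to subtract (e.g. benefits already paid)
    """
    return a * b - c


# Hypothetical clause: 2 times a basic amount of 100,000,
# less 5,000 already paid under another responsibility.
amount = guarantee_amount(100_000, 2, 5_000)
print(amount)
```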
In one embodiment of the present invention, matching and merging the product risk correspondence table and the product clause information table to obtain a risk responsibility configuration table includes: matching and merging the content of the two tables according to the similarity between the risk in the product risk correspondence table and the risk in the product clause information table. For example, the two tables are matched and merged according to the similarity of their risk names. In practice, the risk names in the two tables do not match exactly. In the product risk correspondence table, the risk name is an abbreviation of the clause name or a description of what the risk means, such as "Xiangyun type C major disease insurance, split mild symptoms". In the product clause information table, the risk name is the clause name in the insurance-industry standard form, namely company + modifier + risk type + personal insurance product design type, such as "Taikang Xiangyun type C major disease insurance". Because a run of characters in the two fields matches exactly, the risk names can be matched by judging whether the positions of the characters that co-occur in the two fields are consistent, after which the contents of the two tables are merged. The risk responsibility configuration table is shown in table 3:
TABLE 3
Risk code | Responsibility code | Responsibility name | a | b | c
0001 | 01 | Duty of statue | Paid premium | 1 | 0
0002 | 02 | Serious diseases | Basic amount | 1 | 0
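The name-matching and merging step can be sketched as follows. Note the substitution: instead of the co-occurring-character position check described above, this sketch scores two names by their longest shared run of characters, found with the standard-library difflib module. The threshold, row layout, and sample rows are hypothetical.

```python
from difflib import SequenceMatcher


def name_similarity(name_a: str, name_b: str) -> float:
    """Longest shared run of characters, relative to the shorter name."""
    if not name_a or not name_b:
        return 0.0
    matcher = SequenceMatcher(None, name_a, name_b, autojunk=False)
    match = matcher.find_longest_match(0, len(name_a), 0, len(name_b))
    return match.size / min(len(name_a), len(name_b))


def merge_tables(risk_rows, clause_rows, threshold=0.6):
    """Attach the best-matching risk code to each clause row."""
    merged = []
    for clause in clause_rows:
        scored = [
            (name_similarity(risk["risk_name"], clause["risk_name"]), risk)
            for risk in risk_rows
        ]
        if not scored:
            continue
        score, best = max(scored, key=lambda pair: pair[0])
        if score >= threshold:
            row = {k: v for k, v in clause.items() if k != "risk_name"}
            merged.append({"risk_code": best["risk_code"], **row})
    return merged


# Hypothetical rows from the two tables.
risk_rows = [
    {"risk_code": "0001", "risk_name": "Healthy Life endowment insurance"},
    {"risk_code": "0002",
     "risk_name": "Xiangyun Type C major disease insurance (split mild illness)"},
]
clause_rows = [
    {"risk_name": "Taikang Xiangyun Type C major disease insurance",
     "responsibility_code": "02", "responsibility_name": "Serious diseases",
     "a": "Basic amount", "b": 1, "c": 0},
]

merged = merge_tables(risk_rows, clause_rows)
print(merged)
```

A ratio over the shorter name tolerates prefixes such as the company name and suffixes such as a descriptive note, which is the kind of mismatch the example names show.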
Fig. 3 schematically shows a flow chart of a document processing method according to another embodiment of the invention.
As shown in fig. 3, the step S130 may specifically include a step S210 and a step S220.
In step S210, the text document is processed by regular expression rules to obtain a product clause name.
In step S220, field information is extracted from the text document according to the product clause name.
According to the method, the text document can be processed through the regular expression rule to obtain the product clause name, and the field information is extracted from the text document according to the product clause name, so that the field information can be extracted from the text document rapidly and accurately.
In one embodiment of the invention, a regular expression is a logical formula for operating on character strings, which consist of ordinary characters (e.g., the letters a to z) and special characters (called "metacharacters"). Predefined specific characters, and combinations of them, form a "rule string" that expresses filtering logic for strings. A regular expression is thus a text pattern that describes one or more strings to be matched when searching text. In this example, a string may refer to the text information in the text document, and the "rule string" may refer to a regular expression rule; processing the text document with the "rule string" yields the product clause name. For example, the text content is matched against regular expression rules to obtain a product clause name such as 'Health-care Life endowment insurance clauses'; it is then determined whether a risk class name in the risk class dictionary is a substring of the product clause name, which yields the risk class name.
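As a minimal sketch of obtaining a product clause name with a regular expression rule, the following Python snippet may be considered. The pattern is hypothetical: it assumes that the clause name occupies a line of its own and ends with the words "insurance clauses", and the sample text is invented for illustration. The actual rules of the embodiment are not disclosed at this level of detail.

```python
import re
from typing import Optional

# Hypothetical rule: the clause name is a title line ending in "insurance clauses".
CLAUSE_NAME_PATTERN = re.compile(
    r"^[ \t]*(?P<name>\S.*?insurance clauses)[ \t]*$",
    re.MULTILINE | re.IGNORECASE,
)


def find_clause_name(text: str) -> Optional[str]:
    """Return the first line that looks like a product clause name, if any."""
    match = CLAUSE_NAME_PATTERN.search(text)
    return match.group("name") if match else None


# Invented sample text standing in for the parsed text document.
sample_text = (
    "Taikang Life Insurance Co., Ltd.\n"
    "  Healthy Life endowment insurance clauses\n"
    "Article 1. Formation of the contract\n"
)
clause_name = find_clause_name(sample_text)
```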
In one embodiment of the invention, field information is extracted from a text document according to product clause names. The field information may be an insurance class name, responsibility name, waiting period, claim amount, beneficiary, payment condition, calculation factor, and the like.
Fig. 4 schematically shows a flow chart of a document processing method according to another embodiment of the invention.
As shown in fig. 4, the step S220 may include steps S310 to S330.
In step S310, the risk class to which the product clause belongs in the text document is classified according to the product clause name.
In step S320, the field names required to be extracted by the product clauses are determined according to the risk class to which the product clauses belong.
In step S330, field information is extracted from the text document according to the field names required to be extracted for the product clauses.
According to the method, the risk class to which the product clause belongs in the text document can be classified according to the product clause name, and the field name required to be extracted by the product clause is determined according to the risk class to which the product clause belongs, so that the field information can be extracted from the text document according to the field name required to be extracted by the product clause, and the accuracy in extracting the field information is further improved.
In one embodiment of the invention, the risk class to which the product clause belongs may be determined from a risk type dictionary based on the product clause name. For example, the risk class name is obtained by determining whether the risk class name of the risk type dictionary is a substring in the product clause name.
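Steps S310 and S320 can be illustrated with the following sketch, which classifies a clause by dictionary substring lookup and then looks up the fields to extract. The dictionary entries and the field lists per risk class are assumptions made for illustration; the source does not enumerate them.

```python
from typing import Dict, List, Optional

# Hypothetical risk type dictionary; a real one would hold the insurer's own risk class names.
RISK_CLASS_DICTIONARY: List[str] = [
    "major disease insurance",
    "endowment insurance",
    "accident insurance",
]

# Hypothetical mapping from risk class to the field names that must be extracted.
FIELDS_BY_RISK_CLASS: Dict[str, List[str]] = {
    "major disease insurance": ["responsibility name", "waiting period", "calculation factor"],
    "endowment insurance": ["responsibility name", "payment condition", "calculation factor"],
}


def classify_risk_class(clause_name: str) -> Optional[str]:
    """Return the first dictionary entry that is a substring of the clause name."""
    lowered = clause_name.lower()
    for risk_class in RISK_CLASS_DICTIONARY:
        if risk_class in lowered:
            return risk_class
    return None


def fields_to_extract(clause_name: str) -> List[str]:
    """Field names required for the clause; empty if the risk class is unknown."""
    risk_class = classify_risk_class(clause_name)
    if risk_class is None:
        return []
    return FIELDS_BY_RISK_CLASS.get(risk_class, [])
```

For instance, `fields_to_extract("Healthy Life endowment insurance clauses")` returns the list registered for "endowment insurance", while a clause name that matches no dictionary entry yields an empty list.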
In one embodiment of the invention, extracting field information from a text document according to a field name required to be extracted by a product term includes: the field information is extracted from the text document using regular expression rules, doc2vec, and/or location information of the text based on the field names required for the product clauses to be extracted.
In one embodiment of the invention, the fields to be extracted differ in structure from one risk class to another. For field extraction, a multi-dimensional, multi-level hybrid information extraction model may be employed, which extracts field information from the text document using regular expression rules, doc2vec, and/or the position information of the text, based on the field names to be extracted for the product clause. "Multi-dimensional and multi-level" means that a field is located not from a single piece of information but by combining several methods, while the range of candidate field values is narrowed by locating the text repeatedly at different scopes. "Hybrid" means that a different extraction function is defined for each field, so that the fields are ultimately obtained flexibly and accurately.
Specifically, for fields such as 'waiting period', whose information structure is relatively uniform, regular expression rules can be written to extract the field value from the full text.
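A field with a uniform structure, such as the waiting period, might be extracted with a rule like the one sketched below. The pattern and the English sample sentence are assumptions for illustration only; the actual clause documents and rules are not reproduced in the source.

```python
import re
from typing import Optional, Tuple

# Hypothetical rule: "waiting period", then a number and a time unit in the same sentence.
WAITING_PERIOD_PATTERN = re.compile(
    r"waiting period[^.\d]*?(\d+)\s*(day|month|year)s?",
    re.IGNORECASE,
)


def extract_waiting_period(text: str) -> Optional[Tuple[int, str]]:
    """Return (length, unit) of the waiting period, or None if no match is found."""
    match = WAITING_PERIOD_PATTERN.search(text)
    if match is None:
        return None
    return int(match.group(1)), match.group(2).lower()


# Invented sample sentence.
sample = "The waiting period is 90 days from the effective date of this contract."
waiting_period = extract_waiting_period(sample)
```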
For fields such as 'responsibility name', fuzzy matching with regular expressions alone is not accurate. The 'insurance responsibility' heading is therefore first located with a regular expression to narrow the extraction range. Because the responsibilities are laid out with a similar structure in the PDF document, all responsibility names are then located, and thereby obtained, according to the text position information produced by PDF parsing.
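The combination of a heading rule and text position information can be sketched as follows. The input format is an assumption: each parsed line is represented as a dictionary with its text, left x-coordinate (`x0`), and font size (`size`), which is the kind of layout data a PDF parser can typically provide. The rule that responsibility names share the layout of the first line under the heading is likewise a simplification made for this sketch.

```python
import re
from typing import Dict, List, Union

Line = Dict[str, Union[str, float]]

# Hypothetical heading rule used to narrow the extraction range.
HEADING_PATTERN = re.compile(r"insurance responsibility", re.IGNORECASE)


def extract_responsibility_names(lines: List[Line]) -> List[str]:
    """Collect lines under the heading that share the layout of the first item."""
    start = next(
        (i for i, line in enumerate(lines) if HEADING_PATTERN.search(str(line["text"]))),
        None,
    )
    if start is None or start + 1 >= len(lines):
        return []
    heading_layout = (lines[start]["x0"], lines[start]["size"])
    item_layout = (lines[start + 1]["x0"], lines[start + 1]["size"])
    names = []
    for line in lines[start + 1:]:
        layout = (line["x0"], line["size"])
        if layout == heading_layout:
            break  # the next section heading ends the search range
        if layout == item_layout:
            names.append(str(line["text"]))
    return names


# Invented layout data for illustration.
parsed_lines: List[Line] = [
    {"text": "2. Insurance responsibility", "x0": 72.0, "size": 14.0},
    {"text": "Death benefit", "x0": 90.0, "size": 12.0},
    {"text": "We pay the premiums paid under this contract.", "x0": 108.0, "size": 10.5},
    {"text": "Major disease benefit", "x0": 90.0, "size": 12.0},
    {"text": "We pay the basic insurance amount.", "x0": 108.0, "size": 10.5},
    {"text": "3. Exclusions", "x0": 72.0, "size": 14.0},
    {"text": "Excluded circumstances", "x0": 90.0, "size": 12.0},
]
responsibility_names = extract_responsibility_names(parsed_lines)
```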
For a field such as 'coverage amount calculation factor a', the calculation factor a must be obtained from the payment logic text of a specific responsibility. For example, from the text 'we pay the major disease insurance benefit to the major disease insurance beneficiary according to the basic insurance amount of this contract', the calculation factor a is obtained as: basic insurance amount. Because natural language is flexible, such field values cannot be obtained reliably by rules alone. The extraction function works as follows. First, the payment logic paragraph is located according to the responsibility name and the position information, and the paragraph is split into short texts at punctuation marks. Next, natural language processing is applied to the short texts, beginning with word segmentation; segmentation uses the jieba module for Python together with an insurance industry dictionary and a self-built dictionary, which makes the segmentation results more accurate. The short texts are then vectorized with doc2vec from the gensim library: the required target short texts are labeled and used to train a doc2vec model, and the short text containing the field value is found by using the model to compute similarity with the target short texts. Finally, syntactic structure analysis is performed on the short text obtained; because the field values share the same part of speech and semantic role, the required field value can be extracted accurately in combination with regular expression rules.
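The splitting and similarity steps can be illustrated with the standard-library sketch below. It is deliberately not the method of the embodiment: a bag-of-words cosine similarity stands in for the trained doc2vec model, and a simple regular expression tokenizer stands in for jieba word segmentation with custom dictionaries. The sample paragraph and target short text are invented.

```python
import math
import re
from collections import Counter
from typing import List, Optional


def split_short_texts(paragraph: str) -> List[str]:
    """Split a paragraph into short texts at Western and Chinese punctuation marks."""
    parts = re.split(r"[,.;:!?，。；：！？]", paragraph)
    return [part.strip() for part in parts if part.strip()]


def bag_of_words(text: str) -> Counter:
    """Stand-in vectorization: word counts instead of a doc2vec embedding."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def most_similar_short_text(paragraph: str, target: str) -> Optional[str]:
    """Return the short text closest to the labeled target short text."""
    target_vector = bag_of_words(target)
    return max(
        split_short_texts(paragraph),
        key=lambda short_text: cosine(bag_of_words(short_text), target_vector),
        default=None,
    )


# Invented payment logic paragraph and target short text.
paragraph = (
    "If the insured is diagnosed with a major disease after the waiting period, "
    "we pay the major disease insurance benefit according to the basic insurance "
    "amount of this contract, and this contract terminates."
)
target = "pay the benefit according to the basic insurance amount"
best = most_similar_short_text(paragraph, target)
```

The selected short text would then be passed to syntactic analysis and regular expression rules to isolate the field value itself; that last step is not shown here.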
Fig. 6 schematically shows a block diagram of a document processing apparatus according to an embodiment of the present invention.
As shown in fig. 6, the document processing apparatus 600 includes a first parsing module 610, a second parsing module 620, an extracting module 630, and a matching merging module 640.
The first parsing module 610 is configured to parse a product development document to obtain contents of the product development document, and generate a product risk corresponding relationship table based on the contents of the product development document.
The second parsing module 620 is configured to parse the product term document to obtain contents of the product term document, and generate a text document based on the contents of the product term document.
An extraction module 630 is configured to extract field information from the text document and generate a product clause information table based on the field information.
And the matching and combining module 640 is configured to match and combine the product risk corresponding relationship table and the product clause information table to obtain a risk responsibility configuration table.
By parsing the product development document and the product clause document and matching their contents, the document processing apparatus 600 improves working efficiency and reduces the error rate to a certain extent. Based on the risk responsibility configuration table, customer policy data can be queried and processed through a structured query language and presented to the user through a visual interface on a device. For example, when the user is an insurance agent, the visual interface lets the agent see at a glance which coverage the customer lacks, improving the success rate of insuring the customer. As another example, when the user is a customer, the visual interface lets the customer clearly see all of their current coverage, improving the customer's experience of the insurance service.
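Querying policy data against the risk responsibility configuration table through SQL might look like the following sketch, which uses Python's built-in `sqlite3` module with an in-memory database. The table names, column names, and rows are hypothetical; the source does not specify a schema or a database product.

```python
import sqlite3
from typing import List


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with made-up configuration and policy rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE risk_responsibility_config (
            risk_code TEXT, responsibility_code TEXT, responsibility_name TEXT
        );
        CREATE TABLE customer_policy (customer_id TEXT, risk_code TEXT);
        INSERT INTO risk_responsibility_config VALUES ('0001', '01', 'Death');
        INSERT INTO risk_responsibility_config VALUES ('0002', '02', 'Serious diseases');
        INSERT INTO customer_policy VALUES ('C001', '0001');
        """
    )
    return conn


def covered_responsibilities(conn: sqlite3.Connection, customer_id: str) -> List[str]:
    """Responsibilities covered by the policies the customer currently holds."""
    rows = conn.execute(
        """
        SELECT DISTINCT c.responsibility_name
        FROM customer_policy AS p
        JOIN risk_responsibility_config AS c ON c.risk_code = p.risk_code
        WHERE p.customer_id = ?
        ORDER BY c.responsibility_name
        """,
        (customer_id,),
    ).fetchall()
    return [row[0] for row in rows]


def missing_responsibilities(conn: sqlite3.Connection, customer_id: str) -> List[str]:
    """Configured responsibilities that none of the customer's policies cover."""
    all_names = {
        row[0]
        for row in conn.execute(
            "SELECT DISTINCT responsibility_name FROM risk_responsibility_config"
        )
    }
    return sorted(all_names - set(covered_responsibilities(conn, customer_id)))
```

A visual interface for an agent could, for example, display the result of `missing_responsibilities` as the customer's coverage gaps.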
The document processing apparatus 600 may be used to implement the document processing method described in the embodiment of FIG. 2, according to an embodiment of the present invention.
In some embodiments of the present invention, the first parsing module 610 is configured to: read, disassemble, and/or assign values to the product development document through VBA to obtain the content of the product development document.
In some embodiments of the present invention, the second parsing module 620 is configured to: parse the PDF-format product clause document using Python to obtain the content of the product clause document.
In some embodiments of the present invention, the matching and combining module 640 is configured to: match and combine the content of the product risk correspondence table and the content of the product clause information table according to the similarity between the risk type in the product risk correspondence table and the risk type in the product clause information table, to obtain the risk responsibility configuration table.
Fig. 7 schematically shows a block diagram of a document processing apparatus according to another embodiment of the present invention.
As shown in fig. 7, the extraction module 630 may specifically include an acquisition module 710 and a first extraction module 720.
Specifically, the acquisition module 710 is configured to process the text document through regular expression rules to obtain a product clause name.
A first extraction module 720, configured to extract field information from the text document according to the product clause name.
The extraction module 630 may process the text document by regular expression rules to obtain the product clause name, and extract the field information from the text document according to the product clause name, so that the field information may be extracted from the text document quickly and accurately.
The extraction module 630 may be used to implement the document processing method described in the embodiment of FIG. 3, according to an embodiment of the present invention.
Fig. 8 schematically shows a block diagram of a document processing apparatus according to another embodiment of the present invention.
As shown in fig. 8, the first extraction module 720 may specifically include a classification module 810, a determination module 820, and a sub-module 830 of the first extraction module.
Specifically, the classification module 810 is configured to classify the risk class to which the product clause belongs in the text document according to the product clause name.
A determining module 820, configured to determine a field name that needs to be extracted by the product clause according to the risk class to which the product clause belongs.
A sub-module 830 of the first extraction module is configured to extract field information from the text document according to a field name required to be extracted by the product clause.
The first extraction module 720 can classify the risk class to which the product clause belongs in the text document according to the product clause name, and determine the field name required to be extracted by the product clause according to the risk class to which the product clause belongs, so that the field information can be extracted from the text document according to the field name required to be extracted by the product clause, and the accuracy in extracting the field information is further improved.
The first extraction module 720 may be used to implement the document processing method described in the embodiment of fig. 4, according to an embodiment of the present invention.
In some embodiments of the present invention, the submodule 830 of the first extraction module is configured to: extracting field information from the text document using regular expression rules, doc2vec, and/or location information of text based on field names required to be extracted by the product clauses.
It is understood that the first parsing module 610, the second parsing module 620, the extraction module 630, the matching merging module 640, the acquisition module 710, the first extraction module 720, the classification module 810, the determination module 820, and the sub-module 830 of the first extraction module may be merged in one module to be implemented, or any one of the modules may be split into a plurality of modules. Alternatively, at least some of the functionality of one or more of the modules may be combined with at least some of the functionality of other modules and implemented in one module. According to embodiments of the invention, at least one of the first parsing module 610, the second parsing module 620, the extraction module 630, the match combining module 640, the acquisition module 710, the first extraction module 720, the classification module 810, the determination module 820, and the sub-module 830 of the first extraction module may be implemented at least in part as hardware circuitry, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or any other reasonable manner of integrating or packaging circuitry, or in any other suitable combination of hardware or firmware implementations. Alternatively, at least one of the first parsing module 610, the second parsing module 620, the extraction module 630, the matching merging module 640, the acquisition module 710, the first extraction module 720, the classification module 810, the determination module 820, and the sub-module 830 of the first extraction module may be at least partially implemented as a computer program module, which may perform the functions of the corresponding module when the program is run by a computer.
Referring now to FIG. 9, there is illustrated a schematic diagram of a computer system 900 suitable for use in implementing an electronic device of an embodiment of the present invention. The computer system 900 of the electronic device shown in fig. 9 is only an example, and should not be construed as limiting the functionality and scope of use of embodiments of the present invention.
As shown in fig. 9, the computer system 900 includes a Central Processing Unit (CPU) 901, which can execute various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 902 or a program loaded from a storage section 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data required for system operation are also stored. The CPU 901, ROM 902, and RAM 903 are connected to each other through a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
The following components are connected to the I/O interface 905: an input section 906 including a keyboard, a mouse, and the like; an output portion 907 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and a speaker; a storage portion 908 including a hard disk or the like; and a communication section 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the internet. The drive 910 is also connected to the I/O interface 905 as needed. A removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is installed as needed on the drive 910 so that a computer program read out therefrom is installed into the storage section 908 as needed.
In particular, according to embodiments of the present application, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from the network via the communication portion 909 and/or installed from the removable medium 911. The above-described functions defined in the system of the present application are performed when the computer program is executed by a Central Processing Unit (CPU) 901.
The computer readable medium shown in the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present invention may be implemented by software, or may be implemented by hardware, and the described units may also be provided in a processor. Wherein the names of the units do not constitute a limitation of the units themselves in some cases.
As another aspect, the present application also provides a computer-readable medium that may be contained in the electronic device described in the above embodiment; or may exist alone without being incorporated into the electronic device. The computer-readable medium carries one or more programs which, when executed by one of the electronic devices, cause the electronic device to implement the document processing method as described in the above embodiments.
For example, the electronic device may implement the method as shown in fig. 2: in step S110, the product development document is parsed to obtain the content of the product development document, and a product risk correspondence table is generated based on the content of the product development document. In step S120, the product clause document is parsed to obtain the content of the product clause document, and a text document is generated based on the content of the product clause document. In step S130, field information is extracted from the text document, and a product clause information table is generated based on the field information. In step S140, the product risk corresponding relationship table and the product clause information table are matched and combined to obtain a risk responsibility configuration table.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the invention. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present invention may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, and includes several instructions to cause a computing device (may be a personal computer, a server, a touch terminal, or a network device, etc.) to perform the method according to the embodiments of the present invention.
Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It is to be understood that the application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (10)

1. A document processing method, comprising:
analyzing a product development document to obtain the content of the product development document, and generating a product risk corresponding relation table based on the content of the product development document;
analyzing a product clause document to obtain the content of the product clause document, and generating a text document based on the content of the product clause document;
extracting field information from the text document, and generating a product clause information table based on the field information; and
matching and combining the product risk corresponding relation table and the product clause information table to obtain a risk responsibility configuration table.
2. The method of claim 1, wherein parsing the product development document to obtain content of the product development document comprises:
reading, disassembling, and/or assigning values to the product development document through VBA to obtain the content of the product development document.
3. The method of claim 1, wherein the product term document is in a PDF format, and wherein parsing the product term document to obtain the content of the product term document comprises:
parsing the PDF-format product clause document using Python to obtain the content of the product clause document.
4. The method of claim 1, wherein extracting field information from the text document comprises:
processing the text document through regular expression rules to obtain product clause names;
extracting field information from the text document according to the product clause name.
5. The method of claim 4, wherein extracting field information from the text document according to the product clause name comprises:
classifying the risk class to which the product clause belongs in the text document according to the product clause name;
determining the field names required to be extracted by the product clauses according to the risk class to which the product clauses belong;
extracting field information from the text document according to the field names required to be extracted by the product clauses.
6. The method of claim 5, wherein extracting field information from the text document according to a field name required to be extracted by the product clause comprises:
extracting field information from the text document using regular expression rules and doc2vec based on field names required to be extracted by the product clauses;
and/or extracting field information from the text document using regular expression rules and location information of text based on the field names required to be extracted by the product clauses.
7. The method of claim 1, wherein matching and combining the product risk correspondence table and the product clause information table to obtain the risk responsibility configuration table comprises:
matching and combining the content of the product risk corresponding relation table and the content of the product clause information table according to the similarity between the risk type in the product risk corresponding relation table and the risk type in the product clause information table, to obtain the risk responsibility configuration table.
8. A document processing apparatus, comprising:
the first analysis module is used for analyzing the product development document to obtain the content of the product development document and generating a product risk corresponding relation table based on the content of the product development document;
the second analysis module is used for analyzing the product clause document to obtain the content of the product clause document and generating a text document based on the content of the product clause document;
the extraction module is used for extracting field information from the text document and generating a product clause information table based on the field information; and
the matching and combining module is used for matching and combining the product risk corresponding relation table and the product clause information table to obtain a risk responsibility configuration table.
9. An electronic device, comprising:
one or more processors; and
storage means for storing one or more programs which when executed by the one or more processors cause the one or more processors to implement the method of any of claims 1-7.
10. A computer readable medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any one of claims 1-7.
CN201911192868.5A 2019-11-28 2019-11-28 Document processing method, device, medium and electronic equipment Active CN111143505B (en)


Publications (2)

Publication Number Publication Date
CN111143505A CN111143505A (en) 2020-05-12
CN111143505B true CN111143505B (en) 2023-11-21





Also Published As

Publication number Publication date
CN111143505A (en) 2020-05-12

Similar Documents

Publication Publication Date Title
CN111143505B (en) Document processing method, device, medium and electronic equipment
US10095780B2 (en) Automatically mining patterns for rule based data standardization systems
CN107330752B (en) Method and device for identifying brand words
CN113836314B (en) Knowledge graph construction method, device, equipment and storage medium
CN111667923B (en) Data matching method and device, computer readable medium and electronic equipment
CN111651552A (en) Structured information determination method and device and electronic equipment
CN113139816A (en) Information processing method, device, electronic equipment and storage medium
CN113377958A (en) Document classification method and device, electronic equipment and storage medium
CN111753029A (en) Entity relationship extraction method and device
CN114444465A (en) Information extraction method, device, equipment and storage medium
US11881044B2 (en) Method and apparatus for processing image, device and storage medium
CN111027832A (en) Tax risk determination method, apparatus and storage medium
CN113657088A (en) Interface document analysis method and device, electronic equipment and storage medium
CN112989235A (en) Knowledge base-based internal link construction method, device, equipment and storage medium
US20230085684A1 (en) Method of recommending data, electronic device, and medium
CN111523309A (en) Medicine information normalization method and device, storage medium and electronic equipment
CN114118049B (en) Information acquisition method, device, electronic equipment and storage medium
CN111708819B (en) Method, apparatus, electronic device, and storage medium for information processing
CN114048315A (en) Method and device for determining document tag, electronic equipment and storage medium
CN113239273A (en) Method, device, equipment and storage medium for generating text
CN114298845A (en) Method and device for processing claim settlement bills
CN113806522A (en) Abstract generation method, device, equipment and storage medium
CN113469732A (en) Content understanding-based auditing method and device and electronic equipment
CN111833085A (en) Method and device for calculating price of article
CN111275476A (en) Logistics storage service quotation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant