CN115587582A - Notarization document template generation method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN115587582A
Authority
CN
China
Prior art keywords
entity
notarization
chain
filled
sample book
Legal status
Pending
Application number
CN202211355120.4A
Other languages
Chinese (zh)
Inventor
陈艳
许静
Current Assignee
Faxin Gongzhengyun Xiamen Technology Co ltd
Original Assignee
Faxin Gongzhengyun Xiamen Technology Co ltd
Application filed by Faxin Gongzhengyun Xiamen Technology Co ltd
Priority to CN202211355120.4A
Publication of CN115587582A
Legal status: Pending

Classifications

    • G06F40/186 Templates
    • G06F16/367 Ontology
    • G06F16/9024 Graphs; Linked lists
    • G06F40/295 Named entity recognition
    • G06Q10/103 Workflow collaboration or project management
    • G06Q50/18 Legal services

Abstract

The embodiments of the present application provide a knowledge-graph-based notarization document template generation method and apparatus, an electronic device, and a storage medium, relating to the field of computer technology. The method comprises: acquiring a sample document of a notarization document and performing named entity recognition on it to obtain the to-be-linked entities in the sample document; performing entity linking on the to-be-linked entities based on a notarization-domain knowledge graph to obtain candidate entities; based on the case-specific information to be filled that corresponds to the sample document in a to-be-filled set, taking the candidate entities that belong to that information as target entities; and replacing each target entity in the sample document with its corresponding parameter to generate a document template. The embodiments of the present application address the low efficiency of writing notarization documents in the related art.

Description

Notarization document template generation method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technology, and in particular to a knowledge-graph-based notarization document template generation method and apparatus, an electronic device, and a storage medium.
Background
Currently, in the notarization sector, many notarial matters, such as writing notarization documents, are handled manually by notary personnel. Because notarial matters are numerous and of many different types, notary personnel must devote considerable effort to writing notarization documents.
Moreover, because notarization follows strict procedures and specifications, notarization documents are usually subject to standardized requirements on typesetting, format, and wording. Given this formulaic nature, notary personnel spend a great deal of effort on repetitive work, which not only reduces efficiency but also wastes human resources.
It follows that the low efficiency of writing notarization documents is a problem that urgently needs to be solved.
Disclosure of Invention
Embodiments of the present application provide a knowledge-graph-based notarization document template generation method and apparatus, an electronic device, and a storage medium, which can solve the problem of low efficiency in writing notarization documents in the related art. The technical solution is as follows:
According to one aspect of the embodiments of the present application, a knowledge-graph-based notarization document template generation method comprises the following steps: acquiring a sample document of a notarization document, and performing named entity recognition on it to obtain the to-be-linked entities in the sample document; performing entity linking on the to-be-linked entities based on a notarization-domain knowledge graph to obtain candidate entities from them; based on the case-specific information to be filled that corresponds to the sample document in a to-be-filled set, taking the candidate entities that belong to that information as target entities; and replacing each target entity in the sample document with its corresponding parameter to generate a document template.
According to an aspect of the embodiments of the present application, a knowledge-graph-based notarization document template generation apparatus comprises: an entity recognition module, configured to acquire a sample document of a notarization document and perform named entity recognition on it to obtain the to-be-linked entities in the sample document; an entity linking module, configured to perform entity linking on the to-be-linked entities based on the notarization-domain knowledge graph to obtain candidate entities from them; a confirmation module, configured to take, based on the case-specific information to be filled that corresponds to the sample document in the to-be-filled set, the candidate entities belonging to that information as target entities; and a template generation module, configured to replace each target entity in the sample document with its corresponding parameter to generate a notarization document template.
According to an aspect of the embodiments of the present application, an electronic device comprises at least one processor, at least one memory, and at least one communication bus. The memory stores a computer program, and the processor reads the computer program from the memory through the communication bus; the computer program, when executed by the processor, implements the notarization document template generation method described above.
According to an aspect of the embodiments of the present application, a storage medium stores a computer program which, when executed by a processor, implements the notarization document template generation method described above.
The technical solution provided by the present application has the following beneficial effects:
In the above technical solution, a sample document of a notarization document is first obtained, and the to-be-linked entities in it are identified by named entity recognition. Entity linking is then performed on these entities against the notarization-domain knowledge graph to obtain candidate entities. The to-be-filled set determines which candidate entities are case-specific information to be filled, and those entities are replaced with parameters to generate a notarization document template corresponding to the sample document. When a notarization document is later produced, the parameter positions in the template are filled with content appropriate to the specific case, which completes the document. This effectively solves the low efficiency of writing notarization documents in the related art.
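As an illustration of the filling step, the following Python sketch substitutes case-specific content into the parameter positions of a template. It assumes, hypothetically, that parameters are written as `${name}` placeholders so that the standard-library `string.Template` can fill them; the parameter names and case data are invented for illustration and are not taken from the application.

```python
from string import Template


def fill_template(template_text: str, case_info: dict) -> str:
    """Fill every ${parameter} position of a notarization document template.

    Hypothetical placeholder syntax: ${name}. Template.substitute raises
    KeyError if case_info lacks a parameter, so no half-filled document
    is produced.
    """
    return Template(template_text).substitute(case_info)


template_text = (
    "Applicant: ${applicant}\n"
    "Certificate No.: ${certificate_no}\n"
    "The applicant declares that the house belongs to ${applicant}."
)
case_info = {"applicant": "Li Ming", "certificate_no": "2022-001"}  # illustrative data
print(fill_template(template_text, case_info))
```

Using `substitute` rather than `safe_substitute` is a deliberate choice in this sketch: a missing value fails loudly instead of leaving a raw placeholder in a legal document.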
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.
FIG. 1 is a schematic diagram of an implementation environment according to the present application;
FIG. 2 is a flowchart of a knowledge-graph-based notarization document template generation method according to an exemplary embodiment;
FIG. 3 is a flowchart of the steps preceding step 310 in the embodiment corresponding to FIG. 2, according to one embodiment;
FIG. 4 is a flowchart of the steps preceding step 330 in the embodiment corresponding to FIG. 2, according to one embodiment;
FIG. 5 is a schematic diagram of a sample notarization document according to an exemplary embodiment;
FIG. 6 is a flowchart of the steps following step 330 in the embodiment corresponding to FIG. 2, according to one embodiment;
FIG. 7 is a flowchart of the steps following step 330 in the embodiment corresponding to FIG. 2, according to one embodiment;
FIG. 8 is a flowchart of the steps following step 370 in the embodiment corresponding to FIG. 2, according to one embodiment;
FIG. 9 is a schematic diagram of a notarization document template according to an exemplary embodiment;
FIG. 10 is a diagram of a specific implementation of the knowledge-graph-based notarization document template generation method in an application scenario;
FIG. 11 is a structural block diagram of a knowledge-graph-based notarization document template generation apparatus according to an exemplary embodiment;
FIG. 12 is a hardware block diagram of a server according to an exemplary embodiment;
FIG. 13 is a structural block diagram of an electronic device according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
As mentioned above, because notarization documents are formulaic, notary personnel must invest a great deal of effort in repetitive labor, which makes writing notarization documents inefficient.
In recent years, artificial intelligence has advanced rapidly and carried many industries into the era of intelligent technology, yet the notarization industry has not been able to ride this wave to achieve an intelligent transformation. As a result, many notarial matters, such as writing notarization documents, still have to be completed manually by notary personnel.
Writing notarization documents is an indispensable part of the daily work of notary personnel. Notarization documents are the process and result files generated throughout a notarization matter, including but not limited to interview records, notification letters, acceptance notices, and notarial certificates. Because notarization business is voluminous and varied, notary personnel must invest a great deal of time and effort in writing these documents.
In addition, notarization documents play an extremely important role in judicial and everyday certification activities; notarial certificates in particular, which carry specific evidentiary force in judicial proceedings, are regarded as an important category of notarization document. Producing notarization documents that follow the prescribed format and whose content is lawful and valid is therefore the most important responsibility of a notary office. However, the quality of manually written notarization documents depends entirely on the professional skill of notary personnel and on the rigor and completeness of the notary office's review mechanism and review capability. To raise the writing standard of notarization documents, notary offices have to devote more resources to training the writing ability of notary personnel, and even then consistent document quality cannot be guaranteed.
Therefore, reducing the reliance on manual labor and making the writing of notarization documents intelligent would greatly improve writing efficiency, raise the quality of notarization documents, and reduce the waste of human resources.
As the above shows, the related art still suffers from low efficiency in writing notarization documents.
Accordingly, the present application provides a knowledge-graph-based notarization document template generation method that can effectively improve the efficiency of writing notarization documents. The method is applicable to a notarization document template generation apparatus, which can be deployed on an electronic device. The electronic device may be a computer device with a von Neumann architecture, such as a desktop computer, a notebook computer, or a server.
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
FIG. 1 is a schematic diagram of an implementation environment related to the notarization document template generation method. It should be noted that this implementation environment is only one example suited to the present application and should not be regarded as limiting its scope of use in any way.
The implementation environment includes a collection terminal 110 and a server 130.
Specifically, the collection terminal 110 may be any electronic device capable of collecting one or more of picture, text, and multimedia data, which is not limited herein.
The server 130 may be an electronic device such as a desktop computer, a notebook computer, a server, or the like, or may be a computer device cluster formed by multiple servers, or even a cloud computing center formed by multiple servers. The server 130 is configured to provide a background service, for example, the background service includes, but is not limited to, a notarization document template generation service, and the like.
The server 130 and the collection terminal 110 are connected in advance by a wired or wireless network, through which data is transmitted between them. The transmitted data includes, but is not limited to, samples of notarization documents.
Through this interaction, the collection terminal 110 sends a sample of a notarization document to the server 130, and the server 130 processes the received sample with the aid of the knowledge graph to generate a notarization document template.
Referring to FIG. 2, an embodiment of the present application provides a knowledge-graph-based notarization document template generation method applicable to an electronic device, which may be the server 130 in the implementation environment shown in FIG. 1.
In the following method embodiments, for ease of description, each step is described as being executed by the electronic device; however, the method is not specifically limited to this configuration.
As shown in fig. 2, the method may include the steps of:
Step 310: Acquire a sample document of a notarization document, and perform named entity recognition on it to obtain the to-be-linked entities in the sample document.
Notarization documents are the process and result files generated throughout a notarization matter, including but not limited to interview records, notification letters, acceptance notices, and notarial certificates. The sample documents involved in a notarization matter therefore vary with the type of legal relationship concerned.
Named entity recognition identifies words that refer to specific named things in the sample document, such as names of people, places, legal documents, and organizations; these words are the to-be-linked entities in the sample document. Named entity recognition may be implemented by rule-based, unsupervised learning, supervised learning, or deep learning methods, which is not limited herein.
In one possible implementation, a named entity prediction model obtained by training a neural network model is invoked to perform named entity recognition on the sample document. The steps for constructing the named entity prediction model are shown in FIG. 3.
Step 3101: Obtain historical notarization documents.
The historical notarization documents include, but are not limited to, electronic text of paper notarization case files obtained by Optical Character Recognition (OCR), and structured data obtained from the database of a notarization service platform.
Step 3103: Construct a corpus.
Specifically, the unstructured data obtained by OCR (i.e., the electronic text) is converted into structured data and combined with the structured data from the notarization service platform database to form a corpus.
Step 3105: Generate labeled notarization texts according to a labeling rule.
Specifically, a labeling rule is configured, and the historical notarization documents in the corpus are labeled according to it to generate labeled notarization texts.
The labeling rule may be the BIO rule (B: beginning of a named entity; I: inside a named entity; O: not part of a named entity) or the BIOES rule (B: beginning of a named entity; I: inside a named entity; O: not part of a named entity; E: end of a named entity; S: single-character entity), among others, which is not limited herein. Taking the BIO rule as an example, the text "the house belongs to Jia Mou" is labeled "B I O B I": the two characters of "house" receive B and I, the connecting word receives O, and the two characters of the name "Jia Mou" receive B and I.
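The two labeling rules can be illustrated with a small Python sketch that converts entity spans into tag sequences. The sentence "房屋属于甲某" ("the house belongs to Jia Mou"), the character-level tokenization, and the entity types `PROP` and `PER` are illustrative assumptions, not labels defined by the application.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start index, end index exclusive, entity type)


def to_bio(tokens: List[str], spans: List[Span]) -> List[str]:
    """BIO: B- on an entity's first token, I- on the rest, O elsewhere."""
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"
    return tags


def to_bioes(tokens: List[str], spans: List[Span]) -> List[str]:
    """BIOES: like BIO, but E- marks an entity's last token and S- marks
    a single-token entity."""
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        if end - start == 1:
            tags[start] = f"S-{etype}"
            continue
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end - 1):
            tags[i] = f"I-{etype}"
        tags[end - 1] = f"E-{etype}"
    return tags


tokens = list("房屋属于甲某")            # character-level tokens
spans = [(0, 2, "PROP"), (4, 6, "PER")]  # 房屋 = property, 甲某 = person
print(to_bio(tokens, spans))    # ['B-PROP', 'I-PROP', 'O', 'O', 'B-PER', 'I-PER']
print(to_bioes(tokens, spans))  # ['B-PROP', 'E-PROP', 'O', 'O', 'B-PER', 'E-PER']
```

In this character-level version the connecting word 属于 spans two characters, so it receives two O tags.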
Step 3107: One-hot encode the labeled notarization texts and feed them into the embedding layer.
Specifically, each character of the labeled notarization texts is one-hot encoded, and each one-hot vector is then converted into a character vector through the character embedding matrix of the embedding layer.
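Multiplying a one-hot vector by the embedding matrix simply selects one row of that matrix, which is why embedding layers are usually implemented as a table lookup. The pure-Python sketch below makes this explicit; the vocabulary construction, the 4-dimensional embedding size, and the random initialization are assumptions for illustration, not parameters from the application.

```python
import random
from typing import Dict, List


def build_vocab(texts: List[str]) -> Dict[str, int]:
    """Assign each distinct character an index (sorted for reproducibility)."""
    return {ch: i for i, ch in enumerate(sorted(set("".join(texts))))}


def one_hot(index: int, size: int) -> List[int]:
    vec = [0] * size
    vec[index] = 1
    return vec


def embed(one_hot_vec: List[int], matrix: List[List[float]]) -> List[float]:
    """Row vector (1 x V) times embedding matrix (V x d) -> character vector (1 x d)."""
    dim = len(matrix[0])
    return [sum(one_hot_vec[v] * matrix[v][j] for v in range(len(matrix)))
            for j in range(dim)]


texts = ["房屋属于甲某"]
vocab = build_vocab(texts)
rng = random.Random(0)
EMB_DIM = 4  # assumed embedding size
matrix = [[rng.uniform(-1.0, 1.0) for _ in range(EMB_DIM)] for _ in range(len(vocab))]

char_vectors = [embed(one_hot(vocab[ch], len(vocab)), matrix) for ch in texts[0]]
# The product selects exactly one row, so it equals a direct table lookup
print(all(vec == matrix[vocab[ch]] for ch, vec in zip(texts[0], char_vectors)))
```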
Step 3109: Input the character vectors into the BiLSTM module.
Specifically, the character vectors serve as the input of a bidirectional long short-term memory (BiLSTM) module, which begins the training of the BiLSTM model.
Step 3111: Process the tag scores output by the BiLSTM module and input them into a CRF module to obtain a named entity prediction model with initialized parameters.
Specifically, the tag scores output by the BiLSTM module are processed and input into the CRF (conditional random field) module, which uses the dependencies between adjacent tags to find the optimal tag sequence for the BiLSTM output, yielding a named entity prediction model with initialized parameters.
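The CRF decoding step, finding the highest-scoring tag sequence given per-position tag scores and tag-to-tag transition scores, is commonly done with the Viterbi algorithm. The sketch below assumes that form; it covers decoding only (training a CRF additionally needs the forward algorithm for the log-likelihood), and the three-tag example with a large negative O-to-I transition is invented to show how transitions rule out invalid BIO sequences.

```python
from typing import List


def viterbi_decode(emissions: List[List[float]],
                   transitions: List[List[float]]) -> List[int]:
    """Return the highest-scoring tag sequence for a linear-chain CRF.

    emissions[t][k]: score of tag k at position t (e.g. BiLSTM output).
    transitions[i][j]: score of moving from tag i to tag j.
    """
    num_tags = len(emissions[0])
    score = list(emissions[0])  # best path score ending in each tag at t = 0
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(num_tags):
            best_i = max(range(num_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final tag
    path = [max(range(num_tags), key=lambda k: score[k])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


TAGS = ["O", "B", "I"]
NEG = -1e4  # effectively forbids a transition
transitions = [[0.0, 0.0, NEG],   # O -> I is invalid under BIO
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]
emissions = [[2.0, 1.0, 0.0],     # position 0 prefers O
             [0.0, 0.5, 1.0]]     # position 1 prefers I on its own
print([TAGS[k] for k in viterbi_decode(emissions, transitions)])  # ['O', 'B']
```

The O-to-I penalty outweighs the higher emission score of I at the second position, so the decoder picks the valid sequence O, B.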
Step 3113: Perform hyperparameter optimization to obtain the named entity prediction model.
Specifically, dropout is used in training the BiLSTM-CRF model: in each training iteration a portion of the neurons is randomly dropped, and the dropped neurons do not take part in propagation. The network therefore cannot rely too heavily on the weights of particular neurons when adjusting its parameters, which alleviates overfitting.
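Dropout as described above is commonly implemented in the "inverted" form sketched below, where surviving activations are rescaled during training so nothing needs to change at inference time. This is a generic illustration in plain Python; the drop probability and seed are arbitrary, and the application does not state which dropout variant or rate it uses.

```python
import random
from typing import List, Optional


def dropout(activations: List[float], p: float, training: bool = True,
            rng: Optional[random.Random] = None) -> List[float]:
    """Inverted dropout.

    During training, each activation is zeroed with probability p and the
    survivors are scaled by 1 / (1 - p), keeping the expected value unchanged.
    At inference time the input passes through untouched.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep = 1.0 - p
    return [a / keep if rng.random() >= p else 0.0 for a in activations]


rng = random.Random(42)
hidden = [0.5, -1.2, 0.8, 0.3, 1.1, -0.4]
print(dropout(hidden, p=0.5, rng=rng))         # some zeroed, survivors doubled
print(dropout(hidden, p=0.5, training=False))  # unchanged at inference
```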
The optimal parameters obtained in each training run are collected and stored, and the BiLSTM-CRF model that passes testing is packaged as the named entity prediction model.
Once training is complete, the named entity prediction model can recognize named entities in notarization documents, so it can be applied to a sample document to obtain the to-be-linked entities in that sample document.
Step 330: Perform entity linking on the to-be-linked entities based on the notarization-domain knowledge graph to obtain candidate entities from them.
The notarization-domain knowledge graph describes the concepts in the notarization domain and the relationships among them, and it must be created before entity linking is performed.
In one possible implementation, as shown in FIG. 4, creating the notarization-domain knowledge graph before step 330 may include the following steps:
Step 3301: Acquire historical notarization documents and perform named entity recognition on them to obtain a plurality of training entities.
As mentioned above, named entity recognition can be performed on the historical notarization documents in the corpus by rule-based, unsupervised learning, supervised learning, or deep learning methods; the recognition results are a plurality of training entities.
In one possible implementation, the named entity prediction model is invoked to recognize named entities in the historical notarization documents, obtaining a plurality of training entities.
Step 3303: Extract the association relationships among the training entities.
The association relationships may be extracted by rule-based relationship extraction, by methods based on predefined relationship types, by deep learning, and so on, which is not limited herein.
In one possible implementation, the association relationships among the training entities are extracted by a rule-based relationship extraction method.
First, note that the training entities can be obtained through step 3301, or by segmenting the historical notarization documents in the corpus with a notarization entity dictionary. The notarization entity dictionary is built in advance by applying the named entity prediction model trained in step 3113 to the historical notarization documents in the corpus. Similarly, a notarization legal dictionary can be built in advance by applying the named entity prediction model to notarization-related legal texts (such as the General Principles of the Civil Law of the People's Republic of China); these legal texts are then segmented with the notarization legal dictionary to obtain further training entities whose association relationships are to be extracted.
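Dictionary-based segmentation can be done in several ways; one classic choice, assumed here, is forward maximum matching, which at each position takes the longest dictionary word that matches and otherwise falls back to a single character. The dictionary contents below are hypothetical stand-ins for the notarization entity dictionary.

```python
from typing import List, Set


def forward_max_match(text: str, dictionary: Set[str], max_len: int = 0) -> List[str]:
    """Segment text by forward maximum matching.

    At each position, take the longest dictionary word that matches;
    if none matches, emit a single character and move on.
    """
    max_len = max_len or max((len(w) for w in dictionary), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


entity_dictionary = {"房屋", "甲某"}  # hypothetical dictionary entries
tokens = forward_max_match("房屋属于甲某", entity_dictionary)
training_entities = [t for t in tokens if t in entity_dictionary]
print(tokens)             # ['房屋', '属', '于', '甲某']
print(training_entities)  # ['房屋', '甲某']
```

Forward maximum matching is greedy, so it can mis-segment overlapping words; backward or bidirectional matching are common alternatives if that matters.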
Next, the training entities are part-of-speech tagged with an NLP (Natural Language Processing) text annotation tool (e.g., doccano). In addition, they are tagged according to tagging rules derived from notarization-related laws and regulations, judicial interpretations, and the like.
Then, relationship extraction rules are configured according to the Technical Specification for the National Notarization Comprehensive Management Information System, as shown in Table 1.
TABLE 1 relationship extraction rules
[Image: Table 1, relationship extraction rules]
Finally, based on the relationship extraction rules, the association relationships among the part-of-speech-tagged training entities are extracted by pattern matching. Pattern matching is a basic string operation in data structures: given a pattern substring, find at least one occurrence of it in a given string. For example, per Table 1, given the pattern "belongs to", if that substring occurs in the string formed by the part-of-speech-tagged training entities of a historical notarization document, the association relationship among those training entities is taken to be an affiliation (ownership) relationship.
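A minimal sketch of rule-based relationship extraction by substring matching follows. Because the contents of Table 1 are not reproduced here, the rule table is hypothetical apart from the "belongs to" trigger mentioned in the text, and the "nearest entity on each side of the trigger" heuristic is an assumption.

```python
from typing import Dict, List, Tuple

# Hypothetical rules in the spirit of Table 1: trigger substring -> relation label.
RULES: Dict[str, str] = {
    "belongs to": "BELONGS_TO",
    "is the spouse of": "SPOUSE_OF",  # invented example rule
}

Triple = Tuple[str, str, str]  # (head, relation, tail)


def extract_relations(sentence: str, entities: List[str]) -> List[Triple]:
    """Emit a triple when a rule's trigger occurs between two recognized entities.

    Only the first occurrence of each trigger is examined in this sketch.
    """
    triples: List[Triple] = []
    for trigger, relation in RULES.items():
        pos = sentence.find(trigger)
        if pos == -1:
            continue
        left, right = sentence[:pos], sentence[pos + len(trigger):]
        heads = [e for e in entities if e in left]
        tails = [e for e in entities if e in right]
        if heads and tails:
            head = max(heads, key=left.rfind)   # entity closest before the trigger
            tail = min(tails, key=right.find)   # entity closest after the trigger
            triples.append((head, relation, tail))
    return triples


print(extract_relations("The house belongs to Li Ming.", ["house", "Li Ming"]))
```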
Step 3305: Store the training entities in nodes, and use the association relationships between training entities as paths connecting adjacent nodes.
First, the notarization-domain knowledge graph consists of nodes and paths. Nodes store the training entities and their attribute information, such as format attributes; paths connect adjacent nodes and store the association relationships between them (i.e., between training entities), such as "belongs to" and "is".
When an association relationship exists between two nodes, a path can be constructed from that relationship to connect them. For example, the relationship between the training entities "Li Ming" and "house" is that the house belongs to Li Ming, so the path "Li Ming <-belongs to- house" is constructed between them.
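The node-and-path structure can be sketched as a small in-memory graph in which nodes carry entity attributes and directed, labeled edges carry the association relationships. The class and method names are hypothetical; a production system would more likely use a graph database.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple


class KnowledgeGraph:
    """Minimal in-memory graph: nodes hold entity attributes, and directed
    edges (paths) hold the relationship between adjacent entities."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Dict[str, str]] = {}
        self.edges: DefaultDict[str, List[Tuple[str, str]]] = defaultdict(list)

    def add_node(self, name: str, **attributes: str) -> None:
        self.nodes.setdefault(name, {}).update(attributes)

    def add_path(self, head: str, relation: str, tail: str) -> None:
        # Create missing endpoints so a path never dangles
        self.add_node(head)
        self.add_node(tail)
        if (relation, tail) not in self.edges[head]:
            self.edges[head].append((relation, tail))

    def neighbors(self, name: str) -> List[Tuple[str, str]]:
        # .get avoids inserting empty lists for unknown names
        return list(self.edges.get(name, []))


kg = KnowledgeGraph()
kg.add_node("Li Ming", type="PERSON")
kg.add_path("house", "belongs to", "Li Ming")  # Li Ming <-belongs to- house
print(kg.neighbors("house"))
```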
Step 3307: Construct the notarization-domain knowledge graph from the nodes and paths.
The association relationships among the nodes are extracted, a path is constructed for each node based on those relationships, and the nodes are connected to obtain the notarization-domain knowledge graph. In one possible implementation, the notarization-domain knowledge graph is represented in a Neo4j graph database, in which the training entities and their association relationships are stored and displayed; this is not limited herein.
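For the Neo4j option, each (head, relation, tail) triple could be written with Cypher `MERGE` clauses so that re-importing the same data does not duplicate nodes or relationships. The `Entity` label and `name` property are assumptions. Cypher does not accept relationship types as query parameters, so the sketch validates the type and inlines it, while entity names are passed as parameters. The returned query and parameter dictionary could then be executed with the official `neo4j` Python driver, e.g. `session.run(query, params)`.

```python
import re
from typing import Dict, Tuple

# Upper-case identifiers only, so an inlined relationship type cannot inject Cypher
_REL_TYPE = re.compile(r"^[A-Z][A-Z0-9_]*$")


def merge_triple_cypher(head: str, relation: str, tail: str) -> Tuple[str, Dict[str, str]]:
    """Build a Cypher MERGE statement for one (head)-[relation]->(tail) triple."""
    if not _REL_TYPE.match(relation):
        raise ValueError(f"unsafe relationship type: {relation!r}")
    query = (
        "MERGE (h:Entity {name: $head}) "
        "MERGE (t:Entity {name: $tail}) "
        f"MERGE (h)-[:{relation}]->(t)"
    )
    return query, {"head": head, "tail": tail}


query, params = merge_triple_cypher("house", "BELONGS_TO", "Li Ming")
print(query)
print(params)
```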
After the notarization-domain knowledge graph is obtained, candidate entities can be obtained from the to-be-linked entities by entity linking.
Entity linking is the operation of linking to-be-linked entities to entities in the notarization-domain knowledge graph.
In one possible implementation, entity linking is implemented by computing the similarity between a to-be-linked entity and the entities in the notarization-domain knowledge graph. In another possible implementation, for to-be-linked entities with low similarity to the knowledge-graph entities, entity linking may further be performed through entity disambiguation and/or coreference resolution.
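Similarity-based linking can be sketched with the standard-library `difflib.SequenceMatcher`, whose `ratio()` returns a character-level similarity in [0, 1]. The choice of this measure and the 0.8 threshold are assumptions; the application does not specify its similarity function. Mentions that fall below the threshold return `None`, which is where disambiguation or coreference resolution would take over.

```python
from difflib import SequenceMatcher
from typing import Iterable, Optional, Tuple


def link_entity(mention: str, kg_entities: Iterable[str],
                threshold: float = 0.8) -> Optional[Tuple[str, float]]:
    """Link a mention to the most similar knowledge-graph entity.

    Returns (entity, score) when the best similarity reaches the threshold,
    otherwise None (a candidate for disambiguation/coreference resolution).
    """
    best, best_score = None, 0.0
    for entity in kg_entities:
        score = SequenceMatcher(None, mention, entity).ratio()
        if score > best_score:
            best, best_score = entity, score
    if best is not None and best_score >= threshold:
        return best, best_score
    return None


kg_entities = ["house", "contract", "applicant"]
print(link_entity("house", kg_entities))
print(link_entity("vehicle", kg_entities))
```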
For example, if the to-be-linked entities in the sample document include "house" and the notarization-domain knowledge graph also contains the entity "house", the to-be-linked entity "house" can be linked to the knowledge-graph entity "house"; that is, it can be taken as a candidate entity.
Step 350: based on the case-specific information to be filled that corresponds to the sample book in the set to be filled, take the candidate entities that belong to that information as target entities.
The set to be filled contains the case-specific information to be filled in the sample books of the various types of notarization documents.
Before the candidate entities belonging to the case-specific information to be filled can be identified, a list of the "case-specific information to be filled" in each notarization document, that is, the set to be filled, must be configured in advance for the different notarization scenarios.
In one possible implementation, the case-specific information to be filled in each type of sample book is extracted according to a set rule to generate the set to be filled. The types of sample book include a statement, a power of attorney, a contract notarization certificate, and the like; no limitation is imposed here.
As for the set rule: a complete notarization document contains parts that stay the same from case to case and parts that change with each case, and the latter are regarded as the case-specific information to be filled. Taking a statement as an example, the entities involved may include the applicant and the notarization office's certificate number, and the body text may also cite a legal document such as the "General Principles of the Civil Law of the People's Republic of China". For a "statement" notarization matter, the applicant and the certificate number belong to the case-specific information to be filled, whereas the cited law does not change and therefore does not belong to it. In fig. 5, the case-specific information to be filled is represented by at least one character X.
To identify the candidate entities belonging to the case-specific information to be filled, first determine the set to be filled that corresponds to the type of the sample book; for example, if the sample book is a statement, the set to be filled for statements applies. The candidate entities in the sample book are then compared with the case-specific information to be filled for that sample book in the set, which determines which candidate entities belong to it.
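The comparison against the set to be filled can be sketched as a simple category filter. This assumes, hypothetically, that each candidate entity carries a category label from entity linking and that the set to be filled maps each sample-book type to the categories of case-specific information; the type names, categories, and certificate number below are invented for illustration.

```python
# Hypothetical set to be filled: sample-book type -> case-specific categories
TO_BE_FILLED = {
    "statement": {"applicant", "certificate_number"},
    "power_of_attorney": {"principal", "agent", "certificate_number"},
}


def select_targets(doc_type, candidates):
    """candidates: list of (text, category) pairs obtained by entity linking.

    Keep only the candidates whose category is case-specific for this
    sample-book type; everything else stays fixed in the template.
    """
    fields = TO_BE_FILLED.get(doc_type, set())
    return [(text, cat) for text, cat in candidates if cat in fields]


candidates = [
    ("Li Ming", "applicant"),
    ("XZ-0001", "certificate_number"),
    ("General Principles of the Civil Law", "law"),
]
print(select_targets("statement", candidates))
# [('Li Ming', 'applicant'), ('XZ-0001', 'certificate_number')]
```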
Step 370: in the sample book, replace each target entity with its corresponding parameter to generate a notarization document template.
Once the target entities are determined, they are replaced in the sample book by their corresponding parameters, yielding the notarization document template for that sample book.
Through this process, a sample book of a notarization document is obtained; the entities to be linked in it are identified by named entity recognition and linked against the notarization domain knowledge graph to obtain candidate entities; the candidate entities that belong to the information to be filled are determined using the set to be filled and replaced by parameters, generating a notarization document template for the sample book. When a notary later handles a notarization matter, the notary fills the case-specific content into the parameter positions of the template to complete the notarization document. This makes drafting notarization documents more intelligent, reduces the time and effort it requires, and addresses the low drafting efficiency of the related art.
It can be understood that entity linking succeeds if the notarization domain knowledge graph contains a candidate entity corresponding to the entity to be linked, and fails if it does not.
The handling of an entity to be linked when entity linking fails is described in detail below.
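Replacing target entities with parameters might look like the sketch below. The `{{name}}` placeholder syntax and the helper names are assumptions, since the text does not specify how parameters are written; a fill step is included only to show how a notary's later input would occupy the parameter positions.

```python
import re


def make_template(sample_text, targets):
    """Replace each target entity's text with a named parameter such as {{applicant}}.

    targets: list of (entity_text, parameter_name). Longer entity texts are
    replaced first so that an entity contained inside a longer one is not
    substituted prematurely.
    """
    template = sample_text
    for text, param in sorted(targets, key=lambda t: len(t[0]), reverse=True):
        template = template.replace(text, "{{" + param + "}}")
    return template


def fill_template(template, values):
    """Substitute the notary's case-specific values into the {{parameter}} positions."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: values[m.group(1)], template)


sample = "Applicant: Li Ming. Li Ming declares that the house belongs to him."
template = make_template(sample, [("Li Ming", "applicant")])
print(template)
# Applicant: {{applicant}}. {{applicant}} declares that the house belongs to him.
print(fill_template(template, {"applicant": "Wang Wu"}))
# Applicant: Wang Wu. Wang Wu declares that the house belongs to him.
```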
In an exemplary embodiment, as shown in fig. 6, after step 330 the method may further include the following steps:
Step 410: if entity linking fails, perform entity disambiguation and/or coreference resolution on the entity to be linked.
Entity disambiguation uses the context to resolve an entity to be linked that has multiple meanings. For example, the entity to be linked "apple" may correspond to "Apple Inc.", "Apple (mobile phone)", "apple (fruit)", and so on; in the sentence "1 ton of apples picked at XX farm belong to Li Ming", the context shows that "apple" means "apple (fruit)".
Coreference resolution merges multiple mentions that refer to the same entity into one correct entity to be linked. For example, in "Li Ming gives a house to Zhang San, … the recipient must complete the registration formalities before month xx, day xx, …", "the recipient" refers to "Zhang San", so the two mentions can be merged into a single entity to be linked.
Step 430: for the entities to be linked on which entity disambiguation and/or coreference resolution has been completed, perform entity linking against the notarization domain knowledge graph to obtain candidate entities.
Entity linking is performed again on the entity to be linked after disambiguation and/or coreference resolution: if the notarization domain knowledge graph contains a corresponding candidate entity, linking succeeds; otherwise, it fails.
If entity linking fails again after disambiguation and/or coreference resolution, steps 510 to 530 can be performed to further process the entity to be linked.
In an exemplary embodiment, as shown in fig. 7, after step 330 the method may further include the following steps:
Step 510: if entity linking fails, create a new node in the notarization domain knowledge graph.
Step 530: link the entity to be linked to the new node of the notarization domain knowledge graph.
A failed entity linking indicates that the notarization domain knowledge graph contains no candidate entity corresponding to the entity to be linked, so the entity cannot be linked to any existing node of the graph.
With these embodiments, after entity linking fails, the entity to be linked can undergo entity disambiguation and/or coreference resolution and be linked again, or a new node can be created in the notarization domain knowledge graph and the entity to be linked linked to it. This raises the success rate of obtaining candidate entities through entity linking; the candidate entities are then used in subsequent steps to determine the case-specific information to be filled and to generate the notarization document template, which improves the quality of the template.
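Context-based disambiguation can be sketched with a keyword-overlap heuristic in the spirit of the simplified Lesk algorithm. This is a swapped-in technique for illustration, not the method the text specifies; the sense inventory and its keywords are hypothetical.

```python
import re

# Hypothetical sense inventory: each sense of a mention with descriptive keywords
SENSES = {
    "apple": {
        "Apple Inc.": {"company", "shares", "technology", "corporation"},
        "Apple (mobile phone)": {"phone", "mobile", "device", "screen"},
        "apple (fruit)": {"fruit", "farm", "picked", "ton", "orchard"},
    }
}


def disambiguate(mention, sentence):
    """Pick the sense whose keywords overlap most with the sentence context.

    Returns None when the mention is unknown or no keyword overlaps.
    """
    context = set(re.findall(r"[a-z]+", sentence.lower()))
    best, best_overlap = None, 0
    for sense, keywords in SENSES.get(mention, {}).items():
        overlap = len(context & keywords)
        if overlap > best_overlap:
            best, best_overlap = sense, overlap
    return best


print(disambiguate("apple", "1 ton of apples picked at XX farm belong to Li Ming"))
# apple (fruit)
```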
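The fallback order described in these embodiments (link, then normalize and relink, then create a new node) can be sketched as below. The similarity threshold, the function names, and the `normalize` callable, which stands in for disambiguation and/or coreference resolution, are all hypothetical.

```python
from difflib import SequenceMatcher


def link(mention, graph_nodes, threshold=0.8):
    # Most similar existing node, or None when linking fails (assumed measure)
    scored = [(SequenceMatcher(None, mention.lower(), n.lower()).ratio(), n)
              for n in graph_nodes]
    score, node = max(scored, default=(0.0, None))
    return node if score >= threshold else None


def link_with_fallback(mention, graph_nodes, normalize):
    """First attempt -> normalize and retry -> create a new node.

    graph_nodes: mutable set of node names; normalize: hypothetical callable
    standing in for entity disambiguation and/or coreference resolution.
    """
    node = link(mention, graph_nodes)
    if node is not None:
        return node, "linked"
    node = link(normalize(mention), graph_nodes)
    if node is not None:
        return node, "linked after normalization"
    graph_nodes.add(mention)  # create a new node and link the mention to it
    return mention, "new node"


nodes = {"Zhang San", "House"}
aliases = {"the recipient": "Zhang San"}  # e.g. output of coreference resolution
normalize = lambda m: aliases.get(m, m)
print(link_with_fallback("the recipient", nodes, normalize))
# ('Zhang San', 'linked after normalization')
```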
Referring to fig. 8, in an exemplary embodiment, after step 370, the method may further include the steps of:
Step 610: extract the format attribute of the target entity based on the notarization domain knowledge graph.
As described above, each target entity is stored in a node of the notarization domain knowledge graph, and that node also stores the entity's attribute information; the format attribute of a target entity belonging to the case-specific information to be filled is therefore extracted from its corresponding node.
The format attribute may include a regular expression for the target entity, which expresses the filtering logic for a character string. For example, the regular expression for an e-mail address is "^\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$"; for a mobile phone number it is "^(13[0-9]|14[57]|15[012356789]|18[012356789])\d{8}$"; and for a date it is "^\d{4}-\d{1,2}-\d{1,2}". The regular expression in the format attribute can thus be used to configure a check rule for the parameter.
Step 630: according to the format attribute, configure a check rule for the parameter that replaced the target entity.
The check rule performs format checking on the content filled into the parameters of the notarization document template. Configuring check rules constrains the filled content to be standardized and correct.
In one possible implementation, as shown in fig. 9, the check rule for a replaced parameter is configured using the target entity's regular expression. For example, if the regular expression requires the content filled into a parameter of the notarization document template to be a monetary amount (a number) and the notary later enters text at that parameter position, the text does not satisfy the regular expression; the electronic device detects the error and reports it to alert the notary.
With this embodiment, format attributes are extracted from the notarization domain knowledge graph for the candidate entities that belong to the case-specific information to be filled, and check rules are configured for the parameters of the notarization document template. When content filled into a parameter position does not satisfy its check rule, the electronic device reports an error to alert the notary. Drafting errors can thus be identified automatically, which improves the accuracy of notarization documents, prevents quality loss caused by oversight on the part of notaries or notarization offices, stabilizes document quality, and facilitates mutual recognition of notarization documents across regions.
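A check rule built from a node's format attribute might be sketched as below. The e-mail, phone, and date patterns follow the regular expressions given in the text; the amount pattern, the parameter names, and the error-message format are hypothetical. Python's `re.fullmatch` is used so that the whole filled value, not just a prefix, must satisfy the pattern (which also anchors the date pattern at its end).

```python
import re

# Format attributes as they might be stored on knowledge-graph nodes
FORMAT_ATTRIBUTES = {
    "email": r"^\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$",
    "phone": r"^(13[0-9]|14[57]|15[012356789]|18[012356789])\d{8}$",
    "date": r"^\d{4}-\d{1,2}-\d{1,2}",
    "amount": r"^\d+(\.\d{1,2})?$",  # hypothetical pattern for a monetary amount
}


def make_check_rule(format_name):
    """Compile a node's regular expression into a check function for one parameter."""
    pattern = re.compile(FORMAT_ATTRIBUTES[format_name])

    def check(value):
        return pattern.fullmatch(value) is not None

    return check


def validate(filled, rules):
    """Return error messages for filled parameters that fail their check rule."""
    return [f"parameter '{p}' has invalid value {v!r}"
            for p, v in filled.items() if p in rules and not rules[p](v)]


rules = {"loan_amount": make_check_rule("amount"), "phone": make_check_rule("phone")}
print(validate({"loan_amount": "one hundred thousand", "phone": "13912345678"}, rules))
# ["parameter 'loan_amount' has invalid value 'one hundred thousand'"]
```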
Fig. 10 is a schematic diagram of a specific implementation of the knowledge-graph-based notarization document template generation method in an application scenario.
In step 701, a named entity prediction model is obtained and the notarization domain knowledge graph is constructed.
The named entity prediction model is formed by training a neural network model based on deep learning.
In step 703, the set to be filled is configured; it contains the case-specific information to be filled for each type of sample book.
In step 705, a sample book of a notarization document is obtained. The sample book may be a paper document or an electronic document.
In step 707, the entities to be linked in the sample book are identified using the named entity prediction model.
In step 709, entity linking is performed on the entities to be linked based on the notarization domain knowledge graph.
If entity linking in step 709 succeeds, a candidate entity is obtained.
If entity linking in step 709 fails, step 711 is executed.
In step 711, entity disambiguation and/or coreference resolution is performed on the entity to be linked, and entity linking is performed a second time on the entity for which disambiguation and/or coreference resolution has been completed.
If entity linking in step 711 succeeds, a candidate entity is obtained.
If entity linking in step 711 fails, step 713 is executed.
In step 713, a new node is created in the notarization domain knowledge graph, and the entity to be linked is linked to the new node and taken as a candidate entity.
In step 717, based on the set to be filled, the candidate entities that belong to the case-specific information to be filled are taken as target entities and replaced with their corresponding parameters.
In step 719, a check rule is configured for each parameter that replaced a target entity, based on the target entity's format attribute extracted from the notarization domain knowledge graph. The check rule performs format checking on the content filled into the parameters of the notarization document template.
In step 721, the notarization document template corresponding to the sample book is generated.
In one possible implementation, the generated notarization document template may be stored or displayed.
In this application scenario, named entity recognition is performed on the sample book of a notarization document by the named entity prediction model to obtain the entities to be linked; candidate entities are then obtained from them by entity linking against the notarization domain knowledge graph; the candidate entities belonging to the case-specific information to be filled are determined using the set to be filled and replaced by parameters, for which check rules are configured from the format attributes; finally, the notarization document template is output. On the one hand, the template reduces the time and effort notaries spend drafting notarization documents and improves drafting efficiency; on the other hand, the check rules verify the format of the content notaries fill into the parameter positions, which reduces errors in notarization documents and stabilizes their quality.
The following are apparatus embodiments of the present application, which may be used to perform the knowledge-graph-based notarization document template generation method of the present application. For details not disclosed in the apparatus embodiments, refer to the embodiments of the notarization document template generation method.
Referring to fig. 11, an embodiment of the present application provides a knowledge-graph-based notarization document template generation apparatus 900, which includes, but is not limited to, an entity identification module 910, an entity linking module 930, a confirmation module 950, and a template generation module 970.
The entity identification module 910 is configured to obtain a sample book of a notarization document and perform named entity recognition on it to obtain the entities to be linked in the sample book.
The entity linking module 930 is configured to perform entity linking on the entities to be linked based on the notarization domain knowledge graph and to obtain candidate entities from them.
The confirmation module 950 is configured to take the candidate entities that belong to the case-specific information to be filled as target entities, based on the case-specific information to be filled corresponding to the sample book in the set to be filled.
The template generation module 970 is configured to replace each target entity in the sample book with its corresponding parameter to generate a notarization document template.
It should be noted that the division into the functional modules above is only an example of how the notarization document template generation apparatus of the above embodiment generates a template. In practical applications, these functions may be assigned to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to perform all or part of the functions described above.
In addition, the notary document template generating apparatus provided in the above embodiment and the notary document template generating method belong to the same concept, wherein specific ways of executing operations by the respective modules have been described in detail in the method embodiments, and are not described herein again.
FIG. 12 is a structural schematic diagram of a server according to an exemplary embodiment. The server is suitable for use as the server 130 in the implementation environment shown in fig. 1.
It should be noted that this server is merely an example adapted to the present application and should not be regarded as limiting its scope of use in any way. Nor should the server be interpreted as depending on, or requiring, one or more components of the exemplary server 2000 shown in fig. 12.
The hardware structure of the server 2000 may vary considerably with configuration or performance. As shown in fig. 12, the server 2000 includes a power supply 210, an interface 230, at least one memory 250, and at least one central processing unit (CPU) 270.
Specifically, the power supply 210 is used to provide operating voltages for the various hardware devices on the server 2000.
The interface 230 includes at least one wired or wireless network interface 231 for interacting with external devices, for example, for the interaction between the acquisition terminal 110 and the server 130 in the implementation environment shown in fig. 1.
Of course, in other examples of the present application, the interface 230 may further include at least one serial-to-parallel conversion interface 233, at least one input/output interface 235, at least one USB interface 237, and the like, as shown in fig. 12, which is not limited thereto.
The memory 250 serves as the carrier for resource storage and may be a read-only memory, a random access memory, a magnetic disk, an optical disc, or the like. The resources stored on it include an operating system 251, applications 253, data 255, and so on, and they may be stored transiently or persistently.
The operating system 251 manages and controls the hardware devices and the applications 253 on the server 2000 so that the central processing unit 270 can operate on and process the mass data 255 in the memory 250; it may be Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
An application 253 is a computer program that performs at least one specific task on top of the operating system 251. It may include at least one module (not shown in fig. 12), each of which may contain a computer program for the server 2000. For example, the notarization document template generation apparatus may be regarded as an application 253 deployed on the server 2000.
The data 255 may be photographs, pictures, and the like stored on disk, or historical notarization documents and the like stored in the memory 250.
The central processing unit 270 may include one or more processors and is configured to communicate with the memory 250 over at least one communication bus to read the computer programs stored in the memory 250 and thereby operate on and process the mass data 255 in the memory 250. For example, the notarization document template generation method is performed by the central processing unit 270 reading a series of computer programs stored in the memory 250.
Furthermore, the present application can equally be implemented by hardware circuits or by hardware circuits combined with software; its implementation is therefore not limited to any specific hardware circuit, software, or combination of the two.
Referring to fig. 13, an embodiment of the present application provides an electronic device 4000, which may be a desktop computer, a notebook computer, a server, or the like.
In fig. 13, the electronic device 4000 includes at least one processor 4001, at least one communication bus 4002, and at least one memory 4003.
The processor 4001 is connected to the memory 4003, for example via the communication bus 4002. Optionally, the electronic device 4000 may further include a transceiver 4004 for data interaction between this electronic device and other electronic devices, such as sending and/or receiving data. Note that in practical applications the number of transceivers 4004 is not limited to one, and the structure of the electronic device 4000 does not limit the embodiments of the present application.
The processor 4001 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various illustrative logical blocks, modules, and circuits described in connection with the present disclosure. The processor 4001 may also be a combination that performs computing functions, for example a combination of one or more microprocessors, or of a DSP and a microprocessor.
The communication bus 4002 may include a path for carrying information between the above components. It may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in FIG. 13, but this does not mean there is only one bus or one type of bus.
The memory 4003 may be a ROM (Read Only Memory) or other type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or other type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, without being limited thereto.
A computer program is stored in the memory 4003, and the processor 4001 reads the computer program stored in the memory 4003 through the communication bus 4002.
The computer program realizes the notarization document template generating method in the above embodiments when executed by the processor 4001.
In addition, in the embodiments of the present application, a storage medium is provided, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the notarization document template generating method in the above embodiments.
A computer program product is provided in an embodiment of the present application, the computer program product comprising a computer program stored in a storage medium. The processor of the computer device reads the computer program from the storage medium, and the processor executes the computer program, so that the computer device executes the notarization document template generating method in the above embodiments.
Compared with the related art, the present application first obtains a sample book of a notarization document, identifies the entities to be linked in it by named entity recognition, and links them against the notarization domain knowledge graph to obtain candidate entities; it then uses the set to be filled to determine which candidate entities belong to the information to be filled, replaces them with parameters, configures check rules for the parameters, and generates the notarization document template corresponding to the sample book. When a notary later handles a notarization matter, the notary fills content into the parameter positions of the template according to the specific case, the filled content is format-checked against the check rules, and the notarization document is completed. The application thus generates notarization document templates automatically and detects errors in the content filled into parameter positions, which reduces the time and effort notaries spend drafting, improves the quality of issued notarization documents, and addresses the low drafting efficiency of the related art.
It should be understood that, although the steps in the flowcharts are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not restricted to the exact order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; their order of execution is not necessarily sequential, and they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
The foregoing describes only some embodiments of the present application. It should be noted that those of ordinary skill in the art may make several improvements and refinements without departing from the principle of the present application, and these improvements and refinements shall also fall within the protection scope of the present application.

Claims (10)

1. A notarization document template generation method based on knowledge graph is characterized by comprising the following steps:
acquiring a sample book of a notarization document, and performing named entity recognition on the sample book to obtain entities to be linked in the sample book;
performing entity linking on the entities to be linked based on a notarization domain knowledge graph, and obtaining candidate entities from the entities to be linked;
taking, based on case-specific information to be filled corresponding to the sample book in a set to be filled, the candidate entities belonging to the case-specific information to be filled as target entities;
and replacing, in the sample book, each target entity with its corresponding parameter to generate a notarization document template.
2. The method of claim 1, wherein after replacing, in the sample book, each target entity with its corresponding parameter to generate the notarization document template, the method further comprises:
extracting a format attribute of the target entity based on the notarization domain knowledge graph;
and configuring, according to the format attribute, a check rule for the parameter that replaced the target entity, wherein the check rule is used to perform format checking on content filled into the parameter in the notarization document template.
3. The method of claim 1, wherein performing entity linking on the entities to be linked based on the notarization domain knowledge graph and obtaining candidate entities from the entities to be linked comprises:
if entity linking fails, performing entity disambiguation and/or coreference resolution on the entity to be linked;
and performing entity linking against the notarization domain knowledge graph on the entity to be linked for which the entity disambiguation and/or coreference resolution has been completed, to obtain the candidate entity.
4. The method of claim 1, wherein performing entity linking on the entities to be linked based on the notarization domain knowledge graph and obtaining candidate entities from the entities to be linked comprises:
if entity linking fails, creating a new node in the notarization domain knowledge graph;
and linking the entity to be linked to the new node of the notarization domain knowledge graph.
5. The method of claim 1, wherein before taking the candidate entities belonging to the case-specific information to be filled as target entities based on the case-specific information to be filled corresponding to the sample book in the set to be filled, the method further comprises:
extracting the case-specific information to be filled in each type of sample book based on a set rule to generate the set to be filled.
6. The method of any one of claims 1 to 5, wherein before performing entity linking on the entities to be linked based on the notarization domain knowledge graph, the method further comprises:
acquiring historical notarization documents, and performing named entity recognition on the historical notarization documents to obtain a plurality of training entities;
extracting the association relations among the training entities;
storing each training entity in a respective node, and using each association relation between training entities as a path connecting the corresponding adjacent nodes;
and constructing the notarization domain knowledge graph from the nodes and paths.
7. The method of claim 6, wherein the named entity recognition is implemented by invoking a named entity prediction model obtained by training a neural network model.
8. A knowledge-graph-based notarization document template generation apparatus, comprising:
an entity identification module, configured to acquire a sample book of a notarization document and perform named entity recognition on the sample book to obtain entities to be linked in the sample book;
an entity linking module, configured to perform entity linking on the entities to be linked based on a notarization domain knowledge graph and obtain candidate entities from the entities to be linked;
a confirmation module, configured to take, based on case-specific information to be filled corresponding to the sample book in a set to be filled, the candidate entities belonging to the case-specific information to be filled as target entities;
and a template generation module, configured to replace, in the sample book, each target entity with its corresponding parameter to generate a notarization document template.
9. An electronic device, comprising: at least one processor, at least one memory, and at least one communication bus, wherein,
the memory has a computer program stored thereon, and the processor reads the computer program in the memory through the communication bus;
the computer program, when executed by the processor, implements the notarization document template generation method of any one of claims 1 to 7.
10. A storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the notarization document template generation method of any one of claims 1 to 7.
CN202211355120.4A 2022-11-01 2022-11-01 Notarization document template generation method and device, electronic equipment and storage medium Pending CN115587582A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202211355120.4A (CN115587582A) | 2022-11-01 | 2022-11-01 | Notarization document template generation method and device, electronic equipment and storage medium

Publications (1)

Publication Number | Publication Date
CN115587582A | 2023-01-10

Family

ID=84781102

Family Applications (1)

Application Number | Status | Priority Date | Filing Date | Title
CN202211355120.4A (CN115587582A) | Pending | 2022-11-01 | 2022-11-01 | Notarization document template generation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115587582A (en)

Similar Documents

Publication Publication Date Title
US11615246B2 (en) Data-driven structure extraction from text documents
US11720615B2 (en) Self-executing protocol generation from natural language text
CN112711953B (en) Text multi-label classification method and system based on attention mechanism and GCN
CN108153729B (en) Knowledge extraction method for financial field
US20210366055A1 (en) Systems and methods for generating accurate transaction data and manipulation
US11830269B2 (en) System for information extraction from form-like documents
US20190139147A1 (en) Accuracy and speed of automatically processing records in an automated environment
CN115687647A (en) Notarization document generation method and device, electronic equipment and storage medium
CN111651994B (en) Information extraction method and device, electronic equipment and storage medium
US20220319216A1 (en) Image reading systems, methods and storage medium for performing geometric extraction
WO2022247231A1 (en) Resume screening method, resume screening apparatus, terminal device and storage medium
CN113902569A (en) Method for identifying the proportion of green assets in digital assets and related products
US11893008B1 (en) System and method for automated data harmonization
CN115525739A (en) Supply chain financial intelligent duplicate checking method, device, equipment and medium
CN115587582A (en) Notarization document template generation method and device, electronic equipment and storage medium
CN111046934B (en) SWIFT message soft clause recognition method and device
Kumar et al. AI Enabled Invoice Management Application
US20230385701A1 (en) Artificial intelligence engine for entity resolution and standardization
US20240330605A1 (en) Generative artificial intelligence platform to manage smart documents
US12013838B2 (en) System and method for automated data integration
US20240330742A1 (en) Artificial intelligence platform to manage a document collection
CN115587800A (en) Notarization document error correction method and device, electronic device and storage medium
Oliveira et al. Sentiment analysis of stock market behavior from Twitter using the R Tool
KR20240013679A (en) Method and system for constructing knowledge base and extracting entity name relationship using knowledge base
Detection et al. 17 Emails Classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination