CN117033539A - Judge document element extraction method, device, equipment and storage medium - Google Patents


Info

Publication number
CN117033539A
Authority
CN
China
Prior art keywords
original
title
notice
reported
information
Prior art date
Legal status
Pending
Application number
CN202311006774.0A
Other languages
Chinese (zh)
Inventor
黄威威
张晗
邹伟东
许树淮
包智
洪英文
Current Assignee
Qizhi Technology Co ltd
Original Assignee
Qizhi Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Qizhi Technology Co ltd
Priority to CN202311006774.0A
Publication of CN117033539A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/313 Selection or weighting of terms for indexing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/258 Heading extraction; Automatic titling; Numbering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/18 Legal services; Handling legal documents
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application provides a judgment document element extraction method, device, equipment and storage medium. The method comprises: in response to an element extraction request for a judgment document sent by a user side, calling an entity extraction model to extract the plaintiff designation and the defendant designation of the judgment document (the terms the document uses for each party, such as "plaintiff" or "complainant"); inputting the plaintiff designation and the defendant designation into a preset template to generate a prompt template corresponding to them; extracting plaintiff information and defendant information from the judgment document according to the prompt content of the prompt template; and generating key elements of the judgment document according to the plaintiff information and the defendant information. Technical effect: when a new data set or domain is processed, no rules need to be updated or modified, which reduces workload.

Description

Judge document element extraction method, device, equipment and storage medium
Technical Field
The application relates to the technical field of information extraction, and in particular to a judgment document element extraction method, device, equipment and storage medium.
Background
Enterprise risk queries include intellectual-property-related risks, and intellectual property risk information mainly comes from key information in intellectual property judgment documents, such as the plaintiff designation, plaintiff name, plaintiff address, defendant designation, defendant name and defendant address.
At present, key information in judgment documents is generally extracted with rules: the rules are defined and written in advance, and they can then extract the key information accurately.
However, the rules usually have to be written by domain experts, and this dependence means that rule-based extraction requires the rules to be continually updated and modified whenever a new data set or domain is processed, which entails a heavy workload.
Disclosure of Invention
The application provides a judgment document element extraction method, device, equipment and storage medium, which reduce workload because no rules need to be updated or modified when a new data set or domain is processed.
In a first aspect, the present application provides a judgment document element extraction method, the method comprising: in response to an element extraction request for a judgment document sent by a user side, calling an entity extraction model to extract the plaintiff designation and the defendant designation of the judgment document; inputting the plaintiff designation and the defendant designation into a preset template to generate a prompt template corresponding to them; extracting plaintiff information and defendant information from the judgment document according to the prompt content of the prompt template; and generating key elements of the judgment document according to the plaintiff information and the defendant information.
With this technical solution, the plaintiff and defendant designations in the judgment document are extracted by the entity extraction model, a prompt template is built from them, and the key information in the judgment document is extracted through the prompt template. The whole process needs little manual involvement and no rule writing by domain experts, and no rules need to be updated or modified even when a new data set or domain is processed, so the method is widely applicable and suitable for extracting key elements from all judgment documents.
Optionally, before calling the entity extraction model to extract the plaintiff and defendant designations of the judgment document in response to the element extraction request sent by the user side, the method further includes: acquiring first training data input by the user side, and constructing the entity extraction model according to the first training data.
With this technical solution, the entity extraction model is trained on a large amount of first training data input by the user side, so it can accurately extract the plaintiff and defendant designations, and hence the key elements, of the judgment document.
Optionally, the first training data includes labeled data, and acquiring the first training data input by the user side and constructing the entity extraction model according to it includes: acquiring the labeled data input by the user side, the labeled data comprising labeled plaintiff designations and labeled defendant designations; calling a preset extraction model to extract the labeled plaintiff and defendant designations from the labeled data to obtain the types of plaintiff and defendant designations; and constructing the entity extraction model according to those types.
Optionally, after acquiring the first training data input by the user side and constructing the entity extraction model according to it, the method further includes: acquiring second training data input by the user side and running the entity extraction model on it to obtain a training result; sending the training result to the user side and receiving first feedback information sent by the user side; and adjusting the entity extraction model according to the first feedback information.
With this technical solution, after the entity extraction model has been constructed, it is further trained on the second training data input by the user side and adjusted according to the first feedback information from the user side, so the final model is better suited to that user side and accurately extracts the plaintiff and defendant designations, and hence the key elements, of the judgment document.
Optionally, after responding to the element extraction request for the judgment document sent by the user side, the method further includes: acquiring a preset template; judging, according to the preset template, whether the judgment document meets preset requirements; if it does, calling the entity extraction model to extract the plaintiff and defendant designations of the judgment document; if it does not, deleting or supplementing content of the judgment document to obtain a new judgment document, and calling the entity extraction model to extract the plaintiff and defendant designations of the new judgment document.
With this technical solution, the preset template is used to judge whether the judgment document meets the preset requirements, and the document is processed accordingly, so that the entity extraction model can easily and accurately extract the plaintiff and defendant designations, and hence the key elements, of the judgment document.
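The text does not say what the preset requirements are. As a minimal sketch only, the following Python assumes two hypothetical requirements (no redundant whitespace and a maximum length, `MAX_CHARS`) and models only the "deleting content" branch; supplementing content is not shown, and all names here are illustrative rather than the patent's own.

```python
import re

MAX_CHARS = 2000  # hypothetical input-length limit; not specified in the source


def meets_requirements(text: str, max_chars: int = MAX_CHARS) -> bool:
    """Hypothetical preset requirements: non-empty, within the length limit,
    and free of redundant whitespace."""
    normalized = re.sub(r"\s+", " ", text).strip()
    return bool(text) and len(text) <= max_chars and text == normalized


def preprocess_document(text: str, max_chars: int = MAX_CHARS) -> str:
    """Delete content so that the document meets the hypothetical requirements."""
    cleaned = re.sub(r"\s+", " ", text).strip()  # collapse whitespace runs
    return cleaned[:max_chars].rstrip()          # drop text beyond the limit


def prepare_document(text: str) -> str:
    """Pass a compliant document through unchanged, otherwise build a new one."""
    return text if meets_requirements(text) else preprocess_document(text)
```

A compliant document is returned untouched, so the entity extraction model always receives text in one predictable shape.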
Optionally, generating the key elements of the judgment document according to the plaintiff information and the defendant information includes: calling an open-source model to process the plaintiff information and the defendant information to generate text information, the text information being the key elements of the judgment document.
With this technical solution, text information is generated by calling an open-source model to process the plaintiff and defendant information. The whole process needs little manual involvement and no rule writing by domain experts, and no rules need to be updated or modified even when a new data set or domain is processed, so the method is widely applicable to extracting key elements from all judgment documents.
Optionally, the key elements of the judgment document include plaintiff key elements and defendant key elements, and generating the key elements according to the plaintiff information and the defendant information includes: extracting the plaintiff name and plaintiff address from the plaintiff information, and combining the plaintiff designation, plaintiff name and plaintiff address into the plaintiff key elements; extracting the defendant name and defendant address from the defendant information, and combining the defendant designation, defendant name and defendant address into the defendant key elements; and combining the plaintiff key elements and the defendant key elements into the key elements of the judgment document.
With this technical solution, key elements can be extracted from the judgment document according to the user's needs. The whole process needs little manual involvement and no rule writing by domain experts, and no rules need to be updated or modified even when a new data set or domain is processed, so the method is widely applicable to extracting key elements from all judgment documents.
In a second aspect, the present application provides a judgment document element extraction apparatus comprising a calling module 1, a first generation module 2, an extraction module 3 and a second generation module 4. The calling module 1 is configured to respond to an element extraction request for a judgment document sent by a user side by calling an entity extraction model to extract the plaintiff designation and the defendant designation of the judgment document; the first generation module 2 is configured to input the plaintiff designation and the defendant designation into a preset template and generate a prompt template corresponding to them; the extraction module 3 is configured to extract plaintiff information and defendant information from the judgment document according to the prompt content of the prompt template; and the second generation module 4 is configured to generate key elements of the judgment document according to the plaintiff information and the defendant information.
In a third aspect, the present application provides an electronic device comprising a processor, a memory, a user interface and a network interface. The memory stores instructions, the user interface and the network interface are used to communicate with other devices, and the processor executes the instructions stored in the memory so that the electronic device performs any of the judgment document element extraction methods described above.
In a fourth aspect, the present application provides a computer-readable storage medium storing a computer program that can be loaded by a processor to execute any of the judgment document element extraction methods described above.
In summary, the present application includes at least one of the following beneficial technical effects:
1. The whole process needs little manual involvement and no rule writing by domain experts, and no rules need to be updated or modified even when a new data set or domain is processed; the method is therefore widely applicable and suitable for extracting key elements from all judgment documents;
2. The entity extraction model is trained on a large amount of first training data input by the user side, so it can accurately extract the plaintiff and defendant designations from a judgment document, and hence the document's key elements.
Drawings
FIG. 1 is a schematic flow chart of a judgment document element extraction method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a judgment document element extraction apparatus according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Reference numerals: 1. calling module; 2. first generation module; 3. extraction module; 4. second generation module; 1000. electronic device; 1001. processor; 1002. communication bus; 1003. user interface; 1004. network interface; 1005. memory.
Detailed Description
In order to make the technical solutions in the present specification better understood by those skilled in the art, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only some embodiments of the present application, not all embodiments.
In describing embodiments of the present application, words such as "exemplary," "such as" or "for example" are used to mean serving as examples, illustrations or explanations. Any embodiment or design described herein as "illustrative," "such as" or "for example" is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "illustratively," "such as" or "for example," etc., is intended to present related concepts in a concrete fashion.
Enterprise risk queries include intellectual-property-related risks, and intellectual property risk information mainly comes from key information in intellectual property judgment documents, such as the plaintiff designation, plaintiff name, plaintiff address, defendant designation, defendant name and defendant address. These key elements can be extracted from the judgment document with information extraction technology and used to generate a related report, which helps to prevent risk.
Intellectual property judgment documents contain many elements, such as the case name, case number, judgment date, plaintiff designation, plaintiff name, plaintiff address, defendant designation, defendant name, defendant address and filing date. The plaintiff and defendant designations can appear in a judgment document in many forms, such as "complainant", "plaintiff in review", "plaintiff", "defendant" and the like. At present, the six elements (plaintiff designation, plaintiff name, plaintiff address, defendant designation, defendant name and defendant address) are mainly extracted either with rules or with a relation extraction model. Rule extraction may suffer from low recall, while a relation extraction model requires building a complex model, a large amount of labeled data and considerable resources. To extract the elements of judgment documents efficiently and accurately, this patent provides a judgment document element extraction method.
The method uses prompt learning to fine-tune a pre-trained model, which improves element extraction accuracy relative to rule extraction; compared with relation-extraction-based methods, it reduces the number of labeled samples needed and the complexity of the extraction model.
Fig. 1 is a flow chart of a judgment document element extraction method according to an embodiment of the present application. It should be understood that although the steps in the flowchart of Fig. 1 are shown in the sequence indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated herein, the steps are not strictly limited to that order and may be performed in other orders. Moreover, at least some of the steps in Fig. 1 may include several sub-steps or stages, which need not be completed at the same time but may be performed at different times, and which need not be performed in sequence but may be performed in turn or alternately with other steps or with at least some sub-steps or stages of other steps.
The application discloses a judgment document element extraction method, which comprises steps S101 to S104 as shown in Fig. 1.
S101, in response to an element extraction request for a judgment document sent by the user side, calling an entity extraction model to extract the plaintiff designation and the defendant designation of the judgment document.
In one example, the elements requested in a judgment document element extraction request sent by the user side generally relate to the plaintiff and the defendant, so entity extraction models corresponding to the plaintiff and the defendant can be set up. The extraction models are of course not limited to plaintiff- and defendant-related elements and can be configured as needed; for convenience of description, the application uses the plaintiff and the defendant as the example.
When the user side sends an element extraction request for a judgment document, the system calls the entity extraction model to extract the plaintiff and defendant designations from the judgment document sent by the user side. Element extraction from judgment documents is usually performed with an entity extraction model, which is usually obtained by training on a large amount of data.
For example, the plaintiff and defendant designations may differ between judgment documents: the plaintiff may be called "complainant", "first-instance complainant", "plaintiff" and so on, and the defendant likewise has several designations. An entity extraction model is therefore trained to extract these designations accurately. Training usually involves labeling a number of judgment documents (the number can be chosen freely): the plaintiff and defendant designations in these documents are labeled manually, and a BERT+CRF extraction model is trained on them to obtain the entity extraction model. When the user side sends an element extraction request for a judgment document, the plaintiff and defendant designations are extracted with this model. For example, given the text "in the patent infringement dispute between plaintiff XXX Co., Ltd. and defendants XXX Co., Ltd. of XX City and XX Co., Ltd.", the entity extraction model extracts "plaintiff" as the plaintiff designation and "defendant" as the defendant designation.
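Manual labeling for a BERT+CRF model is typically done with per-token BIO tags. The Python sketch below shows how such tags map back to designation spans; the label names `PLA` (plaintiff designation) and `DEF` (defendant designation) and the word-level tokens are hypothetical, since the source does not give its tag set (Chinese models usually tag characters).

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (label, text) spans from a BIO-tagged token sequence."""
    spans: List[Tuple[str, str]] = []
    label: Optional[str] = None
    buf: List[str] = []
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if label is not None:             # close the previous span
                spans.append((label, " ".join(buf)))
            label, buf = tag[2:], [tok]
        elif tag.startswith("I-") and label == tag[2:]:
            buf.append(tok)                   # continue the open span
        else:
            # "O", or an I- tag that does not continue the open span, closes it
            if label is not None:
                spans.append((label, " ".join(buf)))
            label, buf = None, []
    if label is not None:
        spans.append((label, " ".join(buf)))
    return spans
```

For example, tokens `["first-instance", "complainant", "XXX", "Co.", "and", "defendant", "XX", "Co."]` tagged `["B-PLA", "I-PLA", "O", "O", "O", "B-DEF", "O", "O"]` yield `[("PLA", "first-instance complainant"), ("DEF", "defendant")]`.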
The BERT+CRF extraction model uses BERT as a feature extractor: the pre-trained BERT model converts the input text sequence into semantic representations. BERT's output is then fed into a CRF layer, which predicts and optimizes the tag sequence. The model learns from training data to identify entities in the text and classify them into predefined entity categories. BERT+CRF performs well on named entity recognition (NER) tasks because it makes full use of context information and of the dependencies between labels, which improves the accuracy and robustness of entity recognition.
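The CRF layer's use of label dependencies shows up at decoding time. As a minimal sketch, assuming the per-token emission scores from BERT are already computed (toy numbers below) and omitting the start/end transitions a full CRF would have, Viterbi decoding over a linear-chain CRF looks like this:

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]],
                   tags: Sequence[str]) -> List[str]:
    """Return the highest-scoring tag path of a linear-chain CRF.

    emissions[t][k]   : score of tag k at position t (e.g. BERT + linear layer)
    transitions[i][j] : score of moving from tag i to tag j
    """
    n_tags = len(tags)
    score = list(emissions[0])  # best path score ending in each tag at t = 0
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, ptrs = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptrs.append(best_i)
        score = new_score
        backpointers.append(ptrs)
    best = max(range(n_tags), key=lambda k: score[k])
    path = [best]
    for ptrs in reversed(backpointers):  # follow back-pointers to position 0
        best = ptrs[best]
        path.append(best)
    path.reverse()
    return [tags[k] for k in path]


TAGS = ["O", "B-PLA", "I-PLA"]
TRANSITIONS = [[0.0, 0.0, -1e4],  # O -> I-PLA is effectively forbidden
               [0.0, 0.0, 1.0],   # B-PLA -> I-PLA is rewarded
               [0.0, 0.0, 1.0]]   # I-PLA -> I-PLA is rewarded
EMISSIONS = [[0.0, 2.0, 0.0], [1.0, 0.0, 0.8], [1.0, 0.0, 0.5]]
```

With these toy scores, a per-token argmax would give `B-PLA, O, O`, whereas `viterbi_decode(EMISSIONS, TRANSITIONS, TAGS)` returns `["B-PLA", "I-PLA", "I-PLA"]`: the transition scores pull the multi-token designation together, which is the label dependency the paragraph above refers to.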
S102, inputting the plaintiff designation and the defendant designation into a preset template to generate a prompt template corresponding to them.
In one example, once the plaintiff and defendant designations have been extracted from the judgment document sent by the user side, they are input into a preset template, which may be set by the user; this generates a prompt template corresponding to the current designations. For example, the plaintiff designation may be "complainant", "first-instance complainant", "plaintiff" and so on, and the defendant designation may be "respondent", "defendant" and so on. Because the designations differ between judgment documents, the prompt template must be set according to the designations actually used in the document; its role can be understood as guiding the system to extract the corresponding information from the judgment document. For example, if the entity extraction model finds that the plaintiff designation in the current judgment document is "complainant" and the defendant designation is "respondent", the prompt template can be divided into two boards. The plaintiff board reads "Who is the complainant in the judgment document? What is the address of the complainant, XXX company?"; if the plaintiff designation is "plaintiff", it reads "Who is the plaintiff in the judgment document? What is the address of the plaintiff, XXX company?". The defendant board reads "Who is the respondent in the judgment document? What is the address of the respondent, XXX company?"; if the defendant designation is "defendant", it reads "Who is the defendant in the judgment document? What is the address of the defendant, XXX company?".
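Filling the preset template with the extracted designations is simple string substitution. The sketch below assumes a hypothetical two-question template with a `{role}` placeholder; the real preset template is user-defined and not given in the source.

```python
from typing import Dict, List

# Hypothetical preset template; {role} is filled with the designation found
# by the entity extraction model (e.g. "complainant", "plaintiff").
PRESET_TEMPLATE = [
    "Who is the {role} in the judgment document?",
    "What is the address of the {role}?",
]


def build_prompt_template(plaintiff_title: str,
                          defendant_title: str) -> Dict[str, List[str]]:
    """Fill the preset template once per party to get the two prompt boards."""
    return {
        "plaintiff": [q.format(role=plaintiff_title) for q in PRESET_TEMPLATE],
        "defendant": [q.format(role=defendant_title) for q in PRESET_TEMPLATE],
    }
```

Because only the designation changes, the same template covers "complainant/respondent" documents and "plaintiff/defendant" documents without any new rules.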
S103, extracting plaintiff information and defendant information from the judgment document according to the prompt content of the prompt template.
In one example, once the prompt template has been built, the system extracts the plaintiff and defendant information from the judgment document according to it. For example, given the prompt template, the system outputs corresponding answers such as "The complainant is XXX", "The address of the complainant, XXX company, is XXX", "The respondent is XXX" and "The address of the respondent, XXX company, is XXX". In general, the content of the prompt template determines what is finally output: the user obtains the desired content through the prompt template and can configure the template content to extract the corresponding content from the judgment document, which is not described further here.
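Answering each prompt question against the document can be sketched as a loop over the two boards. The question-answering model is abstracted as a hypothetical `answer_fn(question, document)` callable; the source does not specify this interface.

```python
from typing import Callable, Dict, List


def extract_party_info(prompts: Dict[str, List[str]], document: str,
                       answer_fn: Callable[[str, str], str]) -> Dict[str, List[str]]:
    """Ask every prompt question against the document and keep the answers per party.

    answer_fn is a hypothetical stand-in for the prompt-driven model.
    """
    return {party: [answer_fn(q, document) for q in questions]
            for party, questions in prompts.items()}
```

Passing the model as a callable keeps this step independent of whether the answers come from T5, PromptCLUE, GPT or any other backend.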
S104, generating key elements of the judgment document according to the plaintiff information and the defendant information.
In one example, the plaintiff information generally includes, but is not limited to, the plaintiff designation, plaintiff name and plaintiff address; the defendant information generally includes, but is not limited to, the defendant designation, defendant name and defendant address. These six items are the key elements of the judgment document. Different users may of course need different key elements for different judgment documents, and the key elements can be configured as needed.
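The combination of designation, name and address per party can be represented as a small record. This is an illustrative sketch; the field and function names are hypothetical.

```python
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass
class PartyElements:
    title: str    # designation, e.g. "plaintiff" or "complainant"
    name: str
    address: str


def build_key_elements(plaintiff: PartyElements,
                       defendant: PartyElements) -> Dict[str, Dict[str, str]]:
    """Combine plaintiff-side and defendant-side elements into one record."""
    return {"plaintiff": asdict(plaintiff), "defendant": asdict(defendant)}
```

Further elements (case number, judgment date and so on) could be added as extra fields if a user needs them.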
Generally, the system extracts all plaintiff- and defendant-related information from the judgment document according to the prompt template, selects the key information from it, and outputs the selected information through the corresponding open-source model to obtain the text information the user needs; after the key information has been extracted via the prompt template, the interface of the open-source model is called to output the result. The open-source model is generally chosen from models such as T5, PromptCLUE and GPT. To adapt it, a number of samples are usually drawn at random from judgment documents, the plaintiff and defendant designations are obtained as in the steps above, a prompt template is built from them, and the prompt template is concatenated with the judgment document text to form the input of the open-source model; fine-tuning on these inputs gives better results than using the open-source model as-is.
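Building fine-tuning inputs by random sampling and prompt/document concatenation can be sketched as follows. The `(prompt, document, answer)` triples, the newline separator `SEP` and the seed are assumptions for illustration; the source does not specify the input format.

```python
import random
from typing import List, Tuple

SEP = "\n"  # hypothetical separator between prompt and document text


def splice_input(prompt: str, document: str) -> str:
    """Concatenate a prompt question with the judgment text as one model input."""
    return f"{prompt}{SEP}{document}"


def sample_training_pairs(labeled: List[Tuple[str, str, str]], k: int,
                          seed: int = 0) -> List[Tuple[str, str]]:
    """Randomly pick k (prompt, document, answer) triples -> (input, target) pairs."""
    rng = random.Random(seed)        # fixed seed for a reproducible sample
    picked = rng.sample(labeled, k)  # sampling without replacement
    return [(splice_input(p, d), a) for p, d, a in picked]
```

The resulting `(input, target)` pairs are in the text-to-text form that a model such as T5 is fine-tuned on.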
The open-source models T5, PromptCLUE and GPT are explained below. T5 (Text-To-Text Transfer Transformer), PromptCLUE and GPT (Generative Pre-trained Transformer) are all pre-trained models. T5 is a Transformer-based pre-trained model for a variety of natural language processing tasks: it casts every task as a text-to-text conversion problem and achieves multi-task learning through large-scale unsupervised pre-training followed by supervised fine-tuning. PromptCLUE is a Chinese multi-task prompt-learning model released by the CLUE community; CLUE (Chinese Language Understanding Evaluation) is a general Chinese natural language understanding benchmark that provides data sets and evaluation metrics for Chinese tasks such as text classification, named entity recognition and relation extraction, so that researchers and developers can compare and evaluate different models fairly. GPT is a Transformer-based pre-trained model for text generation; it is pre-trained on a large-scale text corpus with unsupervised learning and can automatically generate text such as articles and dialogues. These open-source models play an important role in natural language processing.
Before the entity extraction model is called, in response to an element extraction request for a judgment document sent by the user side, to extract the plaintiff and defendant designations of the judgment document, the method further comprises: acquiring first training data input by the user side, and constructing the entity extraction model according to the first training data.
In one example, the entity extraction model is usually obtained by training on a large amount of data, as in the embodiment above. The entity extraction model of the application also has a learning function: the model may fail to extract the plaintiff designation and/or the defendant designation from the current judgment document, in which case those designations can be stored so that they can be used directly when the same designations are encountered again.
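One way to read this learning function is as a memory of confirmed designations that serves as a fallback when the model returns nothing. The sketch below is an interpretation under that assumption; `DesignationMemory`, `extract_title` and the `model_fn` callable are hypothetical names, not the patent's.

```python
from typing import Callable, Optional, Set


class DesignationMemory:
    """Remembers designations so they can be reused when the model misses them."""

    def __init__(self) -> None:
        self.known: Set[str] = set()

    def remember(self, designation: str) -> None:
        self.known.add(designation)

    def lookup(self, text: str) -> Optional[str]:
        # Prefer the longest known designation occurring in the text, so
        # "first-instance complainant" wins over "complainant".
        hits = [d for d in self.known if d in text]
        return max(hits, key=len) if hits else None


def extract_title(model_fn: Callable[[str], Optional[str]],
                  memory: DesignationMemory, text: str) -> Optional[str]:
    """Use the model's answer if it has one, else fall back to stored designations."""
    found = model_fn(text)
    return found if found else memory.lookup(text)
```

Here `remember` would be called with a designation confirmed after a miss (for example, by manual review), so the next document using the same designation is handled without retraining.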
The first training data comprises data with labels, the first training data input by the user terminal is obtained, and an entity extraction model is constructed according to the first training data, and the method comprises the following steps: the method comprises the steps of obtaining marked data input by a user side, wherein the marked data comprise marked original titles and marked notice titles; invoking a preset extraction model, and extracting marked original titles and marked notice titles in the marked data to obtain the types of the original titles and the notice titles; and constructing the entity extraction model according to the original title and the type of the title to be advertised.
In one example, the entity extraction model is generally constructed by training on a large amount of data, and the first training data is the data used to train it; the construction method can refer to the embodiment above and is not repeated here. Note that because the entity extraction model is used only to extract plaintiff titles and defendant titles, only those titles are labeled in the referee documents when the model is trained on the first training data. This yields the different kinds of names that plaintiff titles and defendant titles take across different referee documents, and the entity extraction model is constructed from these different kinds of names.
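Collecting the different kinds of plaintiff (original) and defendant (reported) titles from labeled data could look like the sketch below. It assumes a hypothetical annotation format in which each labeled document carries character-offset spans tagged PLAINTIFF_TITLE or DEFENDANT_TITLE. The sketch only tallies the title types; it does not train a model, which the source leaves to the preset extraction model.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List

PLAINTIFF_TITLE = "PLAINTIFF_TITLE"  # hypothetical label names
DEFENDANT_TITLE = "DEFENDANT_TITLE"


@dataclass
class Span:
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    label: str


@dataclass
class LabeledDocument:
    text: str
    spans: List[Span]


def collect_title_types(docs: List[LabeledDocument]) -> Dict[str, Counter]:
    """Count how often each distinct title string is labeled, per label."""
    types: Dict[str, Counter] = {PLAINTIFF_TITLE: Counter(), DEFENDANT_TITLE: Counter()}
    for doc in docs:
        for span in doc.spans:
            if span.label in types:  # ignore labels other than the two titles
                types[span.label][doc.text[span.start:span.end]] += 1
    return types
```

The resulting counters give the vocabulary of title types per side, which is the information the paragraph above says the model is constructed from.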
Acquiring the first training data input by the user side and constructing the entity extraction model according to the first training data further comprises: acquiring second training data input by the user side, and processing the second training data with the entity extraction model to obtain a training result; sending the training result to the user side and receiving first feedback information sent by the user side; and adjusting the entity extraction model according to the first feedback information.
In one example, after the entity extraction model is obtained, it must be checked whether the current model meets the requirements and can be put into use. A preset number of referee documents is selected, and it is judged whether the model accurately extracts the plaintiff titles and defendant titles from them. If the result meets the requirements, no adjustment is needed and the current model can be put into use directly; if the result contains a large number of errors, or the error rate exceeds a preset value, the model must be retrained or adjusted to obtain a new entity extraction model. Whether the result meets the requirements can be decided by sending the extraction result to the relevant staff, who review it and judge whether it is acceptable.
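A minimal sketch of this acceptance check, assuming the staff review is reduced to a per-document comparison between the predicted and the confirmed (plaintiff title, defendant title) pairs, and that the "preset value" is a maximum error rate. The 0.1 default and all names are illustrative, not taken from the source.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ReviewResult:
    predicted: Tuple[str, str]  # (plaintiff title, defendant title) from the model
    expected: Tuple[str, str]   # pair confirmed by the reviewing staff


def error_rate(results: List[ReviewResult]) -> float:
    """Fraction of reviewed documents whose predicted pair differs from the confirmed one."""
    if not results:
        raise ValueError("at least one reviewed document is required")
    errors = sum(1 for r in results if r.predicted != r.expected)
    return errors / len(results)


def needs_retraining(results: List[ReviewResult], max_error_rate: float = 0.1) -> bool:
    """True when the sample error rate exceeds the preset value."""
    return error_rate(results) > max_error_rate
```

A sample whose error rate is at or below the threshold lets the current model go into use; anything above it sends the model back for retraining or adjustment.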
After responding to the element extraction request for the referee document sent by the user side, the method further comprises: acquiring a preset template; judging, according to the preset template, whether the referee document meets preset requirements; if it does, calling the entity extraction model to extract the plaintiff title and the defendant title of the referee document; if it does not, deleting or supplementing content of the referee document to obtain a new referee document, and calling the entity extraction model to extract the plaintiff title and the defendant title of the new referee document.
In one example, after the user side sends the element extraction request, it is determined whether the referee document to be processed meets the preset requirements. Because different referee documents may differ slightly, the document is preprocessed before element extraction, and the text is cleaned by removing stop words, illegal symbols, formatting, and the like. A content template is set for this purpose, and different user sides can set different content templates according to their own needs. The referee document sent by the user is filled in according to the content template, and appropriate modifications may be made during filling. If the current referee document does not meet the filling requirements, or its content does not match the preset template, it can likewise be modified appropriately, or error information can be sent to the user side so that the user can adjust it. Finally, the entity extraction model is called to extract the plaintiff title and the defendant title from the referee document.
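The preprocessing and template check might be sketched as follows. The sketch assumes that "illegal symbols" means ASCII control characters other than tab and newline, that stop words are supplied as a list of strings, and that a preset template is simply a list of markers a conforming document must contain; the markers and function names are hypothetical.

```python
import re
from typing import Iterable, List

# ASCII control characters except tab (\x09) and newline (\x0a), treated as illegal symbols.
_ILLEGAL_CHARS = re.compile(r"[\x00-\x08\x0b-\x1f\x7f]")
# Runs of spaces, tabs, or full-width spaces left over from formatting.
_SPACE_RUNS = re.compile(r"[ \t\u3000]+")


def clean_text(text: str, stop_words: Iterable[str] = ()) -> str:
    """Strip illegal symbols and stop words, then collapse formatting whitespace."""
    text = _ILLEGAL_CHARS.sub("", text)
    for word in stop_words:
        text = text.replace(word, "")  # naive substring removal
    text = _SPACE_RUNS.sub(" ", text)
    return text.strip()


def missing_sections(text: str, template: List[str]) -> List[str]:
    """Return the template markers absent from the text; empty means it conforms."""
    return [marker for marker in template if marker not in text]
```

A non-empty result from `missing_sections` corresponds to the case where the document is supplemented or an error message is returned to the user side.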
Generating key elements of the referee document according to the plaintiff information and the defendant information comprises: calling an open-source model to process the plaintiff information and the defendant information and generate text information, the text information being the key elements of the referee document.
In one example, for the processing method, refer to the embodiment above; the open-source model turns the required information into text information that is convenient for the user side to view.
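The prompt-driven steps could be sketched with the model abstracted as a plain callable. The steps are: fill an extracted plaintiff (original) or defendant (reported) title into a preset template, then ask an open-source model for each piece of party information. The template wording, the `generate` parameter, and the function names are assumptions. In practice `generate` might wrap T5, PromptCLUE, or a GPT-style model, but no specific model API is implied here.

```python
from typing import Callable, Dict, Iterable

# Hypothetical preset template; {title} receives a title found by the entity extraction model.
PRESET_TEMPLATE = (
    "Document: {document}\n"
    "Question: What is the {field} of the {title}?\n"
    "Answer:"
)


def build_prompts(document: str, title: str, fields: Iterable[str]) -> Dict[str, str]:
    """Fill the preset template once per requested field for one party title."""
    return {
        field: PRESET_TEMPLATE.format(document=document, field=field, title=title)
        for field in fields
    }


def extract_party_info(
    document: str,
    title: str,
    generate: Callable[[str], str],
    fields: Iterable[str] = ("name", "address"),
) -> Dict[str, str]:
    """Ask the (abstracted) open-source model one question per field."""
    prompts = build_prompts(document, title, fields)
    return {field: generate(prompt).strip() for field, prompt in prompts.items()}
```

Keeping the model behind a callable means the same prompt construction can be tested with a stub and later pointed at whichever open-source model is deployed.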
The key elements of the referee document include plaintiff key elements and defendant key elements, and generating the key elements according to the plaintiff information and the defendant information comprises: extracting the plaintiff name and plaintiff address from the plaintiff information, and generating the plaintiff key elements by combining the plaintiff title, plaintiff name, and plaintiff address; extracting the defendant name and defendant address from the defendant information, and generating the defendant key elements by combining the defendant title, defendant name, and defendant address; and combining the plaintiff key elements and the defendant key elements to generate the key elements of the referee document.
In one example, the key elements of the referee document include, but are not limited to, plaintiff key elements and defendant key elements, and can be configured according to the actual situation. After the user sets the required key information, the system automatically extracts that key information from the plaintiff information, specifically the plaintiff name and plaintiff address, and combines them with the plaintiff title to form the three plaintiff key elements. Likewise, the defendant name and defendant address are extracted from the defendant information and combined with the defendant title to form the three defendant key elements. The plaintiff key elements and defendant key elements are then combined to generate the key elements of the referee document.
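Combining each party's title, name, and address into key elements, and the two parties' elements into the document's key elements, could be represented with nested dataclasses. A minimal sketch; the field names and the assumption that each party's information arrives as a dict with "name" and "address" keys are hypothetical.

```python
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass
class PartyKeyElements:
    title: str    # designation found by the entity extraction model
    name: str
    address: str


@dataclass
class DocumentKeyElements:
    plaintiff: PartyKeyElements
    defendant: PartyKeyElements


def party_elements(title: str, info: Dict[str, str]) -> PartyKeyElements:
    """Combine a party's title with the name and address extracted for it."""
    return PartyKeyElements(title=title, name=info["name"], address=info["address"])


def combine_key_elements(
    plaintiff_title: str,
    plaintiff_info: Dict[str, str],
    defendant_title: str,
    defendant_info: Dict[str, str],
) -> DocumentKeyElements:
    """Build the plaintiff and defendant key elements and combine them."""
    return DocumentKeyElements(
        plaintiff=party_elements(plaintiff_title, plaintiff_info),
        defendant=party_elements(defendant_title, defendant_info),
    )
```

`asdict` turns the result into a plain nested dict if the key elements need to be serialized for the user side.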
Based on the above method, the present application also discloses a referee document element extraction apparatus, as shown in fig. 2, which is a schematic structural diagram of a referee document element extraction apparatus according to an embodiment of the present application.
The referee document element extraction apparatus comprises a calling module 1, a first generation module 2, an extraction module 3, and a second generation module 4. The calling module 1 is configured to call, in response to an element extraction request for a referee document sent by the user side, the entity extraction model to extract the plaintiff title and the defendant title of the referee document; the first generation module 2 is configured to input the plaintiff title and the defendant title into a preset template and generate a prompt template corresponding to them; the extraction module 3 is configured to extract plaintiff information and defendant information from the referee document according to the prompt content of the prompt template; and the second generation module 4 is configured to generate key elements of the referee document according to the plaintiff information and the defendant information.
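The four modules of fig. 2 might be wired together as below, with each module passed in as a callable so that any model or template implementation can be plugged in. A minimal sketch; the class name, attribute names, and callable signatures are hypothetical and are not taken from the source.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple

# e.g. {"plaintiff": {"name": ...}, "defendant": {"name": ...}}
PartyInfo = Dict[str, Dict[str, str]]


@dataclass
class RefereeDocumentElementExtractor:
    """Hypothetical composition of the four modules of fig. 2."""

    call_entity_model: Callable[[str], Tuple[str, str]]               # calling module 1
    build_prompt_templates: Callable[[str, str], Dict[str, str]]      # first generation module 2
    extract_information: Callable[[str, Dict[str, str]], PartyInfo]  # extraction module 3
    generate_key_elements: Callable[[str, str, PartyInfo], Any]      # second generation module 4

    def handle_request(self, document: str) -> Any:
        """Run one element extraction request through the four modules in order."""
        plaintiff_title, defendant_title = self.call_entity_model(document)
        templates = self.build_prompt_templates(plaintiff_title, defendant_title)
        info = self.extract_information(document, templates)
        return self.generate_key_elements(plaintiff_title, defendant_title, info)
```

Injecting the modules mirrors the note that follows: the division into modules is only one possible allocation of the functions.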
It should be noted that when the apparatus provided in the above embodiment implements its functions, the division into the functional modules above is merely an example. In practical applications, the functions may be allocated to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to implement all or part of the functions described above. In addition, the apparatus embodiments and method embodiments provided above belong to the same concept; for the specific implementation process, see the method embodiments, which are not repeated here.
Referring to fig. 3, a schematic structural diagram of an electronic device according to an embodiment of the present application is provided. As shown in fig. 3, the electronic device 1000 may include at least one processor 1001, at least one network interface 1004, a user interface 1003, a memory 1005, and at least one communication bus 1002.
The communication bus 1002 is used to enable connection and communication between these components.
The user interface 1003 may include a display screen and a camera; optionally, the user interface 1003 may further include a standard wired interface and a wireless interface.
The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface).
The processor 1001 may include one or more processing cores. The processor 1001 connects the various parts of the server through various interfaces and lines, and performs the server's functions and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 1005 and calling data stored in the memory 1005. Optionally, the processor 1001 may be implemented in at least one hardware form among digital signal processing (DSP), field-programmable gate array (FPGA), and programmable logic array (PLA). The processor 1001 may integrate one or a combination of a central processing unit (CPU), a graphics processing unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and so on; the GPU renders and draws the content to be shown on the display screen; and the modem handles wireless communication. It will be appreciated that the modem may instead be implemented by a separate chip rather than integrated into the processor 1001.
The memory 1005 may include random access memory (RAM) or read-only memory (ROM). Optionally, the memory 1005 includes a non-transitory computer-readable storage medium. The memory 1005 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 1005 may include a program storage area and a data storage area: the program storage area may store instructions for implementing the operating system, instructions for at least one function (such as a touch function, a sound playback function, or an image playback function), instructions for implementing the method embodiments above, and so on; the data storage area may store the data involved in the method embodiments above. Optionally, the memory 1005 may also be at least one storage device located remotely from the processor 1001. As shown in fig. 3, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and an application program for the referee document element extraction method.
In the electronic device 1000 shown in fig. 3, the user interface 1003 is mainly used to provide an input interface for the user and acquire data input by the user, and the processor 1001 may be configured to invoke the application program for the referee document element extraction method stored in the memory 1005, which, when executed by one or more processors, causes the electronic device to perform the method described in one or more of the embodiments above.
An electronic-device-readable storage medium stores instructions that, when executed by one or more processors, cause an electronic device to perform the method described in one or more of the embodiments above.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but those skilled in the art should understand that the present application is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are all preferred embodiments, and the acts and modules involved are not necessarily required by the present application.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts not described in detail in one embodiment, refer to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is merely a logical functional division, and other divisions are possible in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some service interfaces, devices, or units, and may be electrical or in other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
If the integrated units are implemented as software functional units and sold or used as independent products, they may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied as a software product. The software product is stored in a memory and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods in the embodiments of the present application. The aforementioned memory includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a magnetic disk, or an optical disc.
The foregoing is merely exemplary embodiments of the present disclosure and is not intended to limit the scope of the present disclosure. That is, equivalent changes and modifications made according to the teachings of this disclosure fall within the scope of the present disclosure. Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the disclosure being indicated by the claims.

Claims (10)

1. A referee document element extraction method, characterized in that the method is applied to a server and comprises the following steps:
in response to an element extraction request for a referee document sent by a user side, calling an entity extraction model to extract a plaintiff title and a defendant title of the referee document;
inputting the plaintiff title and the defendant title into a preset template, and generating a prompt template corresponding to the plaintiff title and the defendant title;
extracting plaintiff information and defendant information from the referee document according to the prompt content of the prompt template;
and generating key elements of the referee document according to the plaintiff information and the defendant information.
2. The referee document element extraction method according to claim 1, wherein before the entity extraction model is called, in response to the element extraction request for the referee document sent by the user side, to extract the plaintiff title and the defendant title of the referee document, the method further comprises:
acquiring first training data input by the user side, and constructing the entity extraction model according to the first training data.
3. The referee document element extraction method according to claim 2, wherein the first training data comprises labeled data, and the acquiring first training data input by the user side and constructing the entity extraction model according to the first training data comprises:
acquiring the labeled data input by the user side, wherein the labeled data comprises labeled plaintiff titles and labeled defendant titles;
invoking a preset extraction model to extract the labeled plaintiff titles and the labeled defendant titles from the labeled data, to obtain the types of plaintiff titles and defendant titles;
and constructing the entity extraction model according to the types of plaintiff titles and defendant titles.
4. The referee document element extraction method according to claim 2, wherein the acquiring first training data input by the user side and constructing the entity extraction model according to the first training data further comprises:
acquiring second training data input by the user side, and processing the second training data with the entity extraction model to obtain a training result;
sending the training result to the user side, and receiving first feedback information sent by the user side;
and adjusting the entity extraction model according to the first feedback information.
5. The referee document element extraction method according to claim 1, further comprising, after responding to the element extraction request for the referee document sent by the user side:
acquiring a preset template;
judging, according to the preset template, whether the referee document meets preset requirements;
if the preset requirements are met, calling the entity extraction model to extract the plaintiff title and the defendant title of the referee document;
if the preset requirements are not met, deleting or supplementing content of the referee document to obtain a new referee document, and calling the entity extraction model to extract the plaintiff title and the defendant title of the new referee document.
6. The referee document element extraction method according to claim 1, wherein the generating key elements of the referee document according to the plaintiff information and the defendant information comprises:
calling an open-source model to process the plaintiff information and the defendant information to generate text information, wherein the text information is the key elements of the referee document.
7. The referee document element extraction method according to claim 1, wherein the key elements of the referee document comprise plaintiff key elements and defendant key elements, and the generating key elements of the referee document according to the plaintiff information and the defendant information comprises:
extracting a plaintiff name and a plaintiff address from the plaintiff information, and generating the plaintiff key elements by combining the plaintiff title, the plaintiff name, and the plaintiff address;
extracting a defendant name and a defendant address from the defendant information, and generating the defendant key elements by combining the defendant title, the defendant name, and the defendant address;
and combining the plaintiff key elements and the defendant key elements to generate the key elements of the referee document.
8. A referee document element extraction apparatus, comprising a calling module (1), a first generation module (2), an extraction module (3), and a second generation module (4), wherein:
the calling module (1) is configured to call, in response to an element extraction request for a referee document sent by a user side, an entity extraction model to extract a plaintiff title and a defendant title of the referee document;
the first generation module (2) is configured to input the plaintiff title and the defendant title into a preset template and generate a prompt template corresponding to the plaintiff title and the defendant title;
the extraction module (3) is configured to extract plaintiff information and defendant information from the referee document according to the prompt content of the prompt template;
the second generation module (4) is configured to generate key elements of the referee document according to the plaintiff information and the defendant information.
9. An electronic device, comprising a processor (1001), a memory (1005), a user interface (1003), and a network interface (1004), wherein the memory (1005) is configured to store instructions, the user interface (1003) and the network interface (1004) are configured to communicate with other devices, and the processor (1001) is configured to execute the instructions stored in the memory to cause the electronic device to perform the method according to any one of claims 1-7.
10. A computer-readable storage medium, characterized in that it stores a computer program that can be loaded by a processor to perform the method according to any one of claims 1-7.
CN202311006774.0A 2023-08-10 2023-08-10 Judge document element extraction method, device, equipment and storage medium Pending CN117033539A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311006774.0A CN117033539A (en) 2023-08-10 2023-08-10 Judge document element extraction method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311006774.0A CN117033539A (en) 2023-08-10 2023-08-10 Judge document element extraction method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117033539A true CN117033539A (en) 2023-11-10

Family

ID=88601766

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311006774.0A Pending CN117033539A (en) 2023-08-10 2023-08-10 Judge document element extraction method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117033539A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination