CN117033539A - Judge document element extraction method, device, equipment and storage medium - Google Patents
- Publication number
- CN117033539A (application CN202311006774.0A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/313—Selection or weighting of terms for indexing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/258—Heading extraction; Automatic titling; Numbering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/18—Legal services; Handling legal documents
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The application provides a judge document element extraction method, device, equipment and storage medium. The method comprises the following steps: in response to an element extraction request for a referee document sent by a user side, calling an entity extraction model to extract the original title and the reported title of the referee document; inputting the original title and the reported title into a preset template to generate a prompt template corresponding to the original title and the reported title; extracting original report information and reported information from the referee document according to the prompt content of the prompt template; and generating key elements of the referee document according to the original report information and the reported information. The technical effect of the application is that rules do not need to be updated or modified when a new data set or field is processed, which reduces the workload.
Description
Technical Field
The application relates to the technical field of information extraction, in particular to a judge document element extraction method, a device, equipment and a storage medium.
Background
Enterprise risk information queries cover intellectual-property-related risk, and intellectual property risk information mainly originates from key information in intellectual property referee documents, such as the original title, original name and original address (the designation, name and address of the plaintiff) and the reported title, reported name and reported address (the designation, name and address of the defendant).
At present, key information in referee documents is generally extracted by rule-based extraction, in which rules are defined and written in advance so that the key information can be extracted accurately.
However, the rules usually have to be written by domain experts. Because of this dependency, rule-based extraction requires the rules to be continually updated and modified whenever a new data set or domain is processed, so the workload is high.
Disclosure of Invention
The application provides a judge document element extraction method, device, equipment and storage medium, so that rules do not need to be updated or modified when a new data set or field is processed, which reduces the workload.
In a first aspect, the present application provides a method for extracting referee document elements, the method comprising: in response to an element extraction request for a referee document sent by a user side, calling an entity extraction model to extract the original title and the reported title of the referee document; inputting the original title and the reported title into a preset template to generate a prompt template corresponding to the original title and the reported title; extracting original report information and reported information from the referee document according to the prompt content of the prompt template; and generating key elements of the referee document according to the original report information and the reported information.
By adopting the technical scheme, the original title and the reported title in the referee document are extracted through the entity extraction model, then the prompt template is constructed through the original title and the reported title, and the key information in the referee document is extracted through the prompt template, so that the whole process does not need too much manual participation, does not need domain experts to write, does not need to update and modify rules even when a new data set or domain is processed, has strong applicability, and can be suitable for extracting key elements of all referee documents.
Optionally, before the entity extraction model is called to extract the original title and the reported title of the referee document in response to the element extraction request of the referee document sent by the user side, the method further includes: and acquiring first training data input by the user terminal, and constructing the entity extraction model according to the first training data.
By adopting the technical scheme, the entity extraction model is obtained by training a large amount of first training data input by the user side, so that the original title and the reported title in the referee document can be accurately extracted through the entity extraction model, and key elements in the referee document can be accurately extracted.
Optionally, the first training data includes labeled data, and the obtaining the first training data input by the user side and constructing the entity extraction model according to the first training data includes: acquiring the labeled data input by the user side, wherein the labeled data comprises labeled original titles and labeled reported titles; invoking a preset extraction model to extract the labeled original titles and the labeled reported titles from the labeled data, so as to obtain the types of the original title and the reported title; and constructing the entity extraction model according to the types of the original title and the reported title.
By adopting the technical scheme, the entity extraction model is obtained by training a large amount of first training data input by the user side, so that the original title and the reported title in the referee document can be accurately extracted through the entity extraction model, and key elements in the referee document can be accurately extracted.
Optionally, after the obtaining the first training data input by the user side and constructing the entity extraction model according to the first training data, the method further includes: acquiring second training data input by the user side, and extracting the second training data through the entity extraction model to obtain a training result; the training result is sent to the user side, and first feedback information sent by the user side is received; and adjusting the entity extraction model according to the first feedback information.
By adopting the technical scheme, after the entity extraction model is constructed, it is further trained with the second training data input by the user side and adjusted according to the first feedback information sent by the user side, so that the finally obtained entity extraction model is better suited to the user side. The original title and the reported title in the referee document can therefore be accurately extracted through the entity extraction model, and the key elements in the referee document can be accurately extracted.
Optionally, after the element extraction request of the referee document sent by the user side is responded, the method further includes: acquiring a preset template; judging whether the judge document meets preset requirements or not according to the preset template; if the requirement is met, calling the entity extraction model to extract the original title and the reported title of the judge document; if the content of the judge document does not meet the preset requirement, deleting or supplementing the content of the judge document to obtain a new judge document, and calling the entity extraction model to extract the original title and the reported title of the new judge document.
By adopting the technical scheme, whether the judge document meets the preset requirement can be judged through the preset template, and the judge document is processed accordingly, so that the entity extraction model can more easily and accurately extract the original title and the reported title from the judge document, thereby accurately extracting key elements in the judge document.
Optionally, the generating key elements of the referee document according to the original report information and the reported information includes: and calling an open source model to process the original report information and the reported information to generate text information, wherein the text information is a key element of the referee document.
By adopting the technical scheme, the open source model is called to process the original notice information and the notice information, so that the text information can be generated, excessive manual participation is not needed in the whole process, domain experts are not needed to write, updating and modifying rules are not needed even when a new data set or domain is processed, the applicability is strong, and the method is applicable to the extraction of key elements of all judge documents.
Optionally, the key elements of the referee document include original report key elements and reported key elements, and the generating key elements of the referee document according to the original report information and the reported information includes: extracting an original notice name and an original notice address in the original notice information, and generating original notice key elements by combining the original notice title, the original notice name and the original notice address; extracting a notice name and a notice address in the notice information, and generating a notice key element by combining the notice title, the notice name and the notice address; and combining the original key elements and the reported key elements to generate key elements of the judge document.
By adopting the technical scheme, key elements in the judge document can be extracted according to the needs of a user, excessive manual participation is not needed in the whole process, field experts are not needed to write, updating and modifying rules are not needed even when a new data set or field is processed, and the method is high in applicability and applicable to the extraction of the key elements of all judge documents.
In a second aspect, the present application provides a referee document element extracting apparatus comprising: a calling module 1, a first generation module 2, an extraction module 3 and a second generation module 4; the calling module 1 is used for responding to an element extraction request for a referee document sent by a user side and calling an entity extraction model to extract the original title and the reported title of the referee document; the first generation module 2 is used for inputting the original title and the reported title into a preset template and generating a prompt template corresponding to the original title and the reported title; the extraction module 3 is used for extracting original report information and reported information from the referee document according to the prompt content of the prompt template; the second generation module 4 is used for generating key elements of the referee document according to the original report information and the reported information.
By adopting the technical scheme, the original title and the reported title in the referee document are extracted through the entity extraction model, then the prompt template is constructed through the original title and the reported title, and the key information in the referee document is extracted through the prompt template, so that the whole process does not need too much manual participation, does not need domain experts to write, does not need to update and modify rules even when a new data set or domain is processed, has strong applicability, and can be suitable for extracting key elements of all referee documents.
In a third aspect, the present application provides an electronic device, which adopts the following technical scheme: the system comprises a processor, a memory, a user interface and a network interface, wherein the memory is used for storing instructions, the user interface and the network interface are used for communicating with other devices, and the processor is used for executing the instructions stored in the memory so as to enable the electronic device to execute the computer program of any judge document element extraction method.
In a fourth aspect, the present application provides a computer readable storage medium, which adopts the following technical solutions: a computer program capable of being loaded by a processor and executing any one of the judge document element extraction methods described above is stored.
In summary, the present application includes at least one of the following beneficial technical effects:
1. the whole process does not need excessive manual participation, does not need field expert writing, does not need updating and modifying rules even when processing a new data set or field, has strong applicability, and can be suitable for extracting key elements of all judge documents;
2. the entity extraction model is obtained by training a large amount of first training data input by the user side, so that the original title and the reported title in the judge document can be accurately extracted through the entity extraction model, and key elements in the judge document can be accurately extracted.
Drawings
FIG. 1 is a schematic flow chart of a method for extracting referee document elements according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a device for extracting referee document elements according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Reference numerals illustrate: 1. calling a module; 2. a first generation module; 3. an extraction module; 4. a second generation module; 1000. an electronic device; 1001. a processor; 1002. a communication bus; 1003. a user interface; 1004. a network interface; 1005. a memory.
Detailed Description
In order to make the technical solutions in the present specification better understood by those skilled in the art, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only some embodiments of the present application, not all embodiments.
In describing embodiments of the present application, words such as "exemplary," "such as" or "for example" are used to mean serving as examples, illustrations or explanations. Any embodiment or design described herein as "illustrative," "such as" or "for example" is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "illustratively," "such as" or "for example," etc., is intended to present related concepts in a concrete fashion.
Enterprise risk information queries cover intellectual-property-related risk, and intellectual property risk information mainly comes from key information in intellectual property referee documents, such as the original title, original name, original address, reported title, reported name and reported address. Key elements in a referee document can be extracted through information extraction technology and used to generate a related report, which serves the purpose of risk prevention.
An intellectual property referee document has many elements, such as the case name, case number, judgment date, original title, original name, original address, reported title, reported name, reported address and filing date. The original title and the reported title may be expressed in a referee document in a variety of ways, such as "complaints", "review report", "original report", "complaint", "reported" and the like. At present, the six elements (original title, original name, original address, reported title, reported name and reported address) are mainly extracted either by rule-based extraction or by a relation extraction model. However, rule-based extraction may suffer from a low recall rate, while a relation extraction model is complex to construct, needs a large amount of labeled data and consumes a large amount of resources. In order to extract the elements in a referee document efficiently and accurately, the application provides a method for extracting referee document elements.
The method uses prompt learning to fine-tune a pre-trained model, which improves the accuracy of element extraction relative to rule-based extraction; compared with relation-extraction-based methods, it reduces the number of labeled samples required and the complexity of the extraction model.
Fig. 1 is a flow chart of a method for extracting referee document elements according to an embodiment of the present application. It should be understood that, although the steps in the flowchart of fig. 1 are shown in sequence as indicated by the arrows, the steps are not necessarily performed in that sequence; unless explicitly stated herein, the steps are not strictly limited to this order of execution and may be executed in other orders. At least some of the steps in fig. 1 may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed one after another but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
The application discloses a judge document element extraction method, which comprises S101-S104 as shown in figure 1.
S101, responding to an element extraction request of the referee document sent by a user side, and calling an entity extraction model to extract an original title and a reported title of the referee document.
In one example, the elements requested in an element extraction request for a referee document sent by the user side are generally related to the original report and the reported party, so an entity extraction model corresponding to the original title and the reported title can be set. Of course, the extraction model is not limited to extracting elements related to these two parties and can be set according to individual needs; for convenience of description, the application is illustrated with the original title and the reported title.
When the user side sends an element extraction request for a referee document, the system calls the entity extraction model to extract the original title and the reported title in the referee document sent by the user side. In general, element extraction from a referee document is performed with an entity extraction model, which is usually obtained by training on a large amount of data.
For example, the original title and the reported title may have different names in different referee documents: the original title may appear as "complaint person", "first-order complaint person", "original report" and the like, and similarly the reported title may have a plurality of names. An entity extraction model is therefore trained to accurately extract the names of the original title and the reported title in a referee document. To train the entity extraction model, a number of referee documents are labeled, and the number of labeled referee documents can be set as required: the original titles and reported titles in these referee documents are marked manually, and a BERT+CRF extraction model is trained on them to obtain the entity extraction model. When the user side sends an element extraction request for a referee document, the original title and the reported title in the referee document are extracted through the entity extraction model. For example, given the text data "in the patent infringement dispute between the original report XXX Limited Company and the reported XXX Limited Company of XX City", the entity extraction model extracts "original report" as the original title and "reported" as the reported title.
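The manual marking step described above is commonly turned into per-character BIO tags before a sequence-labelling model is trained. The following is a minimal sketch under that assumption; the label names `ORIG` and `REP`, the helper `mark`, and the sample sentence are hypothetical and are not taken from the application.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label)

def mark(text: str, phrase: str, label: str) -> Span:
    """Hypothetical helper: locate the first occurrence of a marked phrase."""
    start = text.index(phrase)
    return (start, start + len(phrase), label)

def spans_to_bio(text: str, spans: List[Span]) -> List[str]:
    """Convert character-level spans into one BIO tag per character."""
    tags = ["O"] * len(text)
    for start, end, label in spans:
        if not (0 <= start < end <= len(text)):
            raise ValueError(f"span ({start}, {end}) is outside the text")
        tags[start] = f"B-{label}"          # first character of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining characters of the entity
    return tags

# Illustrative sentence only; ORIG = original title, REP = reported title.
text = "original report XXX Co. sued reported YYY Co."
spans = [mark(text, "original report", "ORIG"), mark(text, "reported", "REP")]
tags = spans_to_bio(text, spans)
```

Each `(character, tag)` pair can then serve as one training example for a tagger such as the BERT+CRF model mentioned in the text.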
The BERT+CRF extraction model uses BERT as a feature extractor: the input text sequence is converted into a semantic representation by the pre-trained BERT model. The output of BERT is then taken as input to a CRF layer, which predicts and optimizes the tag sequence. The model is trained on the training data so that it can identify entities in the text and classify them into predefined entity categories. The BERT+CRF extraction model performs well on named entity recognition (NER) tasks, because it makes full use of context information and of the dependencies between labels, which improves the accuracy and robustness of entity recognition.
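To illustrate how a CRF layer exploits the dependencies between labels, the sketch below implements generic Viterbi decoding over per-position emission scores (standing in for the BERT outputs) and tag-to-tag transition scores. It is not the application's implementation; all scores, tag names and the function name `viterbi_decode` are made-up assumptions.

```python
from typing import Dict, List, Tuple

def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: List[str],
) -> List[str]:
    """Return the highest-scoring tag sequence under emission + transition scores."""
    # best[t][tag] = score of the best path that ends in `tag` at position t
    best = [{tag: emissions[0][tag] for tag in tags}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0))
            scores[cur] = best[-1][prev] + transitions.get((prev, cur), 0.0) + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

tags = ["O", "B-ORIG", "I-ORIG"]
# Hypothetical emission scores for a three-token input.
emissions = [
    {"O": 1.0, "B-ORIG": 0.9, "I-ORIG": 0.0},
    {"O": 0.2, "B-ORIG": 0.1, "I-ORIG": 1.0},
    {"O": 1.0, "B-ORIG": 0.0, "I-ORIG": 0.0},
]
# An "I-" tag directly after "O" is an invalid BIO transition, so it is penalised.
transitions = {("O", "I-ORIG"): -10.0}
path = viterbi_decode(emissions, transitions, tags)
```

Picking the best tag at each position independently would give `O, I-ORIG, O`, which is not a valid BIO sequence; with the transition penalty the decoder returns `B-ORIG, I-ORIG, O` instead.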
S102, inputting the original title and the reported title into a preset template, and generating a prompt template corresponding to the original title and the reported title.
In one example, after the original title and the reported title of the referee document sent by the user side are extracted, their names are input into a preset template, which may be set by the user, and a prompt template corresponding to the current original title and reported title is generated. For example, the original title may appear as "complaint", "first-level complaint", "original report" and the like, and the reported title may likewise appear under several names. Because the original title and the reported title may differ between referee documents, the prompt template needs to be set according to the names actually used in the referee document; the prompt template can be understood as guiding the system to extract the corresponding information from the referee document. For example, if the entity extraction model finds that the original title in the current referee document is "complaint" and the reported title is "notice", the constructed prompt template may be divided into two parts. One part is the original report part, specifically "Who is the complaint in the referee document? What is the address of the complaint's XXX company?"; if the original title in the referee document is "original report", this part becomes "Who is the original report in the referee document? What is the address of the original report's XXX company?". The other part is the reported part, specifically "Who is the notice in the referee document? What is the address of the notice's XXX company?"; if the reported title in the referee document is a different name, that name is substituted into the template in the same way.
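Filling the preset template with the extracted titles can be sketched as simple string substitution. The template wording, the constant `PRESET_TEMPLATE` and the function `build_prompt_template` below are illustrative assumptions; the application leaves the actual template to the user.

```python
from typing import Dict

# Hypothetical preset template; "{title}" is replaced by the extracted title.
PRESET_TEMPLATE = (
    "Who is the {title} in the referee document? "
    "What is the address of the {title}?"
)

def build_prompt_template(original_title: str, reported_title: str) -> Dict[str, str]:
    """Generate one prompt for the original report part and one for the reported part."""
    return {
        "original": PRESET_TEMPLATE.format(title=original_title),
        "reported": PRESET_TEMPLATE.format(title=reported_title),
    }

prompts = build_prompt_template("original report", "reported")
```

Because the titles are parameters, the same preset template serves documents that use different names for the two parties, with no new rules to write.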
S103, extracting original report information and reported information in the referee document according to the prompt content of the prompt template.
In one example, after the prompt template is constructed, the system extracts the original report information and the reported information in the referee document according to the prompt template. For example, after obtaining the prompt template, the system outputs the corresponding answers, such as "the complaint is XXX, and the address of the complaint's XXX company is XXX" and "the reported is XXX, and the address of the reported XXX company is XXX". In general, the content of the prompt template depends on the content to be output: the user can set the content of the template so that the desired content is extracted from the referee document, which is not described in further detail here.
S104, generating key elements of the judge document according to the original report information and the reported information.
In one example, the original report information generally includes, but is not limited to, the original title, the original name and the original address; the reported information generally includes, but is not limited to, the reported title, the reported name and the reported address. The original title, original name and original address, together with the reported title, reported name and reported address, are the key elements of the judge document. Of course, different user sides may need different key elements for different referee documents, and the key elements may be set as required.
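Combining the six elements into one result can be sketched as follows. The `PartyElements` record, the function `build_key_elements` and the sample values are hypothetical and only illustrate the grouping described above.

```python
from dataclasses import dataclass, asdict
from typing import Dict

@dataclass
class PartyElements:
    """Title, name and address of one party (hypothetical structure)."""
    title: str
    name: str
    address: str

def build_key_elements(original: PartyElements, reported: PartyElements) -> Dict[str, Dict[str, str]]:
    """Combine the original key elements and the reported key elements."""
    return {"original": asdict(original), "reported": asdict(reported)}

# Placeholder values in the style of the examples in the text.
elements = build_key_elements(
    PartyElements(title="original report", name="XXX Co.", address="XX City"),
    PartyElements(title="reported", name="YYY Co.", address="ZZ City"),
)
```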
Generally, the system extracts all information about the original report and the reported party in the judge document according to the prompt template and screens the key information from it. The screened information is output through a corresponding open source model to obtain the text information required by the user; that is, after the key information in the judge document is extracted with the prompt template, an interface of the open source model is called to output the result. The open source model is generally selected from models such as T5, PromptCLUE and GPT. To choose among them, a number of samples are generally drawn at random from the judge documents, the original title and the reported title are obtained according to the above steps, a prompt template is constructed based on them, and the prompt template spliced with the judge document text is used as the input of each open source model, so that the effects of the open source models can be compared and the better-performing one selected.
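Splicing the prompt template with the document text can be sketched as below. The newline separator, the character budget `max_chars` and the truncation of over-long documents are assumptions made for illustration; the application does not specify them.

```python
def build_model_input(prompt: str, document_text: str, max_chars: int = 512) -> str:
    """Splice the prompt and the document text into a single model input string.

    The document text is truncated (an assumption of this sketch) so that the
    spliced input never exceeds max_chars characters.
    """
    budget = max_chars - len(prompt) - 1  # one character is used by the separator
    if budget <= 0:
        raise ValueError("prompt leaves no room for the document text")
    return f"{prompt}\n{document_text[:budget]}"

prompt = "Who is the original report in the referee document?"
document_text = "x" * 1000  # stand-in for a long referee document
model_input = build_model_input(prompt, document_text, max_chars=100)
```

The same spliced string can be passed to each candidate model so that their outputs are compared on identical input.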
The open source models T5, PromptCLUE and GPT are explained below. T5 (Text-To-Text Transfer Transformer), PromptCLUE and GPT (Generative Pre-trained Transformer) are all pre-trained models. T5 is a Transformer-based pre-trained model for a variety of natural language processing tasks; it unifies these tasks as text-to-text conversion problems and achieves multi-task learning through large-scale unsupervised pre-training and supervised fine-tuning. PromptCLUE is a Chinese pre-trained model built on prompt-based multi-task learning; it is associated with the CLUE benchmark, a general Chinese natural language understanding evaluation standard that provides a series of data sets and evaluation indexes for Chinese natural language processing tasks, including text classification, named entity recognition and relation extraction, with which researchers and developers can make fair comparisons and evaluations of different models. GPT is a Transformer-based pre-trained model for generating text; it is pre-trained on a large-scale text corpus by unsupervised learning and can be used to automatically generate text such as articles and conversations. These open source models play an important role in the field of natural language processing.
Responding to an element extraction request of a referee document sent by a user side, and before an entity extraction model is called to extract original titles and reported titles of the referee document, the method further comprises the following steps: and acquiring first training data input by the user terminal, and constructing an entity extraction model according to the first training data.
In an example, the entity extraction model is usually obtained by training on a large amount of data, and the training method can refer to the above embodiment. The entity extraction model of the present application also has a learning function. The model may fail to extract the original title and/or the reported title in the current referee document; in that case, the original title and the reported title of that referee document can be stored, so that they can be retrieved directly when the same original title and the same reported title are encountered again.
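One possible reading of this learning function is a small store of previously missed titles that is consulted whenever the model returns nothing. The sketch below assumes that reading; `TitleMemory`, `extract_titles` and the plain substring matching are hypothetical and are not taken from the application.

```python
from typing import Callable, Optional, Set, Tuple

Titles = Tuple[Optional[str], Optional[str]]


class TitleMemory:
    """Stores titles the model previously failed to extract (hypothetical helper)."""

    def __init__(self) -> None:
        self.original_titles: Set[str] = set()
        self.reported_titles: Set[str] = set()

    def remember(self, original_title: str, reported_title: str) -> None:
        self.original_titles.add(original_title)
        self.reported_titles.add(reported_title)

    @staticmethod
    def _first_match(titles: Set[str], text: str) -> Optional[str]:
        # Try longer titles first so a short title does not shadow a longer one.
        for title in sorted(titles, key=lambda t: (-len(t), t)):
            if title in text:
                return title
        return None

    def find(self, text: str) -> Titles:
        return (
            self._first_match(self.original_titles, text),
            self._first_match(self.reported_titles, text),
        )


def extract_titles(text: str, model: Callable[[str], Titles], memory: TitleMemory) -> Titles:
    """Call the model first; fall back to stored titles for anything it missed."""
    original, reported = model(text)
    if original is None or reported is None:
        stored_original, stored_reported = memory.find(text)
        original = original or stored_original
        reported = reported or stored_reported
    return original, reported
```

Here `model` stands in for the entity extraction model and is expected to return `None` for a title it cannot find.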
The first training data comprises marked data. Acquiring the first training data input by the user side and constructing the entity extraction model according to the first training data comprises: acquiring the marked data input by the user side, wherein the marked data comprises marked original titles and marked reported titles; invoking a preset extraction model to extract the marked original titles and the marked reported titles in the marked data, so as to obtain the types of the original title and the reported title; and constructing the entity extraction model according to the types of the original title and the reported title.
In an example, the entity extraction model is generally constructed by training on a large amount of data, and the first training data is the data used for training the entity extraction model; the method for constructing the entity extraction model may refer to the above embodiment and is not described in detail here. It should be noted that, since the entity extraction model is used for extracting the original title and the reported title, only the original title and the reported title in each referee document are marked when the entity extraction model is trained with the first training data. In this way the different kinds of names used for the original title and the reported title in different referee documents are obtained, and the entity extraction model is constructed according to these different kinds of names.
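Collecting the different kinds of title names from the marked data might look like the sketch below. The label names `ORIGINAL_TITLE` and `REPORTED_TITLE`, the `(text, label)` span format and the function `collect_title_kinds` are assumptions made for illustration; the application does not specify an annotation format, and actual model training is outside the scope of this sketch.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Each marked span is (text, label); the label names are assumptions.
MarkedSpan = Tuple[str, str]


def collect_title_kinds(documents: Iterable[List[MarkedSpan]]) -> Dict[str, Counter]:
    """Count every distinct original title and reported title in the marked data."""
    kinds: Dict[str, Counter] = {
        "ORIGINAL_TITLE": Counter(),
        "REPORTED_TITLE": Counter(),
    }
    for spans in documents:
        for text, label in spans:
            # Spans with any other label are ignored, since only titles are marked.
            if label in kinds:
                kinds[label][text] += 1
    return kinds
```

The resulting counts give the set of title variants seen across documents, which is the kind of information the paragraph above says the entity extraction model is built from.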
After the first training data input by the user side is acquired and the entity extraction model is constructed according to the first training data, the method further comprises: acquiring second training data input by the user side, and extracting from the second training data through the entity extraction model to obtain a training result; sending the training result to the user side and receiving first feedback information sent by the user side; and adjusting the entity extraction model according to the first feedback information.
In one example, after the entity extraction model is obtained, it is necessary to check whether the current entity extraction model meets the requirements and can be put into use. To do so, a preset number of referee documents are selected, and it is judged whether the entity extraction model can accurately extract the original titles and reported titles in them. If the judgment result meets the requirements, no adjustment is needed and the current entity extraction model can be put into use directly; if the judgment result contains a large number of errors, or the error is higher than a preset value, the model is trained or adjusted again to obtain a new entity extraction model. To decide whether the judgment result meets the requirements, the extraction result can be sent to the relevant staff, who assess it.
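The check against a preset error value can be sketched as a simple error-rate comparison. The function names and the default threshold of 0.1 are assumptions; the application only says that the error is compared with a preset value and that staff feedback may be used.

```python
from typing import Sequence, Tuple

TitlePair = Tuple[str, str]  # (original title, reported title)


def error_rate(predicted: Sequence[TitlePair], expected: Sequence[TitlePair]) -> float:
    """Fraction of documents where the extracted title pair is not exactly right."""
    if not expected or len(predicted) != len(expected):
        raise ValueError("predicted and expected must be non-empty and of equal length")
    wrong = sum(1 for p, e in zip(predicted, expected) if p != e)
    return wrong / len(expected)


def needs_adjustment(
    predicted: Sequence[TitlePair],
    expected: Sequence[TitlePair],
    max_error_rate: float = 0.1,  # hypothetical preset value
) -> bool:
    """True when the model should be retrained or adjusted before use."""
    return error_rate(predicted, expected) > max_error_rate
```

In this sketch `expected` plays the role of the staff's judgment of the correct titles for the selected referee documents.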
After responding to the element extraction request of the referee document sent by the user side, the method further comprises: acquiring a preset template; judging whether the referee document meets the preset requirement according to the preset template; if the referee document meets the preset requirement, calling the entity extraction model to extract the original title and the reported title of the referee document; if the content of the referee document does not meet the preset requirement, deleting or supplementing the content of the referee document to obtain a new referee document, and calling the entity extraction model to extract the original title and the reported title of the new referee document.
In one example, after the user side sends the element extraction request of the referee document, it is judged whether the referee document to be extracted meets the preset requirement. Because different referee documents may differ slightly, the referee document needs to be preprocessed before element extraction, and the text is cleaned by removing stop words, illegal symbols, formatting and the like. Here a content template is set, and different user sides can set different content templates according to their needs. The referee document sent by the user is filled in according to the content template, and appropriate modifications can be made during filling; if the current referee document does not meet the filling requirement, or its content does not match the preset template, appropriate modifications can likewise be made, or error information can be sent to the user side so that the user side can adjust. Finally, the entity extraction model is called to extract the original title and the reported title in the referee document.
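The cleaning and template check described above could be sketched as follows. Everything here is an assumption made for illustration: control characters stand in for "illegal symbols", stop words are removed by plain substring replacement, and the content template is reduced to a list of section headings that must appear in the document.

```python
import re
from typing import Iterable, List


def clean_text(text: str, stop_words: Iterable[str] = ()) -> str:
    """Remove control characters and stop words, then normalise whitespace."""
    # Control characters are treated here as "illegal symbols" (an assumption).
    text = re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]", "", text)
    for word in stop_words:
        text = text.replace(word, "")
    # Collapse runs of spaces/tabs and repeated blank lines left by formatting.
    text = re.sub(r"[ \t]+", " ", text)
    text = re.sub(r"\n{2,}", "\n", text)
    return text.strip()


def missing_sections(text: str, required_sections: Iterable[str]) -> List[str]:
    """Return the template sections that the document does not contain."""
    return [section for section in required_sections if section not in text]
```

A non-empty result from `missing_sections` would correspond to the case where the document has to be supplemented or an error message is returned to the user side.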
Generating the key elements of the referee document according to the original notice information and the reported information includes: calling an open source model to process the original notice information and the reported information to generate text information, wherein the text information is the key elements of the referee document.
In an example, the processing method may refer to the above embodiment; the open source model processes the required information into text information so that the user side can view it conveniently.
The key elements of the judge document comprise original notice key elements and reported key elements, and generating the key elements of the judge document according to the original notice information and the reported information comprises: extracting an original notice name and an original notice address from the original notice information, and generating the original notice key elements by combining the original title, the original notice name and the original notice address; extracting a reported name and a reported address from the reported information, and generating the reported key elements by combining the reported title, the reported name and the reported address; and combining the original notice key elements and the reported key elements to generate the key elements of the judge document.
In one example, the key elements of the referee document include, but are not limited to, the original notice key elements and the reported key elements, and can be set according to the actual situation. After the user sets the required key information, the system automatically extracts that key information from the original notice information, specifically the original notice name and the original notice address, and generates the original notice key elements by combining three elements: the original title, the original notice name and the original notice address. Similarly, the reported name and the reported address are extracted from the reported information, and the reported key elements are generated by combining the reported title, the reported name and the reported address. The original notice key elements and the reported key elements are then combined to generate the key elements of the judge document.
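The combination of title, name and address for each side can be sketched with a small data structure. The class `PartyElements`, the function `build_key_elements` and the dictionary keys `name` and `address` are hypothetical; the application does not prescribe a data format for the key elements.

```python
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass(frozen=True)
class PartyElements:
    """The three elements kept for one side: title, name and address."""
    title: str
    name: str
    address: str


def build_key_elements(
    original_title: str,
    original_info: Dict[str, str],
    reported_title: str,
    reported_info: Dict[str, str],
) -> Dict[str, Dict[str, str]]:
    """Combine the key elements of both sides into the document's key elements."""
    original = PartyElements(original_title, original_info["name"], original_info["address"])
    reported = PartyElements(reported_title, reported_info["name"], reported_info["address"])
    return {"original": asdict(original), "reported": asdict(reported)}
```

Additional fields could be added to `PartyElements` where the user configures more key information than the three elements named above.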
Based on the above method, the application also discloses a referee document element extraction device; fig. 2 is a schematic structural diagram of the referee document element extraction device according to an embodiment of the application.
A referee document element extraction device comprises: a calling module 1, a first generation module 2, an extraction module 3 and a second generation module 4. The calling module 1 is used for responding to an element extraction request of the referee document sent by the user side and calling the entity extraction model to extract the original title and the reported title of the referee document; the first generation module 2 is used for inputting the original title and the reported title into a preset template and generating a prompt template corresponding to the original title and the reported title; the extraction module 3 is used for extracting the original notice information and the reported information in the referee document according to the prompt content of the prompt template; the second generation module 4 is used for generating the key elements of the referee document according to the original notice information and the reported information.
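The data flow between the four modules can be sketched as a class that wires four callables together. This is only an illustration of the order of operations: the class `ElementExtractionDevice`, its field names and the callable signatures are assumptions, and each callable is a placeholder for the corresponding module's real implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class ElementExtractionDevice:
    """Wires the four modules together in the order given in the description."""
    calling_module: Callable[[str], Tuple[str, str]]             # document -> titles
    first_generation_module: Callable[[str, str], str]           # titles -> prompt template
    extraction_module: Callable[[str, str], Tuple[str, str]]     # prompt, document -> info
    second_generation_module: Callable[[str, str], Dict[str, str]]  # info -> key elements

    def handle_request(self, document: str) -> Dict[str, str]:
        original_title, reported_title = self.calling_module(document)
        prompt = self.first_generation_module(original_title, reported_title)
        original_info, reported_info = self.extraction_module(prompt, document)
        return self.second_generation_module(original_info, reported_info)
```

Passing the modules in as callables keeps the sketch independent of any particular model or template, in line with the statement that the division into functional modules is only an example.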
It should be noted that, when the device provided in the above embodiment implements its functions, the division into the functional modules described above is only an example. In practical applications, the functions may be allocated to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to implement all or part of the functions described above. In addition, the device embodiment and the method embodiments provided above belong to the same concept; the specific implementation process of the device is detailed in the method embodiments and is not repeated here.
Referring to fig. 3, a schematic structural diagram of an electronic device is provided in an embodiment of the present application. As shown in fig. 3, the electronic device 1000 may include: at least one processor 1001, at least one network interface 1004, a user interface 1003, a memory 1005, at least one communication bus 1002.
Wherein the communication bus 1002 is used to enable connected communication between these components.
The user interface 1003 may include a display screen (Display) and a camera (Camera); optionally, the user interface 1003 may further include a standard wired interface and a wireless interface.
The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface).
The processor 1001 may include one or more processing cores. The processor 1001 connects the various parts of the entire server using various interfaces and lines, and performs the various functions of the server and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory 1005 and by calling the data stored in the memory 1005. Alternatively, the processor 1001 may be implemented in at least one hardware form of digital signal processing (Digital Signal Processing, DSP), field programmable gate array (Field-Programmable Gate Array, FPGA), and programmable logic array (Programmable Logic Array, PLA). The processor 1001 may integrate one or a combination of several of a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs and the like; the GPU is used for rendering and drawing the content to be displayed by the display screen; the modem is used to handle wireless communications. It will be appreciated that the modem may also not be integrated into the processor 1001 and may instead be implemented by a separate chip.
The memory 1005 may include a random access memory (Random Access Memory, RAM) or a read-only memory (Read-Only Memory, ROM). Optionally, the memory 1005 includes a non-transitory computer-readable storage medium. The memory 1005 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 1005 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the above method embodiments, and the like; the data storage area may store the data involved in the above method embodiments. Optionally, the memory 1005 may also be at least one storage device located remotely from the processor 1001. As shown in fig. 3, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and an application program of the referee document element extraction method.
In the electronic device 1000 shown in fig. 3, the user interface 1003 is mainly used for providing an input interface for a user and acquiring data input by the user; the processor 1001 may be configured to invoke the application program of the referee document element extraction method stored in the memory 1005, which, when executed by one or more processors, causes the electronic device to perform the method described in one or more of the above embodiments.
An electronic device readable storage medium stores instructions which, when executed by one or more processors, cause an electronic device to perform the method described in one or more of the above embodiments.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but those skilled in the art should understand that the present application is not limited by the order of the acts described, since some steps may be performed in other orders or concurrently in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules involved are not necessarily required by the present application.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division of the units is merely a division by logical function, and other divisions are possible in an actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some service interfaces, devices or units, and may be electrical or take other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable memory. Based on this understanding, the technical solution of the present application, in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product that is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned memory includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a magnetic disk or an optical disk.
The foregoing is merely exemplary embodiments of the present disclosure and is not intended to limit the scope of the present disclosure. That is, equivalent changes and modifications are contemplated by the teachings of this disclosure, which fall within the scope of the present disclosure. Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a scope and spirit of the disclosure being indicated by the claims.
Claims (10)
1. The judge document element extraction method is characterized by being applied to a server, and comprises the following steps:
responding to an element extraction request of a referee document sent by a user side, and calling an entity extraction model to extract original titles and reported titles of the referee document;
inputting the original title and the reported title into a preset template, and generating a prompt template corresponding to the original title and the reported title;
extracting original notice information and reported information in the referee document according to the prompt content of the prompt template;
and generating key elements of the referee document according to the original notice information and the reported information.
2. The method for extracting referee document elements according to claim 1, wherein before the entity extraction model is called to extract original titles and reported titles of the referee document in response to the request for extracting elements of referee document sent by the user terminal, the method further comprises:
and acquiring first training data input by the user terminal, and constructing the entity extraction model according to the first training data.
3. The method for extracting referee document elements according to claim 2, wherein the first training data includes labeled data, the obtaining the first training data input by the user terminal, and constructing the entity extraction model according to the first training data includes:
acquiring the marked data input by the user side, wherein the marked data comprises marked original titles and marked reported titles;
invoking a preset extraction model, and extracting the marked original titles and the marked reported titles in the marked data to obtain the types of the original title and the reported title;
and constructing the entity extraction model according to the types of the original title and the reported title.
4. The method for extracting referee document elements according to claim 2, wherein the step of obtaining the first training data input by the user terminal, and constructing the entity extraction model according to the first training data, further comprises:
acquiring second training data input by the user side, and extracting the second training data through the entity extraction model to obtain a training result;
the training result is sent to the user side, and first feedback information sent by the user side is received;
and adjusting the entity extraction model according to the first feedback information.
5. The referee document element extraction method according to claim 1, further comprising, after the response to the element extraction request of the referee document sent by the user side:
acquiring a preset template;
judging whether the judge document meets preset requirements or not according to the preset template;
if the requirement is met, calling the entity extraction model to extract the original title and the reported title of the judge document;
if the content of the judge document does not meet the preset requirement, deleting or supplementing the content of the judge document to obtain a new judge document, and calling the entity extraction model to extract the original title and the reported title of the new judge document.
6. The method of claim 1, wherein the generating key elements of the referee document based on the original notice information and the reported information comprises:
and calling an open source model to process the original notice information and the reported information to generate text information, wherein the text information is a key element of the referee document.
7. The method for extracting key elements of a referee document according to claim 1, wherein the key elements of the referee document include original notice key elements and reported key elements, and the generating key elements of the referee document based on the original notice information and the reported information includes:
extracting an original notice name and an original notice address in the original notice information, and generating the original notice key elements by combining the original title, the original notice name and the original notice address;
extracting a reported name and a reported address in the reported information, and generating the reported key elements by combining the reported title, the reported name and the reported address;
and combining the original notice key elements and the reported key elements to generate key elements of the referee document.
8. A referee document element extracting apparatus, comprising: the device comprises a calling module (1), a first generating module (2), an extracting module (3) and a second generating module (4); wherein,
the calling module (1) is used for responding to an element extraction request of the referee document sent by the user side, and calling the entity extraction model to extract the original title and the reported title of the referee document;
the first generation module (2) is used for inputting the original title and the reported title into a preset template and generating a prompt template corresponding to the original title and the reported title;
the extraction module (3) is used for extracting original notice information and reported information in the referee document according to the prompt content of the prompt template;
the second generation module (4) is used for generating key elements of the referee document according to the original notice information and the reported information.
9. An electronic device comprising a processor (1001), a memory (1005), a user interface (1003) and a network interface (1004), the memory (1005) being configured to store instructions, the user interface (1003) and the network interface (1004) being configured to communicate with other devices, the processor (1001) being configured to execute the instructions stored in the memory to cause the electronic device to perform the method of any of claims 1-7.
10. A computer readable storage medium, characterized in that a computer program is stored which can be loaded by a processor and which performs the method according to any of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311006774.0A CN117033539A (en) | 2023-08-10 | 2023-08-10 | Judge document element extraction method, device, equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117033539A (en) | 2023-11-10 |
Family
ID=88601766
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311006774.0A Pending CN117033539A (en) | 2023-08-10 | 2023-08-10 | Judge document element extraction method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117033539A (en) |
- 2023-08-10: CN, application CN202311006774.0A, publication CN117033539A (en), status: active, Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||