CN111782790A - Document analysis method and device, electronic equipment and storage medium - Google Patents

Publication number
CN111782790A
Authority
CN
China
Prior art keywords
document
answer
question
preliminary
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010639431.8A
Other languages
Chinese (zh)
Inventor
王福钋
杜新凯
史辉
蔡岩松
高峰
韩佳
刘谦
史祎凡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sunshine Insurance Group Co Ltd
Original Assignee
Sunshine Insurance Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sunshine Insurance Group Co Ltd
Priority to CN202010639431.8A
Publication of CN111782790A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a document analysis method and device, electronic equipment and a storage medium. The method comprises the following steps: obtaining a question and a document for answering the question; analyzing the question and the document by using a preset document analysis model, and determining a preliminary answer to the question from the document; and processing the preliminary answer by using a preset answer processing model to obtain a final answer to the question. Because the preliminary answer determined by the document analysis model is further processed by the answer processing model, a more accurate answer can be obtained, which improves the accuracy of machine question answering (QA) and, in turn, its applicability in practice.

Description

Document analysis method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for analyzing a document, an electronic device, and a storage medium.
Background
As machine learning has advanced, it has become possible to implement question answering (QA) using machine learning. For example, a model processes the question and the document corresponding to the question using a self-attention mechanism, and the answer to the question can then be determined from the document.
However, current applications of this technique are still crude, and the determined answers are not accurate enough, which limits practical use.
Disclosure of Invention
An object of the embodiments of the present application is to provide a document analysis method and apparatus, an electronic device, and a storage medium, so as to improve the accuracy of machine QA and thereby improve its applicability in practice.
In a first aspect, an embodiment of the present application provides a method for analyzing a document, where the method includes: obtaining a question and a document for answering the question; analyzing the question and the document by using a preset document analysis model, and determining a preliminary answer to the question from the document; and processing the preliminary answer by using a preset answer processing model to obtain a final answer to the question.
In the embodiment of the application, after the preliminary answer to the question is determined through the document analysis model, the preliminary answer is further processed by the answer processing model, so that a more accurate answer can be obtained. This improves the accuracy of machine QA and, in turn, its applicability in practice.
With reference to the first aspect, in a first possible implementation manner, there are multiple preliminary answers, and processing the preliminary answers by using a preset answer processing model to obtain a final answer to the question includes: fusing the plurality of preliminary answers by using the answer processing model to obtain the final answer; or screening the plurality of preliminary answers based on the question by using the answer processing model to select the final answer from the plurality of preliminary answers.
In the embodiment of the present application, the manner of further processing differs according to the actual application scenario (for example, fusion or screening may be selected), which further improves the applicability of the scheme in practice.
With reference to the first aspect, in a second possible implementation manner, processing the preliminary answer by using a preset answer processing model to obtain a final answer to the question includes: correcting part of the content of the preliminary answer based on the question by using the answer processing model to obtain the final answer.
In the embodiment of the application, by directly correcting part of the content of the preliminary answer, inaccurate content can be made accurate, so that the answer is corrected directly and quickly.
With reference to the first aspect, in a third possible implementation manner, the method is applied to a server, and obtaining a question and a document for answering the question includes: the server acquires the question and the number of the document sent by a web end; and the server queries a preset database by using the number to obtain the document.
In the embodiment of the application, because only the question and the number of the document are transmitted, the data volume is small, and the communication bandwidth between the server and the web end can be saved.
With reference to the first aspect, in a fourth possible implementation manner, the document analysis model includes multiple sub-models of different types, and analyzing the question and the document by using a preset document analysis model and determining a preliminary answer to the question from the document includes: processing the question and the document by using each sub-model to obtain a preliminary answer to the question output by each sub-model.
In the embodiment of the present application, since there are multiple sub-models of different types, multiple preliminary answers can be obtained by processing with each of them, and a more accurate final answer can be obtained by further processing the multiple preliminary answers.
With reference to the first aspect, in a fifth possible implementation manner, analyzing the question and the document by using a preset document analysis model, and determining a preliminary answer to the question from the document includes: segmenting the document into a plurality of parts; packaging each part together with the question to obtain a piece of packaged data; and analyzing the plurality of pieces of packaged data by using the document analysis model to obtain the preliminary answer.
In the embodiment of the application, each part is packaged together with the question, so that every piece of packaged data processed by the document analysis model contains the question, and the document analysis model can output a more accurate answer.
With reference to the first aspect or any one of the foregoing possible implementation manners, in a sixth possible implementation manner, after obtaining the final answer, the method further includes: training and optimizing the document analysis model by using the question, the document and the final answer.
In the embodiment of the application, the document analysis model is trained and optimized directly after the final answer is obtained, so that actual use and training proceed in step, and the efficiency of model training is improved.
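As a rough illustration of how a final answer could be turned back into training data, the sketch below builds one labeled record from a question, a document part, and the final answer. The record layout (a SQuAD-style nesting with an `answers` key) and the function name `make_training_example` are assumptions made for illustration; the application does not specify how the training sample is constructed.

```python
from typing import Optional


def make_training_example(question: str, question_id: str, title: str,
                          context: str, final_answer: str) -> Optional[dict]:
    """Build one labeled record, or None if the answer is not a literal span."""
    start = context.find(final_answer)
    if start < 0:
        # A fused or corrected answer may not appear verbatim in the document.
        return None
    return {
        "title": title,
        "paragraphs": [{
            "context": context,
            "qas": [{
                "id": question_id,
                "question": question,
                "answers": [{"text": final_answer, "answer_start": start}],
            }],
        }],
    }


example = make_training_example(
    "What is the effective time?", "q1", "Insurance contract A1",
    "Insurance effective time: January 1, 2020.", "January 1, 2020",
)
```

Records for which the final answer is not a literal span are skipped here; a real system would need its own policy for such cases.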
In a second aspect, an embodiment of the present application provides an apparatus for analyzing a document, where the apparatus includes: an acquisition module for acquiring a question and a document for answering the question; and a processing module for analyzing the question and the document by using a preset document analysis model, determining a preliminary answer to the question from the document, and processing the preliminary answer by using a preset answer processing model to obtain a final answer to the question.
With reference to the second aspect, in a first possible implementation manner, there are multiple preliminary answers, and the processing module is configured to fuse the multiple preliminary answers by using the answer processing model to obtain the final answer; or to screen the plurality of preliminary answers based on the question by using the answer processing model to select the final answer from the plurality of preliminary answers.
With reference to the second aspect, in a second possible implementation manner, the processing module is configured to correct, by using the answer processing model, part of the content of the preliminary answer based on the question, so as to obtain the final answer.
With reference to the second aspect, in a third possible implementation manner, the apparatus is applied to a server, and the processing module is configured to acquire the question and the number of the document sent by a web end, and to query a preset database by using the number to obtain the document.
With reference to the second aspect, in a fourth possible implementation manner, the document analysis model includes multiple sub-models of different types, and the processing module is configured to process the question and the document by using each sub-model to obtain a preliminary answer to the question output by each sub-model.
With reference to the second aspect, in a fifth possible implementation manner, the processing module is configured to divide the document into a plurality of parts; package each part together with the question to obtain a piece of packaged data; and analyze the plurality of pieces of packaged data by using the document analysis model to obtain the preliminary answer.
With reference to the second aspect or any one of the foregoing possible implementation manners, in a sixth possible implementation manner, after the final answer is obtained, the processing module is further configured to train and optimize the document analysis model by using the question, the document, and the final answer.
In a third aspect, an embodiment of the present application provides an electronic device, including: a bus; a memory for storing a program; a processor, connected to the memory through the bus, for calling the program to execute the document analysis method according to the first aspect or any one of the possible implementation manners of the first aspect.
In a fourth aspect, embodiments of the present application provide a non-transitory computer-readable storage medium storing program code, which, when executed by a computer, performs a method for analyzing a document according to the first aspect or any one of the possible implementation manners of the first aspect.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
FIG. 1 is a flowchart of a document analysis method provided in an embodiment of the present application;
FIG. 2 is a block diagram of an electronic device according to an embodiment of the present application;
FIG. 3 is a block diagram of a document analysis apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
Referring to FIG. 1, an embodiment of the present application provides a document analysis method. The method may be executed by an electronic device, which may be a terminal or a server, and its flow may include:
Step S100: obtaining a question and a document for answering the question.
Step S200: analyzing the question and the document by using a preset document analysis model, and determining a preliminary answer to the question from the document.
Step S300: processing the preliminary answer by using a preset answer processing model to obtain a final answer to the question.
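The three steps can be pictured as a small pipeline. The following is only a minimal sketch: the type aliases, `analyze_document`, and the two toy stand-in functions are hypothetical names introduced here, and the real document analysis and answer processing models would be trained networks rather than string filters.

```python
from typing import Callable, List

# Hypothetical signatures for the two "preset" models.
AnalysisModel = Callable[[str, str], List[str]]  # (question, document) -> preliminary answers
AnswerModel = Callable[[str, List[str]], str]    # (question, preliminary answers) -> final answer


def analyze_document(question: str, document: str,
                     analysis_model: AnalysisModel,
                     answer_model: AnswerModel) -> str:
    """Run steps S200 and S300 on the question/document pair obtained in S100."""
    preliminary = analysis_model(question, document)  # Step S200
    return answer_model(question, preliminary)        # Step S300


def toy_analysis(question: str, document: str) -> List[str]:
    """Toy stand-in: return every line that mentions 'effective'."""
    return [line for line in document.splitlines() if "effective" in line.lower()]


def toy_answer(question: str, preliminary: List[str]) -> str:
    """Toy stand-in: keep the first preliminary answer, if any."""
    return preliminary[0] if preliminary else ""


doc = "Policy holder: user A\nEffective time: January 1, 2020\nExpiry: December 31, 2020"
final = analyze_document("What is the effective time?", doc, toy_analysis, toy_answer)
```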
The above flow will be described in detail with reference to the application scenario.
Step S100: a question and a document for answering the question are obtained.
In this embodiment, the question may be an open-ended question. For example, the question may be: find the XXX content in the XX article; or: what is the XXX amount in the XX file; or: which are the XX subjects in the XXX file. In addition, the type of document may vary according to the actual application scenario. For example, if the application scenario is the analysis of a contract, the document may be a contract document; if the application scenario is the analysis of a legal document, the document may be a legal document; and if the application scenario is the analysis of regulations, the document may be a regulation file.
In this embodiment, the manner in which the electronic device acquires the question and the document differs according to the actual application scenario. For example, the user may send a question to the electronic device, and the electronic device may retrieve a document for answering the question from the database. For another example, the user may send the question together with a document for answering the question, so that the electronic device obtains both at once.
Specifically, if the user sends only the question first, the user may send it to the electronic device through a web end. After receiving the question, the electronic device may perform keyword extraction on it to extract keywords describing the question. The electronic device then takes the keywords, together with features such as the pre-recorded number of times the user has clicked on each document and the time each document was entered, as the input of a preset search engine, and uses the search engine to search a preset database for content related to the keywords. The documents found in this way are the documents related to the question.
It is understood that, since there may be multiple related documents, the electronic device may narrow the scope of analysis by pushing the related documents to the web end, so that the user selects from among them the document for answering the question; the electronic device then analyzes the selected document directly. If bandwidth consumption is not a concern, the electronic device can push the full content of the documents to the web end; if bandwidth consumption is a concern, the electronic device may push only the titles of the documents.
Likewise, once the user has determined the document for answering the question, the web end may return the entire document to the electronic device if bandwidth consumption is not a concern. If bandwidth consumption is a concern, the web end may return only the number of the determined document, and the electronic device queries the database by using the number to obtain the complete document corresponding to the number, and then uses the question and the document for subsequent processing.
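A minimal sketch of such a search is shown below. The text does not say how the search engine combines keywords, click counts, and entry time, so the ordering used here (keyword hits first, then clicks, then recency) is purely an assumption, as are the names `DocRecord` and `rank_documents`.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DocRecord:
    doc_id: int
    title: str
    clicks: int     # pre-recorded number of clicks by this user
    entry_day: int  # day the document was entered (larger means newer)


def rank_documents(keywords: List[str], docs: List[DocRecord]) -> List[DocRecord]:
    """Return documents matching at least one keyword, best match first."""
    def score(d: DocRecord) -> Tuple[int, int, int]:
        hits = sum(1 for k in keywords if k.lower() in d.title.lower())
        return (hits, d.clicks, d.entry_day)

    matched = [d for d in docs if score(d)[0] > 0]
    return sorted(matched, key=score, reverse=True)


docs = [
    DocRecord(1, "Insurance contract A1 of user A", clicks=5, entry_day=10),
    DocRecord(2, "Cooperation agreement A2 of user A", clicks=3, entry_day=12),
    DocRecord(3, "Insurance contract B1 of user B", clicks=1, entry_day=11),
    DocRecord(4, "Office lease D1", clicks=9, entry_day=20),
]
ranked = rank_documents(["user A", "insurance contract"], docs)
```

Only the titles of the ranked documents would need to be pushed to the web end when bandwidth is a concern.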
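The bandwidth-saving variant, where the web end sends only the question and the document number, might look like the following sketch. An in-memory SQLite database stands in for the "preset database"; the table layout, the request dictionary, and `fetch_document` are assumptions made for illustration.

```python
import sqlite3
from typing import Optional

# In-memory database standing in for the preset document database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (number INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.execute(
    "INSERT INTO documents VALUES (?, ?, ?)",
    (1, "Insurance contract A1", "Insurance effective time: January 1, 2020"),
)
conn.commit()


def fetch_document(db: sqlite3.Connection, number: int) -> Optional[str]:
    """Look up the full document body by its number; None if it does not exist."""
    row = db.execute("SELECT body FROM documents WHERE number = ?", (number,)).fetchone()
    return row[0] if row else None


# What the web end sends: the question plus a small document number, not the document.
request = {"question": "What is the effective time?", "document_number": 1}
document = fetch_document(conn, request["document_number"])
```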
For example, Hypothesis 1:
The question sent by the user through the web end is: what is the effective time in the insurance contract of user A. After obtaining the question, the electronic device extracts keywords from it, and the obtained keywords may include: user A, insurance contract, effective time. The electronic device searches the database by using the keywords, and the files found to be related to the keywords include: insurance contract A1 of user A, cooperation agreement A2 of user A, insurance contract B1 of user B, and insurance contract C1 of user C. The electronic device pushes the names of insurance contract A1, cooperation agreement A2, insurance contract B1 and insurance contract C1 to the web end, and the user selects insurance contract A1. Thus, the electronic device can determine that insurance contract A1 is the document for answering the question and obtain the complete file of insurance contract A1 from the database.
Hypothesis 2:
The question sent by the user through the web end is: what does the AA file say about Article 33. After obtaining the question, the electronic device extracts keywords from it, and the obtained keywords may include: AA file, Article 33. The electronic device searches the database by using the keywords, and the files found to be related to the keywords include: the AA file and the AAA file. The electronic device pushes the names of the AA file and the AAA file to the web end, and the user selects the AA file. Thus, the electronic device can determine that the AA file is the document for answering the question and obtain the complete AA file from the database.
In this embodiment, if the user directly sends the question and the document for answering the question to the electronic device through the web end, the electronic device may directly use the question and the document for subsequent processing. Of course, if what the user sends is the number of the document rather than the document itself, the electronic device also needs to query the database by using the number to obtain the complete document corresponding to the number, and then uses the question and the document for subsequent processing.
After the question and the document are obtained, the electronic device may execute step S200.
Step S200: analyzing the question and the document by using a preset document analysis model, and determining a preliminary answer to the question from the document.
In this embodiment, when the electronic device acquires the question and the document, their format generally does not meet the processing requirements of the document analysis model. The question and the document are therefore preprocessed so that they meet those requirements.
As an exemplary manner of preprocessing, the electronic device may first divide the document into a plurality of parts in units of paragraphs, where each part may contain one paragraph or several paragraphs. The electronic device then packages each part together with the question according to the format requirement of the document analysis model to obtain a corresponding piece of packaged data. This yields a plurality of pieces of packaged data in total, each of which meets the format requirement of the document analysis model.
For example, the format of the packed data may be as follows:
[Figure: example of the packed data format (image not reproduced)]
Here, data represents a complete data file, which may contain multiple pieces of packed data; version represents the format version required by the document analysis model; title represents the document title; paragraphs represents the paragraph number, within the document, of the packaged part; context represents the document content of the packaged part; qas represents the list of answers, where one question corresponds to one answer (it holds the actual answers during training and is empty during actual prediction); answer_start represents the start position of the answer in the document (the actual start position during training, empty during actual prediction); text represents the content of the answer (the actual content during training, empty during actual prediction); question represents the content of the question; and id represents the number of the question.
It is understood that packaging in units of parts is only an exemplary manner of this embodiment and does not limit it. For example, the entire document and the question may be packaged directly into one piece of data without segmenting the document.
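Since the figure itself is not reproduced, the sketch below shows one way the segmentation and packaging could be written using the field names listed above. The exact nesting is an assumption modeled on the widely used SQuAD-style JSON layout, the `answers` key is likewise assumed, and `split_into_parts` and `pack` are hypothetical helper names.

```python
import json
from typing import Dict, List


def split_into_parts(document: str, paragraphs_per_part: int = 1) -> List[str]:
    """Split a document into parts, each holding one or more paragraphs."""
    paragraphs = [p.strip() for p in document.split("\n") if p.strip()]
    return ["\n".join(paragraphs[i:i + paragraphs_per_part])
            for i in range(0, len(paragraphs), paragraphs_per_part)]


def pack(question: str, question_id: str, title: str, document: str,
         version: str = "1.0") -> Dict:
    """Package every part of the document together with the question."""
    entries = []
    for part in split_into_parts(document):
        entries.append({
            "title": title,
            "paragraphs": [{
                "context": part,
                "qas": [{
                    "id": question_id,
                    "question": question,
                    # Empty at prediction time; holds text/answer_start during training.
                    "answers": [],
                }],
            }],
        })
    return {"version": version, "data": entries}


packed = pack("What is the effective time?", "q1", "Insurance contract A1",
              "Clause 1: parties\nClause 2: effective time\n\nClause 3: expiry")
serialized = json.dumps(packed)  # what would be handed to the model service
```

Every packed entry repeats the question, which is the property the embodiment relies on.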
Further, after obtaining the plurality of pieces of packaged data, the electronic device may analyze the plurality of pieces of packaged data using the document analysis model.
In this embodiment, the document analysis model may be a complete model, or it may include multiple sub-models of different types, which are described below.
If the document analysis model is a single complete model, it may adopt a self-attention model such as the open-source ERNIE 1.0 model. To converge quickly during training, the batch size of the ERNIE 1.0 model may be set to 48 (i.e., one batch processes at most 48 pieces of packed data), the learning rate to 5.00E-06, and the number of epochs to 20. The document analysis model may be packaged as a Docker container for deployment on the electronic device. In this way, the electronic device loads the document analysis model into memory and instantiates it once, after which the model can be called conveniently.
The electronic device inputs the plurality of pieces of packed data into the document analysis model for processing. The document analysis model can determine the position of the preliminary answer in the document and extract the preliminary answer from the document according to that position, so as to output the preliminary answer. In this way, the electronic device obtains the preliminary answer. Of course, having the document analysis model directly output the preliminary answer is only an exemplary manner of this embodiment and is not limiting. For example, after the electronic device inputs all of the pieces of packed data into the document analysis model, the model may instead output only the position of the preliminary answer in the document, and the electronic device then extracts the preliminary answer from the document according to that position.
It will be appreciated that the principle of the document analysis model is to calculate the relevance of the content at each position in the document to the question. Accordingly, the document analysis model can be controlled to output one or more results by setting its output condition. For example, if the output condition is set to output the result with the highest relevance, the document analysis model outputs the single preliminary answer with the highest relevance. For another example, if the output condition is set to output the 10 results with the highest relevance, in descending order, the document analysis model outputs 10 preliminary answers.
Continuing with Hypothesis 1 above: the electronic device divides the document "insurance contract A1" and packages it with the question "what is the effective time in the insurance contract of user A" into three pieces of packed data, and then inputs all three pieces into the document analysis model. The result output by the document analysis model may be: "should belong to the first category of cases, insurance effective time: January 1, 2020".
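The second variant, where the model returns only a position and the electronic device does the extraction, can be sketched as follows. Representing the position as a pair of character offsets is an assumption; the text only says that the model outputs "the position of the preliminary answer in the document".

```python
from typing import Tuple


def extract_answer(document: str, span: Tuple[int, int]) -> str:
    """Cut the preliminary answer out of the document given (start, end) offsets."""
    start, end = span
    if not (0 <= start <= end <= len(document)):
        raise ValueError("span lies outside the document")
    return document[start:end]


document = ("Insurance effective time: January 1, 2020. "
            "Insurance expiration time: December 31, 2020.")
# Pretend the model reported this span; a real model would predict it.
start = document.find("January")
span = (start, start + len("January 1, 2020"))
answer = extract_answer(document, span)
```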
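The output condition can be thought of as a top-k selection over relevance scores. The sketch below assumes the candidate answers and their scores are already available as pairs; `select_answers` is a hypothetical helper, and the scores are made up for illustration.

```python
import heapq
from typing import List, Tuple


def select_answers(scored: List[Tuple[str, float]], k: int = 1) -> List[str]:
    """Return the k candidates with the highest relevance, highest first."""
    top = heapq.nlargest(k, scored, key=lambda item: item[1])
    return [text for text, _ in top]


candidates = [
    ("time: January 1, 2020", 0.52),
    ("insurance effective time: January 1, 2020", 0.91),
    ("insurance expiration time: December 31, 2020", 0.47),
]
best = select_answers(candidates)          # output condition: highest relevance only
top_two = select_answers(candidates, k=2)  # output condition: two results, high to low
```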
If the document analysis model includes multiple sub-models of different types, for example 2 sub-models, the first sub-model may be the ERNIE 1.0 model and the second sub-model may be the BERT-Base model. To converge quickly during training, the batch size of the BERT-Base model may be set to 32, the learning rate to 5.00E-05, and the number of epochs to 4. Each sub-model can be packaged as a Docker container for deployment on the electronic device and can provide a RESTful microservice externally. In this way, the electronic device loads each sub-model into memory and instantiates it once, after which each sub-model can be called conveniently.
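The training settings quoted for the two models can be collected in a small configuration object, as sketched below. The values come from the text, reading the quoted 5.00E-06 and 5.00E-05 as learning rates; the `TrainConfig` class and the `steps_per_epoch` helper are illustrative assumptions and not part of either model's own tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    model_name: str
    batch_size: int       # maximum pieces of packed data per batch
    learning_rate: float
    epochs: int


ERNIE_CONFIG = TrainConfig("ERNIE 1.0", batch_size=48, learning_rate=5e-6, epochs=20)
BERT_BASE_CONFIG = TrainConfig("BERT-Base", batch_size=32, learning_rate=5e-5, epochs=4)


def steps_per_epoch(num_examples: int, cfg: TrainConfig) -> int:
    """Number of batches needed to see every example once (ceiling division)."""
    return -(-num_examples // cfg.batch_size)
```

With 100 pieces of packed data, for instance, the ERNIE 1.0 setting needs 3 batches per epoch and the BERT-Base setting needs 4.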
In this embodiment, the electronic device may input the plurality of pieces of packed data into each submodel for processing, each submodel may determine a position of the preliminary answer in the document, and each submodel actively extracts the preliminary answer from the document according to the position, thereby outputting the preliminary answer. Thus, the electronic device can obtain a plurality of preliminary answers. Of course, the manner in which each submodel directly outputs the preliminary answer as a result is only an exemplary manner of this embodiment, and is not limited to this, for example, the electronic device inputs a plurality of pieces of packed data into each submodel for processing, each submodel may directly output the position of the preliminary answer in the document, and the electronic device extracts the preliminary answer from the document according to the position, so as to obtain the preliminary answer.
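Feeding the same packed data to every sub-model and collecting one preliminary answer per sub-model could be organized as in the sketch below. The sub-models here are toy functions introduced only so the example runs; in a deployment each would be a call to a separately hosted model, and `run_submodels` is a hypothetical name.

```python
from typing import Callable, Dict, List

# A sub-model is modeled as a plain function: packed data in, answer text out.
SubModel = Callable[[List[dict]], str]


def first_effective(packed: List[dict]) -> str:
    """Toy stand-in for sub-model 1: first part mentioning 'effective'."""
    return next((p["context"] for p in packed if "effective" in p["context"]), "")


def last_time(packed: List[dict]) -> str:
    """Toy stand-in for sub-model 2: last part mentioning 'time'."""
    hits = [p["context"] for p in packed if "time" in p["context"]]
    return hits[-1] if hits else ""


def run_submodels(packed: List[dict], submodels: Dict[str, SubModel]) -> Dict[str, str]:
    """Give the same packed data to each sub-model and keep every output."""
    return {name: model(packed) for name, model in submodels.items()}


question = "What is the effective time in the insurance contract of user A?"
packed = [
    {"question": question, "context": "Insurance effective time: January 1, 2020"},
    {"question": question, "context": "Insurance expiration time: December 31, 2020"},
]
preliminary = run_submodels(
    packed, {"submodel_1": first_effective, "submodel_2": last_time}
)
```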
Continuing with Hypothesis 1 above: the electronic device divides the document "insurance contract A1" and packages it with the question "what is the effective time in the insurance contract of user A" into three pieces of packed data, and then inputs the three pieces into sub-model 1, sub-model 2 and sub-model 3 respectively for processing. The result output by sub-model 1 may be: "insurance effective time: January 1, 2020"; the result output by sub-model 2 may be: "insurance expiration time: December 31, 2020"; and the result output by sub-model 3 may be: "time: January 1, 2020".
Continuing with Hypothesis 2 above: the electronic device divides the "AA file" and packages it with the question into three pieces of packed data, and then inputs the three pieces into sub-model 1, sub-model 2 and sub-model 3 respectively. The result output by sub-model 1 may be: "the applicant may amend its patent application documents, but amendments to the AA and BBB patent application documents must not go beyond the scope recorded in the original XX and XXXX, with respect to designs"; the result output by sub-model 2 may be: "amendments to the AA and BBB patent application documents must not go beyond the scope recorded in the original XX and XXXX, and amendments to the design patent application documents must not go beyond the scope represented by the original pictures or photographs"; and the result output by sub-model 3 may be: "the applicant may amend its patent application documents, amendments to the AA and BBB patent application documents must not go beyond the scope of the original XX, and amendments to the design patent application documents must not go beyond the scope represented by the original pictures or photographs".
In this embodiment, after the electronic device obtains the preliminary answer, the electronic device may execute step S300 to further process the preliminary answer.
Step S300: processing the preliminary answer by using a preset answer processing model to obtain a final answer to the question.
In this embodiment, the type of answer processing model, and the way it processes the preliminary answers, differ according to the application scenario or the number of preliminary answers. In general, regardless of which type of answer processing model is used, the processing of the preliminary answer may include fusion, screening, and adjustment, which are described separately below.
1. Regarding the fusion process.
When there are multiple preliminary answers but each of them may be incomplete, the multiple preliminary answers may be fused by using the answer processing model.
Specifically, the electronic device may input the plurality of preliminary answers into the answer processing model, and the answer processing model may fuse them by taking their union, so as to obtain the fused final answer.
Continuing with the description of hypothesis 2 above:
the electronic device compares the preliminary answer 1: "applicant can make modifications to its patent application documents, but the AA and BBB patent application documents must not be modified beyond the scope of the original XX and XXXX documents, in terms of appearance"; preliminary answer 2: "AA and BBB patent application documents should not be modified beyond the original XX and XXXX descriptions, and appearance design patent application documents should not be modified beyond the original pictures or photo representations"; and preliminary answer 3: the applicant can modify the patent application document, the AA and BBB patent application documents should not be modified beyond the original XX range, and the appearance design patent application document should not be modified beyond the range represented by the original picture or photo "input into the answer processing model to be fused, so that the electronic device can obtain the final answer: "the applicant can modify its patent application files, but the AA and BBB patent application files must not be modified beyond the original XX and XXXX documentation, nor must the appearance design patent application files be modified beyond the original pictorial or photographic representations".
It is understood that, besides taking the union, the answers may also be fused by taking the intersection of the preliminary answers, depending on the requirements of the practical application.
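The union and intersection modes of fusion can be illustrated with a minimal sketch. It assumes, purely for illustration, that each preliminary answer can be split into clauses on punctuation and that two clauses count as the same only when their normalized text is identical; the function names `split_clauses` and `fuse` are hypothetical, and the answer processing model of the embodiment may fuse answers in a different way.

```python
import re


def split_clauses(answer: str) -> list[str]:
    """Split an answer into lower-cased, whitespace-normalized clauses."""
    parts = re.split(r"[,;.]", answer.lower())
    return [" ".join(p.split()) for p in parts if p.strip()]


def fuse(answers: list[str], mode: str = "union") -> list[str]:
    """Fuse preliminary answers clause by clause, keeping first-seen order.

    Assumption: clause identity is exact string equality after normalization.
    """
    clause_lists = [split_clauses(a) for a in answers]
    if mode == "union":
        fused: list[str] = []
        for clauses in clause_lists:
            for clause in clauses:
                if clause not in fused:
                    fused.append(clause)
        return fused
    if mode == "intersection":
        if not clause_lists:
            return []
        common = set(clause_lists[0]).intersection(*clause_lists[1:])
        return [c for c in clause_lists[0] if c in common]
    raise ValueError(f"unknown mode: {mode}")


answers = [
    "the applicant may amend the application, "
    "amendments must stay within the original scope",
    "amendments must stay within the original scope, "
    "design amendments must stay within the original pictures",
]
print(fuse(answers, "union"))
print(fuse(answers, "intersection"))
```

In this toy input the union keeps all three distinct clauses, while the intersection keeps only the clause shared by both answers.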
2. Regarding the screening process.
When there are multiple preliminary answers and some of them are not very accurate, the answer processing model may be used to screen them.
Specifically, the electronic device may input the plurality of preliminary answers and the question into the answer processing model, and the answer processing model screens the plurality of preliminary answers based on the question, so as to obtain the final answer after screening.
Continuing with the description of hypothesis 1 above:
the electronic device inputs the question "what is the effective time in the insurance contract of user A", preliminary answer 1 "insurance effective time: January 1, 2020", preliminary answer 2 "insurance expiration time: December 31, 2020", and preliminary answer 3 "time: January 1, 2020" into the answer processing model. The answer processing model screens the three preliminary answers based on the question and selects "insurance effective time: January 1, 2020" as the final answer.
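The screening step can be sketched as follows. This is only an illustration under a simplifying assumption: relevance to the question is approximated by the number of words a candidate shares with the question, whereas the embodiment uses a trained answer processing model. The names `tokens` and `screen` are hypothetical.

```python
import re


def tokens(text: str) -> set[str]:
    """Lower-cased word set of a text."""
    return set(re.findall(r"\w+", text.lower()))


def screen(question: str, candidates: list[str]) -> str:
    """Return the candidate sharing the most words with the question.

    Assumption: word overlap stands in for the model's relevance score.
    Ties are resolved in favor of the earliest candidate.
    """
    question_tokens = tokens(question)
    return max(candidates, key=lambda c: len(question_tokens & tokens(c)))


question = "What is the effective time in user A's insurance contract?"
candidates = [
    "insurance effective time: January 1, 2020",
    "insurance expiration time: December 31, 2020",
    "time: January 1, 2020",
]
print(screen(question, candidates))
```

With these inputs the first candidate shares three words with the question ("insurance", "effective", "time"), the second shares two, and the third shares one, so the first candidate is selected.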
3. Regarding the adjustment process.
When there are one or more preliminary answers and part of their content is inaccurate or redundant, the answer processing model may be used to adjust the preliminary answers.
Specifically, the electronic device may input the preliminary answer and the question into the answer processing model, and the answer processing model adjusts part of the content of the preliminary answer based on the question, for example by correcting or deleting that part, so as to obtain the final answer.
It will be appreciated that the adjustment process may also be combined with the screening and fusion processes described above, i.e., adjustment followed by screening or fusion.
Continuing with the description of hypothesis 1 above:
the electronic device inputs the question "what is the effective time in the insurance contract of user A" and the preliminary answer "should belong to the first category, insurance effective time: January 1, 2020" into the answer processing model, and the answer processing model may delete "should belong to the first category" from the preliminary answer based on the question, so as to obtain the final answer "insurance effective time: January 1, 2020".
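The deletion of redundant content can be sketched as below. The sketch assumes, for illustration only, that a clause is redundant when it shares no content word with the question, and it writes the date in ISO form so that the answer can be split on commas; the stop-word list and the names `content_words` and `adjust` are hypothetical, and the embodiment relies on a trained answer processing model instead.

```python
import re

# Hypothetical stop-word list used only by this sketch.
STOPWORDS = {"a", "an", "the", "is", "in", "of", "to", "what", "s"}


def content_words(text: str) -> set[str]:
    """Lower-cased words of a text, excluding stop words."""
    return {t for t in re.findall(r"\w+", text.lower()) if t not in STOPWORDS}


def adjust(question: str, preliminary_answer: str) -> str:
    """Drop comma-separated clauses that share no content word with the question.

    Assumption: such clauses are redundant. If nothing would remain, the
    preliminary answer is returned unchanged.
    """
    question_words = content_words(question)
    clauses = [c.strip() for c in preliminary_answer.split(",") if c.strip()]
    kept = [c for c in clauses if question_words & content_words(c)]
    return ", ".join(kept) if kept else preliminary_answer


question = "What is the effective time in user A's insurance contract?"
preliminary = "should belong to the first category, insurance effective time: 2020-01-01"
print(adjust(question, preliminary))
```

Here the clause "should belong to the first category" has no content word in common with the question and is removed, leaving only the effective-time clause.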
In this embodiment, after the final answer is determined, the electronic device may push the final answer to the web end for the user to review.
If the user does not modify the final answer after reviewing it, the electronic device may train and optimize the document analysis model using the question, the document and the final answer. Specifically, the question, the document and the final answer are converted into packed data in the manner described above, with the final answer written into the corresponding answer column of the packed data, and the packed data is input into the document analysis model to train the model.
If the user modifies the final answer after reviewing it, the web end may feed the modified final answer back to the electronic device. On the one hand, the electronic device may train and optimize the document analysis model using the question, the document and the modified final answer; on the other hand, the electronic device may train and optimize the answer processing model using the question, the preliminary answer and the modified final answer.
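The two feedback cases above can be summarized in a short sketch that decides which model receives which training record. The record layout, the model labels `"document_analysis"` and `"answer_processing"`, and the function name `build_training_records` are assumptions made for illustration; the actual packed-data format is the one defined elsewhere in the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrainingRecord:
    target_model: str  # hypothetical label of the model to be trained
    inputs: dict       # model inputs
    answer: str        # supervision target (the "answer column")


def build_training_records(
    question: str,
    document: str,
    preliminary_answers: list[str],
    final_answer: str,
    modified_answer: Optional[str] = None,
) -> list[TrainingRecord]:
    """Build training records from user feedback on the final answer."""
    # Use the user's correction when there is one, else the accepted answer.
    label = modified_answer if modified_answer is not None else final_answer
    records = [
        TrainingRecord(
            "document_analysis",
            {"question": question, "document": document},
            label,
        )
    ]
    # Only a user correction yields a record for the answer processing model.
    if modified_answer is not None:
        records.append(
            TrainingRecord(
                "answer_processing",
                {"question": question,
                 "preliminary_answers": list(preliminary_answers)},
                modified_answer,
            )
        )
    return records


accepted = build_training_records("q", "doc", ["p1", "p2"], "final")
corrected = build_training_records("q", "doc", ["p1", "p2"], "final", "fixed")
print(len(accepted), len(corrected))
```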
Of course, the processing of the preliminary answers is not limited to the exemplary manners described above, and other manners may be adopted depending on the practical application scenario. For example, the answer processing model may remove similar duplicates from the plurality of preliminary answers output by the sub-models, and then screen the preliminary answers remaining after de-duplication by taking an intersection, thereby obtaining and outputting the final answer. For example, for one question, the ERNIE and BERT models each output 3 results: E1, E2 and E3, and B1, B2 and B3, respectively. The answer processing model computes similarity in two ways, for example ROUGE-L and semantic similarity. De-duplication based on ROUGE-L yields the results E1, B2 and B3; de-duplication based on semantic similarity yields E1, E2 and B2. Finally, the screening results are merged to obtain the final results E1, E2, B2 and B3.
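The ROUGE-L based de-duplication and the final merge can be sketched as follows. The sketch computes ROUGE-L as an F1 score over the longest common subsequence of whitespace-separated tokens, treats two answers as duplicates when the score reaches an assumed threshold of 0.8, and merges result lists in first-seen order, as in the worked example. The threshold and the function names are assumptions, and the semantic-similarity variant is omitted because it would require an embedding model.

```python
from typing import Callable


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 over whitespace tokens (balanced precision and recall)."""
    c, r = candidate.lower().split(), reference.lower().split()
    if not c or not r:
        return 0.0
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)


def deduplicate(
    answers: list[str],
    similarity: Callable[[str, str], float],
    threshold: float = 0.8,  # assumed cut-off, not taken from the source
) -> list[str]:
    """Keep an answer only if it is not too similar to one already kept."""
    kept: list[str] = []
    for answer in answers:
        if all(similarity(answer, k) < threshold for k in kept):
            kept.append(answer)
    return kept


def merge(*result_lists: list[str]) -> list[str]:
    """Merge several screening results, keeping first-seen order."""
    merged: list[str] = []
    for results in result_lists:
        for result in results:
            if result not in merged:
                merged.append(result)
    return merged


print(deduplicate(["a b c d", "a b c d e", "x y z"], rouge_l_f1))
print(merge(["E1", "B2", "B3"], ["E1", "E2", "B2"]))
```

The last call mirrors the example in the text: merging the two screening results E1, B2, B3 and E1, E2, B2 leaves the four results E1, E2, B2 and B3.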
In this embodiment, the electronic device may further deploy a summary generation model, which may be a Transformer model. Upon obtaining the question and the document, the electronic device may also input the document into the summary generation model, so that the summary generation model generates a summary describing the document; the summary is then pushed to the web end together with the final answer.
If the user modifies the summary after reviewing it, the web end may feed the modified summary back to the electronic device. The electronic device may then train and optimize the summary generation model using the modified summary and the document.
Referring to fig. 2, based on the same inventive concept, the present embodiment provides an electronic device 10. The electronic device 10 may include a communication interface 11 connected to a network, one or more processors 12 for executing program instructions, a bus 13, and a memory 14 in various forms, such as a disk, a ROM, a RAM, or any combination thereof. Illustratively, the computer platform may also include program instructions stored in ROM, RAM, or other types of non-transitory storage media, or any combination thereof.
The memory 14 is used for storing programs, and the processor 12 is used for calling and running the programs in the memory 14 to execute the aforementioned document analysis method.
Referring to fig. 3, based on the same inventive concept, an embodiment of the present application provides a document analysis apparatus 100, which is applied to an electronic device and includes:
an obtaining module 110, configured to obtain a question and a document for answering the question; and
a processing module 120, configured to analyze the question and the document by using a preset document analysis model and determine a preliminary answer to the question from the document, and to process the preliminary answer by using a preset answer processing model to obtain a final answer to the question.
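The division into an obtaining module and a processing module can be illustrated with a small structural sketch. The class names, the in-memory dictionary standing in for the preset database, and the two stand-in model functions are all assumptions for illustration; in the embodiment both models are trained models.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ObtainingModule:
    """Obtains the question and looks the document up by its number."""
    database: dict[str, str]  # stand-in for the preset database

    def obtain(self, question: str, document_number: str) -> tuple[str, str]:
        return question, self.database[document_number]


@dataclass
class ProcessingModule:
    """Runs the document analysis model, then the answer processing model."""
    analysis_model: Callable[[str, str], list[str]]
    answer_model: Callable[[str, list[str]], str]

    def process(self, question: str, document: str) -> str:
        preliminary_answers = self.analysis_model(question, document)
        return self.answer_model(question, preliminary_answers)


def toy_analysis_model(question: str, document: str) -> list[str]:
    """Stand-in: every non-empty line of the document is a preliminary answer."""
    return [line.strip() for line in document.splitlines() if line.strip()]


def toy_answer_model(question: str, preliminary_answers: list[str]) -> str:
    """Stand-in: pick the preliminary answer sharing most words with the question."""
    q = set(question.lower().split())
    return max(preliminary_answers, key=lambda a: len(q & set(a.lower().split())))


obtaining = ObtainingModule({"D1": "effective time 2020-01-01\nexpiration time 2020-12-31"})
processing = ProcessingModule(toy_analysis_model, toy_answer_model)
question, document = obtaining.obtain("what is the effective time", "D1")
print(processing.process(question, document))
```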
It should be noted that, as those skilled in the art can clearly understand, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Some embodiments of the present application further provide a non-volatile computer-readable storage medium storing computer-executable program code. The storage medium may be a general-purpose storage medium such as a removable disk or a hard disk, and the program code stored thereon, when executed by a computer, performs the steps of the document analysis method of any of the above embodiments.
The program code product of the document analysis method provided in the embodiment of the present application includes a computer-readable storage medium storing the program code, and instructions included in the program code may be used to execute the method in the foregoing method embodiment, and specific implementation may refer to the method embodiment, which is not described herein again.
In conclusion, after the preliminary answer to the question is determined by the document analysis model, the preliminary answer is further processed by the answer processing model, so that a more accurate answer can be obtained. This improves the accuracy of machine QA (question answering) and, in turn, its applicability in practice.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
In addition, units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
Furthermore, the functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. A method of analyzing a document, the method comprising:
obtaining a question and a document for answering the question;
analyzing the question and the document by using a preset document analysis model, and determining a preliminary answer to the question from the document;
and processing the preliminary answer by using a preset answer processing model to obtain a final answer of the question.
2. The method for analyzing a document according to claim 1, wherein there are a plurality of preliminary answers, and processing the preliminary answers by using a preset answer processing model to obtain the final answer to the question comprises:
fusing the plurality of preliminary answers by using the answer processing model to obtain the final answer; or
and screening the plurality of preliminary answers based on the question by using the answer processing model so as to select the final answer from the plurality of preliminary answers.
3. The method for analyzing a document according to claim 1, wherein processing the preliminary answer by using a preset answer processing model to obtain the final answer to the question comprises:
and correcting part of contents in the preliminary answer based on the question by using the answer processing model to obtain the final answer.
4. The method for analyzing a document according to claim 1, wherein the method is applied to a server, and obtaining a question and a document for answering the question comprises:
the server acquiring the question and a number of the document sent by a web end;
and the server querying a preset database by using the number to obtain the document.
5. The method for analyzing a document according to claim 1, wherein the document analysis model includes a plurality of sub-models of different types, and analyzing the question and the document by using a preset document analysis model and determining a preliminary answer to the question from the document comprises:
and processing the question and the document by utilizing each sub-model to obtain a preliminary answer of the question output by each sub-model.
6. The method for analyzing a document according to claim 1, wherein analyzing the question and the document by using a preset document analysis model and determining a preliminary answer to the question from the document comprises:
segmenting the document into a plurality of portions;
packaging each part together with the question to obtain a piece of packaged data;
and analyzing the plurality of pieces of packed data by using the document analysis model to obtain the preliminary answer.
7. The method of analyzing a document according to any one of claims 1-6, wherein after obtaining the final answer, the method further comprises:
and training and optimizing the document analysis model by using the question, the document and the final answer.
8. An apparatus for analyzing a document, the apparatus comprising:
an acquisition module for acquiring a question and a document for answering the question;
a processing module for analyzing the question and the document by using a preset document analysis model and determining a preliminary answer to the question from the document, and for processing the preliminary answer by using a preset answer processing model to obtain a final answer to the question.
9. An electronic device, comprising:
a bus;
a memory for storing a program;
a processor connected to the memory through the bus for calling the program to perform the document analysis method of any one of claims 1-7.
10. A non-transitory computer-readable storage medium storing program code for performing a method of analyzing a document according to any one of claims 1 to 7 when the program code is executed by a computer.
CN202010639431.8A 2020-07-03 2020-07-03 Document analysis method and device, electronic equipment and storage medium Pending CN111782790A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010639431.8A CN111782790A (en) 2020-07-03 2020-07-03 Document analysis method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010639431.8A CN111782790A (en) 2020-07-03 2020-07-03 Document analysis method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111782790A true CN111782790A (en) 2020-10-16

Family

ID=72758902

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010639431.8A Pending CN111782790A (en) 2020-07-03 2020-07-03 Document analysis method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111782790A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110125734A1 (en) * 2009-11-23 2011-05-26 International Business Machines Corporation Questions and answers generation
WO2019000240A1 (en) * 2017-06-27 2019-01-03 华为技术有限公司 Question answering system and question answering method
CN109766423A (en) * 2018-12-29 2019-05-17 上海智臻智能网络科技股份有限公司 Answering method and device neural network based, storage medium, terminal
CN110765254A (en) * 2019-10-21 2020-02-07 北京理工大学 Multi-document question-answering system model integrating multi-view answer reordering
CN110955761A (en) * 2019-10-12 2020-04-03 深圳壹账通智能科技有限公司 Method and device for acquiring question and answer data in document, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110020424B (en) Contract information extraction method and device and text information extraction method
CA3174601C (en) Text intent identifying method, device, computer equipment and storage medium
CA3033859C (en) Method and system for automatically extracting relevant tax terms from forms and instructions
US9965460B1 (en) Keyword extraction for relationship maps
WO2018013702A1 (en) System and method for automatically understanding lines of compliance forms through natural language patterns
KR101933953B1 (en) Software domain topics extraction system using PageRank and topic modeling
CN110765235A (en) Training data generation method and device, terminal and readable medium
CN110210038B (en) Core entity determining method, system, server and computer readable medium thereof
CN109542956A (en) Report form generation method, device, computer equipment and storage medium
CN109815112B (en) Data debugging method and device based on functional test and terminal equipment
US20230119590A1 (en) Automatic identification of document sections to generate a searchable data structure
CN112883030A (en) Data collection method and device, computer equipment and storage medium
CN110968664A (en) Document retrieval method, device, equipment and medium
CN112395425A (en) Data processing method and device, computer equipment and readable storage medium
CN103744970B (en) A kind of method and device of the descriptor determining picture
US8862609B2 (en) Expanding high level queries
KR102260396B1 (en) System for hybride translation using general neural machine translation techniques
CN109542890B (en) Data modification method, device, computer equipment and storage medium
CN116775639A (en) Data processing method, storage medium and electronic device
CN111782790A (en) Document analysis method and device, electronic equipment and storage medium
WO2020057023A1 (en) Natural-language semantic parsing method, apparatus, computer device, and storage medium
CN110858214B (en) Recommendation model training and further auditing program recommendation method, device and equipment
CN111552785A (en) Method and device for updating database of human-computer interaction system, computer equipment and medium
CN111932412A (en) Contract drafting and revising method, device, storage medium and equipment
CN115860007B (en) Method and device for calculating index influence degree, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination