CN111581363B

CN111581363B - Knowledge extraction method, device, equipment and storage medium

Info

Publication number: CN111581363B
Application number: CN202010365979.8A
Authority: CN
Inventors: 章文俊; 甘露; 卜建辉; 吴伟佳
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2020-04-30
Filing date: 2020-04-30
Publication date: 2023-08-29
Anticipated expiration: 2040-04-30
Also published as: CN111581363A

Abstract

The application discloses a knowledge extraction method, a knowledge extraction device, knowledge extraction equipment and a storage medium, and relates to big data technology. The specific implementation scheme is as follows: acquiring the name of a field and the setting information of the field according to the information input by a user in a first page, wherein the setting information is used for extracting knowledge aiming at the field; creating a knowledge extraction task according to information input by a user in a second page; the knowledge extraction task comprises a field and a document to be processed, and is used for extracting knowledge of the field from the document to be processed according to the setting information; executing a knowledge extraction task to obtain knowledge extraction answers; and outputting knowledge extraction answers. The knowledge extraction method provided by the application reduces the labor cost of knowledge extraction and improves the efficiency of knowledge extraction.

Description

Knowledge extraction method, device, equipment and storage medium

Technical Field

The application relates to the technical field of data processing, in particular to a big data technology.

Background

Knowledge graph construction is based on knowledge extraction or knowledge mining. Currently, knowledge extraction is mainly implemented through machine learning with neural networks.

A neural network model must be trained in advance on sample data, and its performance depends closely on the quantity and accuracy of that data. In the early stage of training, labeled data is scarce, so sample data is usually obtained by manual labeling, which increases the labor cost of knowledge extraction and lowers its efficiency.

Disclosure of Invention

The application provides a knowledge extraction method, device, equipment and storage medium that reduce the labor cost of knowledge extraction and improve its efficiency.

According to a first aspect, there is provided a knowledge extraction method, comprising:

acquiring the name of a field and setting information of the field according to information input by a user in a first page, wherein the setting information is used for extracting knowledge aiming at the field;

creating a knowledge extraction task according to the information input by the user in the second page; the knowledge extraction task comprises the field and a document to be processed, and is used for extracting knowledge of the field from the document to be processed according to the setting information;

executing the knowledge extraction task to obtain knowledge extraction answers;

and outputting the knowledge extraction answer.

It can be seen that, unlike the prior art, in the embodiment of the present application, the name of the field and the setting information of the field may be obtained through the information input by the user in the first page, and the knowledge extraction task may be created through the information input by the user in the second page, where the knowledge extraction task is used to perform knowledge extraction for the field according to the setting information of the field, so that the knowledge extraction task is performed, and the knowledge extraction answer is obtained and output. Compared with the prior art, the knowledge extraction method provided by the application does not use the neural network model to realize knowledge extraction, but realizes knowledge extraction on the document through the setting information of the field, thereby avoiding the process of acquiring the training sample and the pre-training model in advance when using the neural network model, reducing the labor cost, shortening the preparation time and improving the knowledge extraction efficiency.

According to a second aspect, there is provided a knowledge extraction apparatus comprising:

the acquisition module is used for acquiring the name of a field and the setting information of the field according to the information input by a user in a first page, wherein the setting information is used for extracting knowledge aiming at the field;

the creating module is used for creating a knowledge extraction task according to the information input by the user in the second page; the knowledge extraction task comprises the field and a document to be processed, and is used for extracting knowledge of the field from the document to be processed according to the setting information;

the processing module is used for executing the knowledge extraction task and obtaining knowledge extraction answers;

and the output module is used for outputting the knowledge extraction answer.

According to a third aspect, there is provided an electronic device comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect described above.

According to a fourth aspect, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of the first aspect described above.

According to a fifth aspect, there is provided a computer program product comprising: a computer program stored in a readable storage medium, from which it can be read by at least one processor of an electronic device, the at least one processor executing the computer program causing the electronic device to perform the method of the first aspect.

One embodiment of the above application has the following advantages or benefits: the name of the field and the setting information of the field can be obtained through the information input by the user in the first page, the knowledge extraction task can be created through the information input by the user in the second page, and the knowledge extraction task is used for carrying out knowledge extraction on the field according to the setting information of the field, so that the knowledge extraction task is executed, a knowledge extraction answer is obtained and output, the labor cost is reduced, and the knowledge extraction efficiency is improved.

It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.

Drawings

The drawings are included to provide a better understanding of the present application and are not to be construed as limiting the application. Wherein:

FIG. 1 is an application scenario diagram applicable to an embodiment of the present application;

FIG. 2 is a flowchart of a knowledge extraction method according to an embodiment of the present application;

FIG. 3 is another flowchart of a knowledge extraction method according to an embodiment of the present application;

FIG. 4 is a schematic diagram of a first page according to an embodiment of the present application;

FIG. 5 is a schematic diagram of a knowledge extraction device according to an embodiment of the present application;

FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

Exemplary embodiments of the present application will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present application are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

The application provides a knowledge extraction method. Knowledge extraction refers to extracting knowledge contained in an information source through processes of identification, understanding, screening, induction and the like, and storing the knowledge to form a knowledge element base. The knowledge extraction method provided by the application can realize knowledge extraction aiming at the document. The application is not limited to the type of document, for example, the type of document may include, but is not limited to, a text (TXT) type or a PDF type. The type of document may also be referred to as the format of the document. The present application is not limited to the content of the document, and may relate to the field of insurance, law, scientific literature, and the like, for example. The technical scheme provided by the application is illustrated by taking the application to the insurance field as an example.

Fig. 1 is an application scenario diagram applicable to an embodiment of the present application. The scenario involves a server 100 and an input-output device 200. Data may be transmitted between the server 100 and the input-output device 200 in a wireless or wired manner, and the present application does not limit the data transmission mode. The input-output device 200 may display a page 201, and a user may input information in the page 201, upload a document, or create a knowledge extraction task. The present application does not limit the name, the layout, or the display content of the page 201. Optionally, the page 201 may be a browser page. By performing data transmission with the input-output device 200, the server 100 can acquire the information input by the user in the page 201, create and execute a knowledge extraction task, and acquire and output a knowledge extraction answer. Optionally, knowledge extraction answers may be displayed in the page 201.

The implementation of the server 100 is not limited; it may be, for example, a stand-alone server, a cluster server, and the like.

The implementation manner of the input/output device 200 is not limited, and may be, for example, a conventional display screen, a touch display screen, a smart phone, a tablet computer, etc.

The following description is made with reference to the accompanying drawings. The following embodiments and features of the embodiments may be combined with each other without conflict.

Fig. 2 is a flowchart of a knowledge extraction method according to an embodiment of the present application. In the knowledge extraction method provided in this embodiment, the execution body may be a knowledge extraction device or an electronic apparatus. As shown in fig. 2, the knowledge extraction method provided in this embodiment may include:

S201, acquiring the name of a field and setting information of the field according to information input by a user in a first page.

Wherein the setting information is used for extracting knowledge of the field.

Specifically, the user may input information in the first page, and the electronic device may acquire information input by the user in the first page, and acquire the name of the field and the setting information of the field according to the input information. Subsequently, for the document which needs to be subjected to knowledge extraction by the user, knowledge extraction of the document can be realized according to the field and the setting information thereof.

Note that this embodiment does not limit the name of the first page, the layout of the first page, the page display content, or the number of fields. For example, in one implementation, the user may enter the name and the setting information of one field at a time in the first page. For another example, in another implementation, the user may enter the names of multiple fields and the setting information of each field at a time in the first page. In various embodiments of the application, a plurality refers to two or more.

S202, creating a knowledge extraction task according to information input by a user in the second page.

The knowledge extraction task comprises a field and a document to be processed, and is used for extracting knowledge of the field from the document to be processed according to the setting information.

Specifically, the user may input information in the second page, and the electronic device may acquire information input by the user in the second page, and create a knowledge extraction task according to the input information.

Alternatively, the information entered by the user in the second page may include the related information of the document to be processed and the name of the field. Optionally, the relevant information of the document to be processed may include at least one of the following: the name of the document to be processed or the storage address of the document to be processed. In various embodiments of the application, at least one refers to one, two, or more than two. Alternatively, the document to be processed may be a document uploaded by the user, a document stored in the electronic device in advance, or a document obtained by the electronic device in other manners. The page and time of uploading the document by the user are not limited in this embodiment. For example, the user may upload the document to the electronic device before entering the information in the first page, or the user may upload the document to the electronic device after entering the information in the first page. For example, the page to which the user uploads the document may be a separate page, or the user may upload the document in a second page.

It should be noted that, in this embodiment, the name of the second page, the layout of the second page, and the interface content are not limited, the number of documents to be processed is not limited, and the number of fields is not limited. In this embodiment, knowledge extraction may be done for each document to be processed and for each field. The following is illustrated by way of example.

Alternatively, in one example, the knowledge extraction task includes 1 field and 1 document to be processed. The document to be processed is referred to as document 1 to be processed, and the field is referred to as field 1. The knowledge extraction task specifically comprises the following: knowledge extraction for field 1 is performed on the document 1 to be processed according to the setting information of field 1.

Alternatively, in another example, the knowledge extraction task includes 2 fields and 1 document to be processed. The document to be processed is referred to as document 1 to be processed, and the 2 fields are referred to as field 1 and field 2. The knowledge extraction task specifically comprises the following: knowledge extraction for field 1 is performed on the document 1 to be processed according to the setting information of field 1, and knowledge extraction for field 2 is performed on the document 1 to be processed according to the setting information of field 2.

Alternatively, in another example, the knowledge extraction task includes 2 fields and 2 documents to be processed. The 2 documents to be processed are referred to as document 1 to be processed and document 2 to be processed, and the 2 fields are referred to as field 1 and field 2. The knowledge extraction task specifically comprises the following: knowledge extraction for field 1 is performed on the document 1 to be processed according to the setting information of field 1, knowledge extraction for field 2 is performed on the document 1 to be processed according to the setting information of field 2, knowledge extraction for field 1 is performed on the document 2 to be processed according to the setting information of field 1, and knowledge extraction for field 2 is performed on the document 2 to be processed according to the setting information of field 2.
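The three examples above amount to pairing every document to be processed with every field. The following Python sketch illustrates that expansion under this reading; the names ExtractionJob and expand_task are hypothetical and do not appear in the application.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class ExtractionJob:
    """One (document, field) pair inside a knowledge extraction task (hypothetical type)."""
    document: str
    field: str


def expand_task(documents, fields):
    """Pair every document to be processed with every field."""
    return [ExtractionJob(d, f) for d, f in product(documents, fields)]


jobs = expand_task(["document 1", "document 2"], ["field 1", "field 2"])
for job in jobs:
    print(f"extract {job.field} from {job.document}")
```

With 2 documents and 2 fields the sketch yields four jobs, matching the third example.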

S203, executing a knowledge extraction task to obtain a knowledge extraction answer.

S204, outputting knowledge extraction answers.

Alternatively, the knowledge extraction answer may be used for subsequent knowledge graph construction, or in the scenario shown in fig. 1, displayed in the page 201 displayed by the input-output device 200.

It can be seen that, in the knowledge extraction method provided in this embodiment, the name of the field and the setting information of the field may be obtained through the information input by the user in the first page, and the knowledge extraction task may be created through the information input by the user in the second page, where the knowledge extraction task is used to perform knowledge extraction for the field according to the setting information of the field, so that the knowledge extraction task is executed, and the knowledge extraction answer is obtained and output. Compared with the prior art, the knowledge extraction method provided by the embodiment does not use the neural network model to realize knowledge extraction, but realizes knowledge extraction on the document through the setting information of the field, so that the process of acquiring the training sample and the pre-training model in advance when the neural network model is used is avoided, the labor cost is reduced, the preparation time is shortened, and the knowledge extraction efficiency is improved.

In addition, according to the knowledge extraction method provided by the embodiment, the obtained knowledge extraction answers can be used as sample data for training the neural network model, so that the labor cost is reduced for training the neural network model, the sample data size is enlarged, and the training efficiency and the training accuracy of the neural network model can be improved.

Alternatively, in another embodiment of the present application, the setting information may include a knowledge extraction range, a knowledge extraction rule expression, a knowledge extraction return granularity, and the number of knowledge extraction answers, based on the embodiment shown in fig. 2.

The knowledge extraction range is used for determining the range within the document to be processed in which knowledge extraction for the field is performed. Optionally, the knowledge extraction range may include a full text range, a paragraph range, and a title range. The full text range indicates that knowledge extraction for the field is performed across the entire document to be processed. The paragraph range indicates that knowledge extraction for the field is performed, at paragraph granularity, on specific paragraphs in the document to be processed. The method of determining a specific paragraph is not limited in this embodiment; for example, the specific paragraph may be a paragraph that includes the name of the field. The title range indicates that knowledge extraction for the field is performed, at title granularity, on the paragraphs under a specific title in the document to be processed. The method of determining a specific title is not limited in this embodiment; for example, the specific title may be a title that includes the name of the field.

The knowledge extraction rule expression is used for matching the content in the document to be processed, so that knowledge extraction is realized. The implementation manner of the knowledge extraction rule expression is not limited in this embodiment, and the structure of the expression and the meaning of each symbol in the expression may be predefined according to the need. Knowledge extraction rule expressions may also utilize existing expressions, such as regular expressions. In this embodiment, the number of knowledge extraction rule expressions corresponding to the fields is not limited, and may be 1 or more.

The knowledge extraction return granularity is used for indicating the return accuracy of the knowledge extraction answer. Optionally, the knowledge extraction return granularity may include fields, sentences, and paragraphs. For example, assume that the name of a field is "waiting period" in the insurance field. When the knowledge extraction return granularity is a field, the knowledge extraction answer may be a value of the "waiting period", for example 120 days, 3 weeks, 1 year, etc. When the knowledge extraction return granularity is a sentence, the knowledge extraction answer may be a sentence in the document to be processed that includes "waiting period", for example: "there is a waiting period of 120 days", "a waiting period of 3 weeks", "a time of 1 year is called a waiting period", and so on. When the knowledge extraction return granularity is a paragraph, the knowledge extraction answer may be a paragraph in the document to be processed that includes "waiting period".

The number of knowledge extraction answers is the number of knowledge extraction answers required after the user sets the knowledge extraction task for the document to be processed. Alternatively, the number of knowledge extraction answers may be set to "single answer", at which time the number of knowledge extraction answers is at most one. Alternatively, the number of knowledge extraction answers may be set to be "multiple answers", and in this case, the number of knowledge extraction answers is not limited, and may be 1 or more, depending on the actual result of knowledge extraction on the field of the document to be processed.
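As a sketch only, the four kinds of setting information described above could be held in one record per field. The type and attribute names below (Scope, Granularity, FieldSettings) are hypothetical, and the use of a regular expression as the rule expression is just one of the options the embodiment allows.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Scope(Enum):
    """Knowledge extraction range."""
    FULL_TEXT = "full_text"
    PARAGRAPH = "paragraph"
    TITLE = "title"


class Granularity(Enum):
    """Knowledge extraction return granularity."""
    FIELD = "field"
    SENTENCE = "sentence"
    PARAGRAPH = "paragraph"


@dataclass
class FieldSettings:
    """Setting information of one field (hypothetical layout)."""
    name: str
    scope: Scope
    rule_expressions: List[str]  # assumed to be ordered, highest priority first
    granularity: Granularity
    single_answer: bool = True   # True: "single answer"; False: "multiple answers"


waiting_period = FieldSettings(
    name="waiting period",
    scope=Scope.PARAGRAPH,
    rule_expressions=[r"waiting period of (\d+ (?:days|weeks|years?))"],
    granularity=Granularity.FIELD,
    single_answer=False,
)
print(waiting_period)
```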

Optionally, fig. 3 is another flowchart of a knowledge extraction method according to an embodiment of the present application. As shown in fig. 3, in S203, performing a knowledge extraction task to obtain a knowledge extraction answer may include:

S301, determining information to be extracted in the document to be processed according to the knowledge extraction range.

S302, knowledge extraction is carried out on the information to be extracted according to the number of the knowledge extraction answers and the knowledge extraction rule expression, and knowledge extraction result information is obtained.

S303, obtaining knowledge extraction answers according to knowledge extraction return granularity and knowledge extraction result information.

Specifically, first, information to be extracted is determined in a document to be processed according to a knowledge extraction range. Optionally, if the knowledge extraction range is a full text range, the information to be extracted is a document to be processed. If the knowledge extraction range is a paragraph range, the information to be extracted includes a specific paragraph in the document to be processed. If the knowledge extraction range is the title range, the information to be extracted comprises paragraphs under specific titles in the document to be processed. And then, carrying out knowledge extraction on the information to be extracted according to the number of the knowledge extraction answers and the knowledge extraction rule expression to obtain knowledge extraction result information. And finally, obtaining knowledge extraction answers according to the knowledge extraction return granularity and knowledge extraction result information.

In this embodiment, for convenience of description, the result obtained by extracting knowledge from the information to be extracted according to the number of knowledge extraction answers and the knowledge extraction rule expression is referred to as knowledge extraction result information, and the result obtained by processing the knowledge extraction result information according to the knowledge extraction return granularity is referred to as a knowledge extraction answer. The knowledge extraction result information may be the same as or different from the knowledge extraction answer. For example, the knowledge extraction result information may be a paragraph in the document to be processed. If the knowledge extraction return granularity is a paragraph, the knowledge extraction result information is the same as the knowledge extraction answer, and both are that paragraph. If the knowledge extraction return granularity is a sentence, the knowledge extraction result information differs from the knowledge extraction answer, and the knowledge extraction answer may be a specific sentence in the knowledge extraction result information.
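The relationship between knowledge extraction result information and the knowledge extraction answer can be illustrated with a short Python sketch. It assumes that the rule expression is a regular expression whose first capture group is the field value, and that sentences end with ".", "!" or "?"; the function name answer_at_granularity is hypothetical.

```python
import re


def answer_at_granularity(paragraph, pattern, granularity):
    """Return the field value, the enclosing sentence, or the whole paragraph.

    `paragraph` plays the role of the knowledge extraction result information.
    Returns None when the pattern does not match the paragraph at all.
    """
    match = re.search(pattern, paragraph)
    if match is None:
        return None
    if granularity == "field":
        return match.group(1)
    if granularity == "sentence":
        # Naive sentence split: break after ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
            if re.search(pattern, sentence):
                return sentence
        return paragraph  # the match spans a sentence boundary; fall back
    if granularity == "paragraph":
        return paragraph
    raise ValueError(f"unknown granularity: {granularity}")


paragraph = (
    "This policy covers critical illness. "
    "There is a waiting period of 120 days. "
    "Claims made earlier are not paid."
)
pattern = r"waiting period of (\d+ days)"
for g in ("field", "sentence", "paragraph"):
    print(g, "->", answer_at_granularity(paragraph, pattern, g))
```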

Optionally, in S301, determining information to be extracted in the document to be processed according to the knowledge extraction range may include:

if the format of the document to be processed is not the preset document format, converting the document to be processed into an intermediate document according to the preset document format.

And analyzing the intermediate document to obtain document information. The document information includes each paragraph, each title, title hierarchy, and correspondence between titles and paragraphs in the document to be processed.

And determining information to be extracted according to the knowledge extraction range and the document information.

Specifically, the format of the document to be processed is not limited in this embodiment. However, a document format can be preset, and the development of the software code for knowledge extraction can be realized aiming at the document format, so that the code development cost is saved, and the universality of the software code is improved. The preset document format is not limited in this embodiment, and may be, for example, a text format. If the format of the document to be processed is not the preset document format, the document to be processed may be converted into an intermediate document according to the preset document format, for example, a PDF document may be converted into a text document. Then, the intermediate document is analyzed, document information is obtained, and information to be extracted is determined according to the knowledge extraction range and the document information. If the format of the document to be processed is the preset document format, the document to be processed is an intermediate document, and the subsequent steps are executed.

The document information comprises each paragraph, each title, title level and corresponding relation between the title and the paragraph in the document to be processed. Alternatively, each paragraph and each title may be uniquely distinguished by identification information, such as paragraph number, document page number in combination with document line number, etc. The implementation manner of obtaining the document information by analyzing the intermediate document is not limited, and any existing document analysis method can be adopted.
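Assuming the intermediate document has already been parsed into titles and their paragraphs, choosing the information to be extracted from the knowledge extraction range could look like the sketch below. The Section type and the select_information function are hypothetical, and matching a "specific" paragraph or title by a case-insensitive occurrence of the field name is only the example rule mentioned earlier, not a required one.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Section:
    """A title, its level, and the paragraphs under it (hypothetical document information)."""
    title: str
    level: int
    paragraphs: List[str]


def select_information(sections, scope, field_name):
    """Return the paragraphs to be searched for the given knowledge extraction range."""
    name = field_name.lower()
    if scope == "full_text":
        return [p for s in sections for p in s.paragraphs]
    if scope == "paragraph":
        return [p for s in sections for p in s.paragraphs if name in p.lower()]
    if scope == "title":
        return [p for s in sections if name in s.title.lower() for p in s.paragraphs]
    raise ValueError(f"unknown scope: {scope}")


sections = [
    Section("1 Coverage", 1, ["The policy covers critical illness."]),
    Section("2 Waiting period", 1, ["The period is 120 days.", "It starts on the effective date."]),
    Section("3 Claims", 1, ["Claims within the waiting period are not paid."]),
]
for scope in ("full_text", "paragraph", "title"):
    print(scope, "->", select_information(sections, scope, "waiting period"))
```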

Optionally, in the knowledge extraction method provided in this embodiment, the setting information may further include an answer filtering rule.

Before outputting the knowledge extraction answer in S204, it may further include:

and filtering the knowledge extraction answers according to the answer filtering rules to obtain corrected knowledge extraction answers.

Accordingly, outputting the knowledge extraction answer in S204 may include:

and outputting the corrected knowledge extraction answer.

By setting the answer filtering rules, the answers which do not meet the requirements in the knowledge extraction answers can be filtered, and the accuracy of knowledge extraction is improved.

For example, assume that the name of a field is "waiting period" in the insurance field. The answer filtering rule is: exclude "century", because "century" may be the name of an insurance product. Assuming that the knowledge extraction return granularity is a field, the knowledge extraction answers are 4 results in total: 120 days, 3 weeks, 1 year, and century. "century" is then filtered out according to the answer filtering rule, and the finally obtained knowledge extraction answers are 3 results in total: 120 days, 3 weeks and 1 year.
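The filtering example above can be sketched in Python as follows. Representing the answer filtering rule as a list of excluded terms is an assumption made for illustration, and filter_answers is a hypothetical name.

```python
def filter_answers(answers, excluded_terms):
    """Drop every answer that contains any of the excluded terms."""
    return [a for a in answers if not any(term in a for term in excluded_terms)]


answers = ["120 days", "3 weeks", "1 year", "century"]
corrected = filter_answers(answers, ["century"])
print(corrected)  # ['120 days', '3 weeks', '1 year']
```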

Optionally, in the knowledge extraction method provided in this embodiment, the setting information may further include mapping information, where the mapping information is used to indicate that the format of a knowledge extraction answer is a preset answer format.

Before outputting the knowledge extraction answer in S204, the method may further include:

if the format of the knowledge extraction answer is not the preset answer format, converting the knowledge extraction answer according to the preset answer format to obtain a converted knowledge extraction answer.

Accordingly, outputting the knowledge extraction answer in S204 may include:

and outputting the converted knowledge extraction answer.

By setting the mapping information, knowledge extraction answers can be uniformly converted into a specified format, and subsequent data processing is facilitated.

For example, assume that the name of a field is "waiting period" in the insurance field.

Alternatively, in one example, the mapping information is: the "waiting period" is expressed in days. Assuming that the knowledge extraction return granularity is a field, the knowledge extraction answers are 120 days, 3 weeks, and 1 year. Then, according to the mapping information, the finally obtained knowledge extraction answers are 120 days, 21 days, and 365 days.

Alternatively, in another example, the mapping information is: the "waiting period" is XX. Assuming that the knowledge extraction return granularity is a sentence, the knowledge extraction answers are: "there is a waiting period of 120 days", "a waiting period of 3 weeks", and "a time of 1 year is called a waiting period". Then, according to the mapping information, the finally obtained knowledge extraction answers are: "the waiting period is 120 days", "the waiting period is 3 weeks", and "the waiting period is 1 year".
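The first mapping example, which unifies the unit to days, can be sketched in Python as follows. The conversion table (7 days per week, 365 days per year) follows the figures in that example; the names DAYS_PER_UNIT and to_days are hypothetical.

```python
import re

# Conversion factors taken from the example: 3 weeks -> 21 days, 1 year -> 365 days.
DAYS_PER_UNIT = {"day": 1, "week": 7, "year": 365}


def to_days(answer):
    """Convert an answer such as '3 weeks' into the preset format '<n> days'."""
    match = re.fullmatch(r"\s*(\d+)\s*(day|week|year)s?\s*", answer)
    if match is None:
        raise ValueError(f"cannot map answer: {answer!r}")
    count, unit = int(match.group(1)), match.group(2)
    return f"{count * DAYS_PER_UNIT[unit]} days"


converted = [to_days(a) for a in ["120 days", "3 weeks", "1 year"]]
print(converted)  # ['120 days', '21 days', '365 days']
```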

Optionally, in another embodiment of the present application based on the embodiment shown in fig. 3, an implementation manner of performing knowledge extraction on information to be extracted according to the number of knowledge extraction answers and the knowledge extraction rule expression in S302 to obtain knowledge extraction result information is described.

Optionally, in one implementation, the number of knowledge extraction answers is one. In S302, knowledge extraction is performed on information to be extracted according to the number of knowledge extraction answers and the knowledge extraction rule expression, so as to obtain knowledge extraction result information, which may include:

if there is one knowledge extraction rule expression, performing knowledge extraction on the information to be extracted according to the knowledge extraction rule expression, and stopping knowledge extraction once knowledge extraction result information is obtained.

If there are at least two knowledge extraction rule expressions, performing knowledge extraction on the information to be extracted according to the at least two knowledge extraction rule expressions in sequence, from high priority to low priority, and stopping knowledge extraction once knowledge extraction result information is obtained.

The following is described in connection with examples.

Alternatively, in one example, assume that there is 1 knowledge extraction rule expression, referred to as expression 1. Knowledge extraction is performed on the information to be extracted according to expression 1, and stops as soon as one piece of knowledge extraction result information is obtained.

Alternatively, in another example, assume that there are 3 knowledge extraction rule expressions, referred to as expression 1 to expression 3, where expression 1 has the highest priority and expression 3 has the lowest. Knowledge extraction is first performed on the information to be extracted according to expression 1. In the first scenario, if expression 1 yields a piece of knowledge extraction result information, knowledge extraction stops, and expression 2 and expression 3 are not applied. In the second scenario, if expression 1 yields no knowledge extraction result information, knowledge extraction continues according to expression 2. In the third scenario, if expression 2 then yields knowledge extraction result information, knowledge extraction stops, and expression 3 is not applied.
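The priority-ordered behaviour described in these scenarios can be sketched as follows. The sketch assumes that a knowledge extraction rule expression is a regular expression and that list order encodes priority; the function name `extract_single` and the sample text are hypothetical.

```python
import re
from typing import Optional

def extract_single(text: str, expressions: list[str]) -> Optional[str]:
    """Try expressions from highest to lowest priority; stop at the first result."""
    for pattern in expressions:  # list order is assumed to encode priority
        match = re.search(pattern, text)
        if match:
            return match.group(0)  # lower-priority expressions are never evaluated
    return None

text = "The waiting period of this policy is 120 days from the effective date."
expressions = [
    r"\d+\s*weeks?",  # expression 1 (highest priority): no match in this text
    r"\d+\s*days?",   # expression 2: matches, so extraction stops here
    r"\d+\s*years?",  # expression 3 (lowest priority): not evaluated
]
result = extract_single(text, expressions)
```

With this input, expression 1 yields nothing, expression 2 yields a result, and expression 3 is skipped, which corresponds to the second and third scenarios.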

Alternatively, in another implementation, the number of knowledge extraction answers is one. In S302, knowledge extraction is performed on information to be extracted according to the number of knowledge extraction answers and the knowledge extraction rule expression, so as to obtain knowledge extraction result information, which may include:

Carrying out knowledge extraction on the information to be extracted according to the knowledge extraction rule expression to obtain at least one knowledge extraction result intermediate information;

and obtaining knowledge extraction result information according to at least one knowledge extraction result intermediate information.

The following is described in connection with examples.

Assume that the field is named "waiting period" in the insurance domain and that there are 2 knowledge extraction rule expressions, referred to as expression 1 and expression 2. Knowledge extraction on the information to be extracted according to expression 1 yields 1 piece of intermediate result information: 120 days. Knowledge extraction according to expression 2 yields 1 piece of intermediate result information: 3 weeks. The minimum, maximum, average, or weighted average of "120 days" and "3 weeks" may then be used as the knowledge extraction result information.

It should be noted that this embodiment does not limit how the knowledge extraction result information is obtained from the at least one piece of intermediate result information; the implementation may differ from field to field.
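One possible way to combine the intermediate information in the example above is sketched below. The unit conversion table `DAYS_PER_UNIT` (including 365 days per year), the function names, and the choice to express the result in days are assumptions for illustration only; the weighted average is omitted because the application gives no weights.

```python
import re
from statistics import mean

# Assumed conversion factors; the source does not say how units are normalized.
DAYS_PER_UNIT = {"day": 1, "week": 7, "year": 365}

def to_days(value: str) -> int:
    """Convert a string such as '3 weeks' to a number of days."""
    match = re.fullmatch(r"(\d+)\s*(day|week|year)s?", value.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {value!r}")
    return int(match.group(1)) * DAYS_PER_UNIT[match.group(2)]

def combine(intermediate: list[str], strategy: str = "max") -> float:
    """Reduce intermediate results to one value in days using the chosen strategy."""
    days = [to_days(v) for v in intermediate]
    strategies = {"min": min, "max": max, "average": mean}
    return strategies[strategy](days)

intermediate = ["120 days", "3 weeks"]
shortest = combine(intermediate, "min")      # 21 days
longest = combine(intermediate, "max")       # 120 days
average = combine(intermediate, "average")   # 70.5 days
```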

Optionally, in yet another implementation, the number of knowledge extraction answers is at least one. In S302, knowledge extraction is performed on information to be extracted according to the number of knowledge extraction answers and the knowledge extraction rule expression, so as to obtain knowledge extraction result information, which may include:

Knowledge extraction is performed on the information to be extracted according to each knowledge extraction rule expression, to obtain the knowledge extraction result information corresponding to each knowledge extraction rule expression.

In this implementation, the number of knowledge extraction rule expressions is not limited. When there is 1 knowledge extraction rule expression, knowledge extraction is performed on the information to be extracted according to that expression to obtain all the knowledge extraction result information. When there are multiple knowledge extraction rule expressions, knowledge extraction is performed on the information to be extracted according to each expression in turn, to obtain the knowledge extraction result information corresponding to all the expressions.
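A minimal sketch of this multi-answer implementation follows, again assuming that rule expressions are regular expressions. The function name `extract_all` and the sample text are hypothetical.

```python
import re

def extract_all(text: str, expressions: list[str]) -> dict[str, list[str]]:
    """Apply every expression and keep the results found by each one."""
    return {
        pattern: [m.group(0) for m in re.finditer(pattern, text)]
        for pattern in expressions
    }

text = "The waiting period is 120 days for illness and 3 weeks for accidents."
results = extract_all(text, [r"\d+\s*days?", r"\d+\s*weeks?"])
```

Unlike the single-answer case, no expression is skipped: every expression is applied and its results are kept under that expression.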

In the following, on the basis of the above-described method embodiments of the present application, the first page is described by way of example with reference to fig. 4. Fig. 4 does not limit the form of the first page.

As shown in fig. 4, the name of the first page may be referred to as a "field set" page. In the first page, a plurality of input boxes may be included.

Input box 401 is used to set the "field name"; the name of the field is entered here.

Input box 402 is used to set a "recall mode," i.e., a knowledge extraction scope. Where "full text match" is used to indicate full text range, "paragraph match" is used to indicate paragraph range, and "title match" is used to indicate title range.

Input boxes 403-405 are used to set "recall rules", that is, the specific rules that apply when the "recall mode" is "paragraph match" or "title match". Optionally, keywords may be entered in input boxes 404 to 405.

The input box 406 is used to set "answer rules", i.e., knowledge extraction rule expressions.

The input box 407 is used to set a "return granularity," i.e., knowledge extraction return granularity. Where "exact answer" is used to indicate field granularity, "sentence" is used to indicate sentence granularity, and "paragraph" is used to indicate paragraph granularity.

Input boxes 408-410 are used to set the "answer filter", i.e., the answer filtering rules. Keywords may be entered in input boxes 409 to 410.

Input boxes 411 to 412 are used to set "answer post-processing", i.e., the number of knowledge extraction answers. Input box 412 is used to set a specific processing rule when input box 411 is set to "single answer".

Input box 413 is used to set the "answer map", i.e., the mapping information.

Optionally, the first page may further include a button 414. After entering information in the first page, the user clicks the button 414 to save that information. The name of the button 414 is not limited in this embodiment. For example, in fig. 4 the name is "commit"; as another example, the name may be "save".
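The setting information collected by this page could be held in a structure such as the one sketched below. All class names, attribute names, and default values are hypothetical; only the option labels and the input box numbers come from the description of fig. 4.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

class RecallMode(Enum):      # input box 402: knowledge extraction range
    FULL_TEXT = "full text match"
    PARAGRAPH = "paragraph match"
    TITLE = "title match"

class Granularity(Enum):     # input box 407: knowledge extraction return granularity
    FIELD = "exact answer"
    SENTENCE = "sentence"
    PARAGRAPH = "paragraph"

@dataclass
class FieldSettings:
    name: str                                                  # input box 401
    recall_mode: RecallMode = RecallMode.FULL_TEXT             # input box 402
    recall_keywords: list[str] = field(default_factory=list)   # input boxes 403-405
    answer_rules: list[str] = field(default_factory=list)      # input box 406
    granularity: Granularity = Granularity.FIELD               # input box 407
    filter_keywords: list[str] = field(default_factory=list)   # input boxes 408-410
    single_answer: bool = True                                 # input box 411
    single_answer_rule: Optional[str] = None                   # input box 412
    answer_map: Optional[str] = None                           # input box 413

settings = FieldSettings(
    name="waiting period",
    recall_mode=RecallMode.PARAGRAPH,
    recall_keywords=["waiting period"],
    answer_rules=[r"\d+\s*days?"],
    granularity=Granularity.SENTENCE,
    answer_map="the waiting period is XX",
)
```

Clicking the button 414 would then correspond to persisting one such object per field; how it is stored is not described in the application.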

Fig. 5 is a schematic structural diagram of a knowledge extraction device according to an embodiment of the present application. As shown in fig. 5, the knowledge extraction device provided in this embodiment may include:

an obtaining module 501, configured to obtain a name of a field and setting information of the field according to information input by a user in a first page, where the setting information is used to perform knowledge extraction for the field;

a creating module 502, configured to create a knowledge extraction task according to information input by the user in the second page; the knowledge extraction task comprises the field and a document to be processed, and is used for extracting knowledge of the field from the document to be processed according to the setting information;

a processing module 503, configured to execute the knowledge extraction task to obtain a knowledge extraction answer;

and an output module 504, configured to output the knowledge extraction answer.

Optionally, the setting information includes a knowledge extraction range, a knowledge extraction rule expression, a knowledge extraction return granularity, and the number of knowledge extraction answers.

Optionally, the processing module 503 is specifically configured to:

determining information to be extracted from the document to be processed according to the knowledge extraction range;

Carrying out knowledge extraction on the information to be extracted according to the number of the knowledge extraction answers and the knowledge extraction rule expression to obtain knowledge extraction result information;

and obtaining the knowledge extraction answer according to the knowledge extraction return granularity and the knowledge extraction result information.
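The last of these steps, deriving the answer from the result information and the return granularity, can be sketched as follows. The sketch assumes a regular-expression rule and a naive sentence splitter that breaks on ".", "!" or "?" followed by whitespace; the function name `answer_at_granularity` is hypothetical.

```python
import re
from typing import Optional

def answer_at_granularity(paragraph: str, pattern: str, granularity: str) -> Optional[str]:
    """Return the matched field, its enclosing sentence, or the whole paragraph."""
    match = re.search(pattern, paragraph)
    if match is None:
        return None
    if granularity == "field":
        return match.group(0)
    if granularity == "sentence":
        # Naive split on sentence-ending punctuation (an assumption for this sketch).
        sentences = re.split(r"(?<=[.!?])\s+", paragraph)
        return next((s for s in sentences if re.search(pattern, s)), None)
    if granularity == "paragraph":
        return paragraph
    raise ValueError(f"unknown granularity: {granularity!r}")

paragraph = (
    "This clause defines terms. The waiting period is 120 days. "
    "Earlier claims are not paid."
)
rule = r"\d+\s*days?"
as_field = answer_at_granularity(paragraph, rule, "field")
as_sentence = answer_at_granularity(paragraph, rule, "sentence")
as_paragraph = answer_at_granularity(paragraph, rule, "paragraph")
```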

Optionally, the number of knowledge extraction answers is one, and the processing module 503 is specifically configured to:

if the knowledge extraction rule expression is one, carrying out knowledge extraction on the information to be extracted according to the knowledge extraction rule expression until the knowledge extraction result information is obtained, and stopping carrying out knowledge extraction;

and if the number of the knowledge extraction rule expressions is at least two, knowledge extraction is carried out on the information to be extracted according to the at least two knowledge extraction rule expressions in sequence from high priority to low priority of the at least two knowledge extraction rule expressions until knowledge extraction result information is obtained, and knowledge extraction is stopped.

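Optionally, the number of knowledge extraction answers is at least one, and the processing module 503 is specifically configured to:

carrying out knowledge extraction on the information to be extracted according to each knowledge extraction rule expression, to obtain knowledge extraction result information corresponding to each knowledge extraction rule expression.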

Optionally, the processing module 503 is specifically configured to:

if the format of the document to be processed is not the preset document format, converting the document to be processed into an intermediate document according to the preset document format;

analyzing the intermediate document to obtain document information; the document information comprises each paragraph, each title, title level and the corresponding relation between the title and the paragraph in the document to be processed;

and determining the information to be extracted according to the knowledge extraction range and the document information.
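The parsing and range selection performed by the processing module can be sketched as follows. The preset document format is not specified in this section, so the sketch assumes a plain-text intermediate document in which title lines start with one or more "#" characters, the number of which gives the title level. The keyword-based meaning given to the paragraph range and the title range, and all names in the code, are likewise assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Section:
    title: str
    level: int
    paragraphs: list[str] = field(default_factory=list)

def parse_intermediate(text: str) -> list[Section]:
    """Collect titles, title levels, and the paragraphs that belong to each title."""
    sections = [Section(title="", level=0)]  # holds paragraphs before any title
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            level = len(line) - len(line.lstrip("#"))
            sections.append(Section(title=line.lstrip("#").strip(), level=level))
        else:
            sections[-1].paragraphs.append(line)
    return sections

def select_scope(sections: list[Section], scope: str, keyword: str = "") -> list[str]:
    """Return the text units to search for the given knowledge extraction range."""
    if scope == "full text":
        return [p for s in sections for p in s.paragraphs]
    if scope == "paragraph":  # assumed: paragraphs that contain the keyword
        return [p for s in sections for p in s.paragraphs if keyword in p]
    if scope == "title":      # assumed: paragraphs under titles containing the keyword
        return [p for s in sections if keyword in s.title for p in s.paragraphs]
    raise ValueError(f"unknown scope: {scope!r}")

doc = """# Policy
General terms apply.
## Waiting period
The waiting period is 120 days.
## Exclusions
War is excluded."""
sections = parse_intermediate(doc)
by_title = select_scope(sections, "title", "Waiting")
```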

Optionally, the setting information further includes an answer filtering rule;

the processing module 503 is further configured to:

filtering the knowledge extraction answers according to the answer filtering rules to obtain corrected knowledge extraction answers;

the output module 504 is specifically configured to:

and outputting the corrected knowledge extraction answer.
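As an illustration, answer filtering by keyword could look like the sketch below. The application does not state whether keywords are used to keep or to discard answers, so the two parameters `must_contain` and `must_not_contain` are assumptions, as is the function name `filter_answers`.

```python
def filter_answers(answers, must_contain=(), must_not_contain=()):
    """Keep answers that contain every required keyword and no excluded keyword."""
    return [
        a for a in answers
        if all(k in a for k in must_contain)
        and not any(k in a for k in must_not_contain)
    ]

answers = ["the waiting period is 120 days", "the grace period is 30 days"]
corrected = filter_answers(answers, must_contain=["waiting"])
```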

Optionally, the setting information further includes mapping information, where the mapping information is used to indicate that a format of the knowledge extraction answer is a preset answer format;

the processing module 503 is further configured to:

if the format of the knowledge extraction answer is not the preset answer format, converting the knowledge extraction answer according to the preset answer format to obtain a converted knowledge extraction answer;

The output module 504 is specifically configured to:

and outputting the converted knowledge extraction answer.

Optionally, the knowledge extraction range includes a full text range, a paragraph range, and a title range.

Optionally, the knowledge extraction return granularity includes a field, a sentence, and a paragraph.

The knowledge extraction device provided in this embodiment is used to execute the knowledge extraction method provided in the method embodiment of the present application, and the technical principle and the technical effect are similar, and are not described herein again.

According to an embodiment of the present application, the present application also provides an electronic device and a readable storage medium.

According to an embodiment of the present application, there is also provided a computer program product comprising: a computer program stored in a readable storage medium, from which at least one processor of an electronic device can read, the at least one processor executing the computer program causing the electronic device to perform the solution provided by any one of the embodiments described above.

Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 6, the electronic device is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, on-board systems in automobiles (otherwise known as on-board computers), and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, smart phones, tablets, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the application described and/or claimed herein.

As shown in fig. 6, the electronic device includes: one or more processors 801, a memory 802, and interfaces for connecting the components, including high-speed interfaces and low-speed interfaces. The components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to the interface. In other embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. One processor 801 is taken as an example in fig. 6.

Memory 802 is a non-transitory computer readable storage medium provided by the present application. Wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the knowledge extraction method provided by the present application. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to execute the knowledge extraction method provided by the present application.

The memory 802 is used as a non-transitory computer readable storage medium for storing non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules (e.g., the acquisition module 501, the creation module 502, the processing module 503, and the output module 504 shown in fig. 5) corresponding to the knowledge extraction method in the embodiment of the application. The processor 801 executes various functional applications of the electronic device and data processing, i.e., implements the knowledge extraction method in the above-described method embodiment, by running non-transitory software programs, instructions, and modules stored in the memory 802.

Memory 802 may include a storage program area that may store an operating system, at least one application program required for functionality, and a storage data area; the storage data area may store data created according to the use of the electronic device, etc. In addition, memory 802 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory 802 may optionally include memory located remotely from processor 801, which may be connected to the electronic device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The electronic device may further include: an input device 803 and an output device 804. The processor 801, the memory 802, the input device 803, and the output device 804 may be connected by a bus or by other means; connection by a bus is taken as an example in fig. 6.

The input device 803 may receive data or information transmitted by other devices or apparatuses (e.g., a microphone array), may receive input numeric or character information, and may generate key signal inputs related to user settings and function control of the electronic device. Examples of the input device include a touch screen, keypad, mouse, track pad, touch pad, pointer stick, one or more mouse buttons, track ball, and joystick. The output device 804 may include a display device, auxiliary lighting devices (e.g., LEDs), haptic feedback devices (e.g., vibration motors), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.

Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.

These computer programs (also referred to as programs, software, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic disks, optical disks, memory, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device through which a user may provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the internet.

The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

According to the technical scheme of the embodiment of the application, the name of a field and the setting information of the field are acquired according to the information input by a user in a first page, a knowledge extraction task is created according to the information input by the user in a second page, the knowledge extraction task is executed to obtain knowledge extraction answers, and the knowledge extraction answers are output. Therefore, the embodiment of the application reduces the labor cost of knowledge extraction and improves the efficiency of knowledge extraction.

It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, provided that the desired results of the disclosed technical solutions are achieved; no limitation is imposed herein.

The above embodiments do not limit the scope of the present application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application should be included in the scope of the present application.

Claims

1. A knowledge extraction method, comprising:

creating a knowledge extraction task according to the related information of the document to be processed and the name of the field, which are input by the user in the second page; the knowledge extraction task comprises the field and a document to be processed, and is used for extracting knowledge of the field from the document to be processed according to the setting information;

Executing the knowledge extraction task to obtain knowledge extraction answers;

and outputting the knowledge extraction answer.

2. The method of claim 1, wherein the setting information includes a knowledge extraction scope, a knowledge extraction rule expression, a knowledge extraction return granularity, and a number of knowledge extraction answers.

3. The method of claim 2, wherein the performing the knowledge extraction task to obtain knowledge extraction answers comprises:

4. The method of claim 3, wherein the number of knowledge extraction answers is one, the knowledge extraction is performed on the information to be extracted according to the number of knowledge extraction answers and the knowledge extraction rule expression, and knowledge extraction result information is obtained, including:

5. The method of claim 3, wherein the number of knowledge extraction answers is at least one, the knowledge extraction is performed on the information to be extracted according to the number of knowledge extraction answers and the knowledge extraction rule expression, and knowledge extraction result information is obtained, including:

6. The method according to any one of claims 3-5, wherein said determining information to be extracted in said document to be processed according to said knowledge extraction scope comprises:

7. The method of any of claims 2-5, wherein the setup information further includes answer filtering rules;

before the knowledge extraction answer is output, the method further comprises the following steps:

the outputting the knowledge extraction answer comprises the following steps:

and outputting the corrected knowledge extraction answer.

8. The method according to any one of claims 2 to 5, wherein the setting information further includes mapping information for indicating that a format of the knowledge extraction answer is a preset answer format;

the outputting the knowledge extraction answer comprises the following steps:

and outputting the converted knowledge extraction answer.

9. The method of any of claims 2-5, wherein the knowledge extraction scope comprises a full text scope, a paragraph scope, and a title scope.

10. The method of any of claims 2-5, wherein the knowledge extraction return granularity comprises fields, sentences, and paragraphs.

11. A knowledge extraction device, comprising:

the creating module is used for creating a knowledge extraction task according to the related information of the document to be processed and the name of the field, which are input by the user in the second page; the knowledge extraction task comprises the field and a document to be processed, and is used for extracting knowledge of the field from the document to be processed according to the setting information;

and the output module is used for outputting the knowledge extraction answer.

12. The apparatus of claim 11, wherein the setting information comprises a knowledge extraction scope, a knowledge extraction rule expression, a knowledge extraction return granularity, and a number of knowledge extraction answers.

13. The apparatus of claim 12, wherein the processing module is specifically configured to:

14. The apparatus of claim 13, wherein the number of knowledge extraction answers is one, and the processing module is specifically configured to:

15. The apparatus of claim 13, wherein the number of knowledge extraction answers is at least one, and the processing module is specifically configured to:

16. The apparatus according to any one of claims 13-15, wherein the processing module is specifically configured to:

17. The apparatus of any of claims 12-15, wherein the setup information further comprises answer filtering rules;

the processing module is further configured to:

the output module is specifically configured to:

and outputting the corrected knowledge extraction answer.

18. The apparatus according to any one of claims 12-15, wherein the setting information further includes mapping information, the mapping information being used to indicate that a format of the knowledge extraction answer is a preset answer format;

the processing module is further configured to:

the output module is specifically configured to:

and outputting the converted knowledge extraction answer.

19. The apparatus of any of claims 12-15, wherein the knowledge extraction range comprises a full text range, a paragraph range, and a title range.

20. The apparatus of any of claims 12-15, wherein the knowledge extraction return granularity comprises a field, a sentence, and a paragraph.

21. An electronic device, comprising:

at least one processor; and

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-10.

22. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-10.