CN116842151A - Question-answer model construction, knowledge base creation, question-answer searching method and electronic equipment

Info

Publication number: CN116842151A
Application number: CN202310644992.0A
Authority: CN
Inventors: 陈祖龙; 林智超; 江悦; 王静
Original assignee: Alibaba China Co Ltd
Current assignee: Alibaba China Co Ltd
Priority date: 2023-06-01
Filing date: 2023-06-01
Publication date: 2023-10-03

Abstract

The embodiments of the application disclose a question-answer model construction method, a knowledge base creation method, a search question-answering method and corresponding devices, a computer-readable storage medium, and electronic equipment. The search question-answering method comprises the following steps: when the question-answer model is called, obtaining a target question to be consulted, the target question being determined according to a search request submitted by a user; matching a target document associated with the target question from a knowledge base associated with the target organization to which the user belongs; and generating an answer corresponding to the target question according to the target document. This scheme simplifies the operation path of search question answering, improves its efficiency, and optimizes the user experience.

Description

Question-answer model construction, knowledge base creation, question-answer searching method and electronic equipment

Technical Field

The present application relates to the field of information processing technologies, and in particular, to a method and apparatus for constructing a question-answer model, a method and apparatus for creating a knowledge base, a method and apparatus for searching for questions and answers, a computer readable storage medium, and an electronic device.

Background

A large number of documents can be generated in the daily office process of an enterprise, and different types of documents can be archived and saved on different platforms. Taking division of document types according to departments as an example, documents of research and development departments, documents of financial departments and the like can be correspondingly stored in different platforms.

When an enterprise employee needs to consult a document, the employee must log in to the platform that stores documents of that type. However, the entrances of the platforms are scattered, and the employee needs to know the entrance of each platform in order to search for and query the relevant documents, so the user experience is poor.

In addition, because an enterprise involves a large number of documents, managing the knowledge base in FAQ (frequently asked questions) form is impractical in terms of time cost and labor cost. As a result, after an employee logs in to the target platform, the platform can only match and output whole documents according to the search request entered by the employee. For finer-grained questions, the employee has to read through the documents personally, so the search experience is poor.

How to improve the efficiency of search question answering for enterprise employees over enterprise documents has therefore become a technical problem to be solved by those skilled in the art.

Disclosure of Invention

The application provides a method and a device for constructing a question-answer model, a method and a device for creating a knowledge base, a method and a device for search question answering, a computer-readable storage medium, and electronic equipment, which help to simplify the operation path of search question answering, improve its efficiency, and optimize the user experience.

The application provides the following scheme:

A question-answering model construction method, comprising:

obtaining a training sample, wherein the training sample comprises a plurality of sample questions, and sample documents and sample answers respectively associated with the sample questions, and the sample documents are attributed to a target organization;

obtaining an initial model for generating an answer, wherein the initial model comprises a document retrieval model and an answer generation model, and the output of the document retrieval model is used as the input of the answer generation model;

and carrying out model training on the document retrieval model through the sample documents respectively associated with the different sample questions, and carrying out model training on the answer generation model through the sample documents respectively associated with the different sample questions and the sample answers to obtain a question-answer model, so that the question-answer model has the capability of retrieving the documents associated with the questions and the capability of generating answers corresponding to the questions according to the retrieved documents.

Wherein the training sample further comprises sample knowledge blocks respectively associated with the different sample questions, the sample knowledge blocks being obtained by segmenting the sample documents associated with the sample questions;

the answer generation model comprises a document segmentation model and an answer generation sub-model; and

performing model training on the answer generation model through the sample documents and the sample answers respectively associated with the different sample questions comprises:

performing model training on the document segmentation model through the sample documents respectively associated with the different sample questions and the sample knowledge blocks segmented from the sample documents, and performing model training on the answer generation sub-model through the sample knowledge blocks respectively associated with the different sample questions and the sample answers, to obtain the question-answer model, so that the question-answer model has the capability of segmenting knowledge blocks from the retrieved documents and the capability of generating answers to questions according to the segmented knowledge blocks.

A knowledge base creation method, comprising:

creating a knowledge base associated with a target organization, and constructing a keyword search engine and a vector search engine of the knowledge base;

and obtaining the documents associated with the target organization, saving the documents to the knowledge base, and creating keyword indexes and vector indexes of the documents.

A search question and answer method, comprising:

when the question-answer model is called, obtaining a target question to be consulted, the target question being determined according to a search request submitted by a user;

matching a target document associated with the target question from a knowledge base associated with the target organization to which the user belongs;

and generating an answer corresponding to the target question according to the target document.

Wherein obtaining the target question to be consulted, determined according to the search request submitted by the user, comprises:

determining the target object at which the search request is directed;

determining the target question from the questions associated with the target object under the target organization, the questions associated with the target object being determined when the question-answer model is trained.

Wherein matching the target document associated with the target question from the knowledge base associated with the target organization to which the user belongs comprises:

matching a first document associated with the target question in the knowledge base through a keyword search engine of the knowledge base, and matching a second document associated with the target question in the knowledge base through a vector search engine of the knowledge base;

determining the target document from the first document and the second document.

Wherein the method further comprises:

obtaining the permission information associated with the user under the target organization, so as to determine, from the knowledge base, a target document that matches the permission information.

Wherein generating the answer corresponding to the target question according to the target document comprises:

segmenting a knowledge block related to the target question from the target document, and generating the answer corresponding to the target question according to the knowledge block.

Wherein the method further comprises:

obtaining context information associated with the search request, the context information including at least one of: identity information associated with the user under the target organization and attribute information associated with the target object aimed at by the search request under the target organization;

the generating the answer corresponding to the target question comprises the following steps: and generating the answer by taking the context information as a constraint condition.

Wherein the method further comprises:

obtaining a newly added document associated with the target question;

wherein generating the answer corresponding to the target question comprises: generating the answer according to the newly added document and the target document.

A search question and answer method, comprising:

establishing a session with a question-answer model through an intelligent dialogue client and providing a session interface;

obtaining, through the session interface, a search request submitted by a user in dialogue form, and sending the search request to the question-answer model, so that the question-answer model obtains a target question to be consulted that is determined according to the search request, matches a target document associated with the target question from a knowledge base associated with the target organization to which the user belongs, and generates an answer corresponding to the target question according to the target document;

and obtaining the answer corresponding to the target question, and displaying the answer on the session interface.

A search question and answer method, comprising:

obtaining an answer generated by a question-answer model for a target question, wherein the answer is generated by the question-answer model according to a target document that is associated with the target question and matched from a knowledge base associated with a target organization;

and establishing an association relation between the target question and the answer so as to provide the answer associated with the target question when a search request aiming at the target question is obtained.

A search question and answer method, comprising:

obtaining at least two target organizations with binding relations, and associating knowledge bases associated with the target organizations;

when a search request submitted by a user belonging to the at least two target organizations is obtained, performing document matching from the knowledge bases associated with the at least two target organizations, and inputting the matched documents into a question-answer model for answer generation.

A question-answering model construction device comprises:

the training sample obtaining unit is used for obtaining a training sample, wherein the training sample comprises a plurality of sample questions, and sample documents and sample answers respectively associated with the different sample questions, and the sample documents are attributed to a target organization;

the initial model obtaining unit is used for obtaining an initial model for generating an answer, wherein the initial model comprises a document retrieval model and an answer generation model, and the output of the document retrieval model is used as the input of the answer generation model;

the model training unit is used for carrying out model training on the document retrieval model through the sample documents respectively associated with the different sample questions, carrying out model training on the answer generation model through the sample documents respectively associated with the different sample questions and the sample answers, and obtaining a question-answer model, so that the question-answer model has the capability of retrieving the documents associated with the questions and the capability of generating answers corresponding to the questions according to the retrieved documents.

A knowledge base creation apparatus comprising:

the search engine construction unit is used for creating a knowledge base associated with the target organization and constructing a keyword search engine and a vector search engine of the knowledge base;

and the index creating unit is used for obtaining the documents associated with the target organization, saving the documents to the knowledge base and creating keyword indexes and vector indexes of the documents.

A search question and answer device, comprising:

the target question obtaining unit is used for obtaining, when the question-answer model is called, a target question to be consulted that is determined according to a search request submitted by a user;

the target document matching unit is used for matching a target document associated with the target question from a knowledge base associated with the target organization to which the user belongs;

and the answer generating unit is used for generating an answer corresponding to the target question according to the target document.

A search question and answer device, comprising:

the session interface providing unit is used for establishing a session with the question-answer model through the intelligent dialogue client and providing a session interface;

the search request submitting unit is used for obtaining, through the session interface, a search request submitted by a user in dialogue form, and sending the search request to the question-answer model, so that the question-answer model obtains a target question to be consulted that is determined according to the search request, matches a target document associated with the target question from a knowledge base associated with the target organization to which the user belongs, and generates an answer corresponding to the target question according to the target document;

and the answer obtaining unit is used for obtaining the answer corresponding to the target question and displaying the answer on the session interface.

A search question and answer device, comprising:

the answer obtaining unit is used for obtaining an answer generated by a question-answer model for a target question, wherein the answer is generated by the question-answer model according to a target document that is associated with the target question and matched from a knowledge base associated with a target organization;

and the association relation establishing unit is used for establishing the association relation between the target question and the answer, so as to provide the answer associated with the target question when a search request for the target question is obtained.

A search question and answer device, comprising:

the knowledge base association unit is used for obtaining at least two target organizations with binding relations and associating the knowledge bases associated with the target organizations;

and the answer generating unit is used for performing document matching from the knowledge bases associated with the at least two target organizations when a search request submitted by a user belonging to the at least two target organizations is obtained, and inputting the matched documents into a question-answer model for answer generation.

A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method of any of the preceding claims.

An electronic device, comprising:

one or more processors; and

a memory associated with the one or more processors, the memory for storing program instructions that, when read for execution by the one or more processors, perform the steps of the method of any of the preceding claims.

According to the specific embodiment provided by the application, the application discloses the following technical effects:

According to the embodiments of the application, a centralized question consultation service covering the various documents associated with the target organization can be provided to the user through the question-answer model obtained by training, which simplifies the operation path of search question answering and improves its efficiency. In addition, the question-answer model can generate a targeted answer to the target question submitted by the user, so the user does not need to carry out the finer-grained lookup personally, which optimizes the user's search question-answering experience.

Of course, it is not necessary for any one product to practice the application to achieve all of the advantages set forth above at the same time.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flowchart of a method for constructing a question-answer model provided by an embodiment of the present application;

FIG. 2 is a flow chart of a search question and answer method provided by an embodiment of the application;

FIG. 3 is a schematic diagram of a search question-answering system provided by an embodiment of the present application;

FIG. 4 is a schematic diagram of a question-answering model construction device provided by an embodiment of the present application;

FIG. 5 is a schematic diagram of a knowledge base creation apparatus provided by an embodiment of the present application;

FIG. 6 is a schematic diagram of a search question-answering apparatus according to an embodiment of the present application;

FIG. 7 is a schematic diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which are derived by a person skilled in the art based on the embodiments of the application, fall within the scope of protection of the application.

The specific implementation of the question-answer model construction method provided by the embodiment of the present application is described in detail below. Referring to the flowchart shown in FIG. 1, the method may include:

S101: A training sample is obtained, the training sample comprising a plurality of sample questions, and sample documents and sample answers respectively associated with the different sample questions, the sample documents belonging to a target organization.

As one example, training samples may be determined from historical search questions and answers associated with the target organization. Taking an application APP1 provided by the target organization as an example, the sample questions may be: how to download APP1, how to install APP1, how to use APP1, and so on; the sample document may be the user manual of APP1; and the sample answers may be manually labeled answers.
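The structure of such a training sample can be sketched as follows. This is only an illustrative layout under stated assumptions: the class name `TrainingSample`, its field names, the helper `build_samples`, the organization label `target-org`, and the angle-bracket placeholder strings are all hypothetical and do not come from the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSample:
    """One training sample as described in S101 (field names are hypothetical)."""
    organization: str  # target organization the sample document belongs to
    question: str      # sample question
    document: str      # sample document associated with the question
    answer: str        # sample answer associated with the question


def build_samples(organization, document, labeled_pairs):
    """Pair each (question, answer) with the document it was labeled against."""
    return [TrainingSample(organization, q, document, a) for q, a in labeled_pairs]


# Placeholder content only; real samples would come from historical
# search questions and answers of the target organization.
manual = "<text of the APP1 user manual>"
samples = build_samples("target-org", manual, [
    ("How to download APP1", "<manually labeled answer>"),
    ("How to install APP1", "<manually labeled answer>"),
    ("How to use APP1", "<manually labeled answer>"),
])
```

Keeping the question, document, and answer in one record makes it straightforward to feed (question, document) pairs to the document retrieval model and (document, answer) pairs to the answer generation model.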

S102: an initial model for generating an answer is obtained, the initial model including a document retrieval model and an answer generation model, and an output of the document retrieval model is used as an input of the answer generation model.

As an example, model training may be based on conventional deep learning approaches. For example, a network topology may be determined based on convolutional neural networks (Convolutional Neural Networks, CNN), and a document retrieval model and an answer generation model may be constructed.

Or, as another example, the initial model may be embodied as a large model obtained by a pre-training technique.

It can be understood that a large model trained on large-scale, broad data has a variety of basic capabilities, which improves both its expressive power and its generalization ability. Based on fine-tuning, the large model can then be adapted to different downstream tasks with a small number of training samples, yielding a fine-tuned model for each downstream task. In this example, the downstream task is the answer generation task, and the fine-tuned model corresponding to the downstream task is the question-answer model obtained by model training.

In one implementation, the large model may be embodied as a large language model (Large Language Model, abbreviated LLM), i.e., a deep learning model trained using large amounts of text data, which may generate natural language text or understand the meaning of language text.

It should be noted that, in practical applications, the document retrieval model and the answer generation model may be constructed in different ways, which is not limited by the embodiment of the present application. For example, the large model obtained by pre-training may be used as the answer generation model for model training.

S103: Model training is performed on the document retrieval model through the sample documents respectively associated with the different sample questions, and on the answer generation model through the sample documents respectively associated with the different sample questions and the sample answers, to obtain a question-answer model, so that the question-answer model has the capability of retrieving the documents associated with a question and the capability of generating an answer to the question according to the retrieved documents.

In the embodiment of the application, the functions realized by the question-answer model at least can comprise:

1. For document retrieval

When the document retrieval function is trained, a sample question can be used as the input of the document retrieval model, and the predicted document that the model matches from the sample documents associated with the target organization is taken as the output. The predicted document is then compared with the sample document associated with the sample question, and when the degree of matching between the predicted document and the corresponding sample document meets the requirement of the corresponding loss function, training of the document retrieval model is determined to be complete.

2. For answer generation

When the answer generation function is trained, the sample document associated with a sample question is used as the input of the answer generation model, and the predicted answer generated by the model is taken as the output. The predicted answer is then compared with the sample answer associated with the sample question, and when the degree of matching between the predicted answer and the corresponding sample answer meets the requirement of the corresponding loss function, training of the answer generation model is determined to be complete.

In addition, documents of the target organization are usually long and contain a large amount of text. On the one hand, the more text is input, the harder it is for the model to analyze; on the other hand, the model limits the input length, and input exceeding the preset length is automatically truncated and discarded, which may affect the accuracy of the answers the model generates.

Correspondingly, in order to further improve the generation quality of the answer generation model and reduce the generation difficulty, the answer generation model according to the embodiment of the application may further include a document segmentation model and an answer generation sub-model. In this case, the training samples can further comprise sample knowledge blocks respectively associated with the different sample questions, the sample knowledge blocks being obtained by segmenting the sample documents associated with the sample questions. Model training of the answer generation model can then comprise: performing model training on the document segmentation model through the sample documents respectively associated with the different sample questions and the sample knowledge blocks segmented from the sample documents, and performing model training on the answer generation sub-model through the sample knowledge blocks respectively associated with the different sample questions and the sample answers, to obtain the question-answer model, so that the question-answer model has the capability of segmenting knowledge blocks from the retrieved documents and the capability of generating answers to questions according to the segmented knowledge blocks.

When the document segmentation function is trained, the sample document associated with a sample question can be used as the input of the document segmentation model, and the predicted knowledge block obtained by the model's segmentation is taken as the output. The predicted knowledge block is then compared with the sample knowledge block segmented from the sample document, and when the degree of matching between the predicted knowledge block and the corresponding sample knowledge block meets the requirement of the corresponding loss function, training of the document segmentation model is determined to be complete.

Correspondingly, when the answer generation function is trained, the sample knowledge block associated with a sample question can be used as the input of the answer generation sub-model, and the predicted answer generated by the sub-model is taken as the output. The predicted answer is then compared with the sample answer associated with the sample question, and when the degree of matching between the predicted answer and the corresponding sample answer meets the requirement of the corresponding loss function, training of the answer generation sub-model is determined to be complete.
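How the three components chain together at inference time (retrieved documents feed the segmentation step, and the segmented knowledge blocks feed answer generation) can be sketched as below. The components here are deliberately trivial word-overlap stand-ins rather than trained models, and every name (`make_qa_model`, `toy_retrieve`, `toy_segment`, `toy_generate`) is hypothetical; the sketch only illustrates the data flow, not the training described above.

```python
def make_qa_model(retrieve, segment, generate):
    """Compose retrieval -> segmentation -> generation into one callable."""
    def answer(question, knowledge_base):
        documents = retrieve(question, knowledge_base)
        # Only the knowledge blocks, not whole documents, reach the generator,
        # which keeps the generator input short.
        blocks = [b for d in documents for b in segment(question, d)]
        return generate(question, blocks)
    return answer


def _overlap(a, b):
    """Number of distinct lowercase words shared by two strings."""
    return len(set(a.lower().split()) & set(b.lower().split()))


def toy_retrieve(question, knowledge_base):
    # Stand-in for the trained document retrieval model.
    if not knowledge_base:
        return []
    return [max(knowledge_base, key=lambda d: _overlap(question, d))]


def toy_segment(question, document):
    # Stand-in for the trained document segmentation model:
    # split on blank lines and keep paragraphs that share words with the question.
    paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]
    return [p for p in paragraphs if _overlap(question, p) > 0]


def toy_generate(question, blocks):
    # Stand-in for the trained answer generation sub-model.
    return " ".join(blocks) if blocks else "No relevant knowledge block found."


qa_model = make_qa_model(toy_retrieve, toy_segment, toy_generate)
```

Passing the components in as callables keeps the composition independent of how each one is built, so a fine-tuned large model could replace `toy_generate` without changing the pipeline.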

As an example, the embodiment of the present application may further provide a scheme for offering a search question-answering service to users based on the question-answer model constructed according to the scheme shown in FIG. 1. Referring to the flowchart shown in FIG. 2, the scheme may include:

S201: When the question-answer model is called, a target question to be consulted is obtained, the target question being determined according to a search request submitted by a user.

As an example, the question-answer model provided by the embodiment of the application can be deployed on a cloud server and expose a call interface. Correspondingly, the search question-answering system provided by the embodiment of the present application may include, as shown in FIG. 3, a client and a server. The client can be deployed on a terminal device associated with a user belonging to the target organization, in the form of a web page or a standalone application. The client can provide a search question-answering page, through which the search request submitted by the user is sent to the server. The server can be deployed on the cloud server, and after obtaining the search request submitted by the user, can call the interface exposed by the question-answer model to trigger the question-answer model to generate an answer.

Or, as another example, the question-answer model provided by the embodiment of the application can be deployed locally on the terminal device associated with the user. Thus, after obtaining the search request submitted by the user, the client can trigger the question-answer model deployed locally to generate an answer in a local calling mode.

In practice, a search request submitted by a user may have an explicit search intent. For example, if the user inputs "where can APP1 be downloaded", it can be determined from the search request that the target object is APP1 and the question type is downloading, so the target question to be consulted can be determined to be how to download APP1, and document retrieval can be performed for this target question with explicit search intent.

Alternatively, a search request submitted by a user may not have an explicit search intent. For example, if the user inputs "APP1", the search request identifies the target object of the search, but the corresponding question type cannot be determined. In this case, after the target object at which the search request is directed is determined, the questions associated with the target object under the target organization can be obtained and the target question determined from them. That is, intent understanding and question expansion can be performed on the search request to determine a target question that meets the user's search need, after which document retrieval and answer generation are performed.

As one example, the questions associated with the target object under the target organization may be determined when the question-answer model is trained. Taking APP1 provided by the target organization as the target object, the questions associated with APP1 can be determined from the sample questions in the training samples, and priority weights of the different questions can be determined from the frequency with which each question occurs in the training samples. When question expansion is performed, some of the questions associated with the target object can then be selected as target questions according to the priority weights. The selection can be set according to actual use requirements, and the embodiment of the application does not limit it.
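One possible way to derive priority weights from question frequency and use them for question expansion is sketched below. The weighting (relative frequency per target object), the `top_k` cutoff, and the function names are assumptions made for illustration; the application leaves the concrete selection rule open.

```python
from collections import Counter, defaultdict


def build_question_weights(sample_questions):
    """Map each target object to {question: relative frequency in the samples}.

    sample_questions is an iterable of (target_object, question) pairs.
    """
    counts = defaultdict(Counter)
    for target_object, question in sample_questions:
        counts[target_object][question] += 1
    weights = {}
    for target_object, counter in counts.items():
        total = sum(counter.values())
        weights[target_object] = {q: n / total for q, n in counter.items()}
    return weights


def expand_questions(target_object, weights, top_k=2):
    """Return the top_k highest-weighted questions for a target object."""
    candidates = weights.get(target_object, {})
    # Sort by descending weight, then alphabetically so ties are deterministic.
    ranked = sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))
    return [question for question, _ in ranked[:top_k]]
```

With this sketch, a request that only names APP1 would be expanded into the questions that appeared most often for APP1 in the training samples, and an object with no associated questions yields an empty list.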

It can be understood that determining the target question to be consulted from the search request may be implemented by the question-answer model or by the server, which is not limited by the embodiment of the present application.

S202: A target document associated with the target question is matched from a knowledge base associated with the target organization to which the user belongs.

As an example, the target document associated with the target question may be matched from the knowledge base associated with the target organization based on keyword retrieval, which improves the accuracy of document retrieval. For the specific implementation, reference may be made to the related art, which is not limited by the embodiment of the present application.

In addition, to address the poor coverage of long-tail searches, the embodiment of the application can also match the target document associated with the target question based on vector retrieval, which improves the generalization of document retrieval. Correspondingly, a knowledge base (Knowledge Base) in which documents are stored in structured form can be created in the following manner:

creating a knowledge base associated with a target organization, and constructing a keyword search engine and a vector search engine of the knowledge base; and obtaining the documents associated with the target organization, saving the documents to the knowledge base, and creating keyword indexes and vector indexes of the documents. In this way, the target documents associated with the target problems can be searched in the knowledge base by the keyword search engine in a manner of searching the keyword index, and the target documents associated with the target problems can be searched in the knowledge base by the vector search engine in a manner of searching the vector index. With respect to implementation procedures related to knowledge base creation, reference may be made to related technologies, which are not limited in this embodiment of the present application.
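As a rough sketch of a knowledge base that maintains both a keyword index and a vector index, consider the toy class below. The class name `KnowledgeBase`, its methods, and the tokenizer are hypothetical. The "vector" here is a plain term-count vector compared by cosine similarity, used only as a stand-in: a real vector search engine would normally use embeddings from a trained model, which is what provides the generalization mentioned above.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)  # Counter yields 0 for missing terms
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class KnowledgeBase:
    def __init__(self):
        self.docs = {}
        self.keyword_index = defaultdict(set)  # token -> ids of documents containing it
        self.vector_index = {}                 # document id -> term-count vector

    def add_document(self, doc_id, text):
        tokens = tokenize(text)
        self.docs[doc_id] = text
        for tok in set(tokens):
            self.keyword_index[tok].add(doc_id)
        self.vector_index[doc_id] = Counter(tokens)

    def keyword_search(self, question):
        hits = Counter()
        for tok in set(tokenize(question)):
            for doc_id in self.keyword_index.get(tok, ()):
                hits[doc_id] += 1
        return [d for d, _ in sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))]

    def vector_search(self, question, top_k=3):
        q = Counter(tokenize(question))
        scored = [(cosine(q, v), d) for d, v in self.vector_index.items()]
        scored = [(s, d) for s, d in scored if s > 0]
        scored.sort(key=lambda sd: (-sd[0], sd[1]))
        return [d for _, d in scored[:top_k]]
```

Both indexes are filled in the same `add_document` call, mirroring the step of saving a document and creating its keyword index and vector index together.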

Specifically, the matching the target document associated with the target problem from the knowledge base associated with the target organization to which the user belongs may include: matching a first document associated with the target problem in the knowledge base through a keyword search engine of the knowledge base, and matching a second document associated with the target problem in the knowledge base through a vector search engine of the knowledge base; the target document is determined from the first document and the second document.

After the first document and the second document are obtained by searching, weight information corresponding to each document can be calculated, and a target document is selected according to the weight information. It will be appreciated that documents retrievable in both ways have higher weights and documents retrieved in only one way have lower weights. The implementation manner of selecting the target document can be specifically set according to actual use requirements, and the embodiment of the application does not limit the implementation manner of determining the target document, the number of the target documents, the number of the first documents, the number of the second documents and the like.
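One possible way to weight and select the target documents is sketched below. The function name `merge_results` and the concrete weight values (1.0 for documents retrieved in both ways, 0.5 for documents retrieved in only one way) are assumptions chosen for illustration; the text above only requires that the former weight be higher than the latter.

```python
def merge_results(first_docs, second_docs, both_weight=1.0, single_weight=0.5, top_n=3):
    """Combine keyword hits (first_docs) and vector hits (second_docs)."""
    ids = list(dict.fromkeys(first_docs + second_docs))  # deduplicate, keep order
    in_first, in_second = set(first_docs), set(second_docs)
    weights = {
        d: both_weight if d in in_first and d in in_second else single_weight
        for d in ids
    }
    # sorted() is stable, so documents with equal weight keep their retrieval order.
    ranked = sorted(ids, key=lambda d: -weights[d])
    return [(d, weights[d]) for d in ranked[:top_n]]


targets = merge_results(["a", "b"], ["b", "c"])
```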

In practical applications, the documents of the target organization cover a relatively wide range of content and are generally subject to viewing rights. Taking an enterprise as the target organization, documents may have rights management requirements between different departments of the enterprise, and documents of different types or different security levels within the same department may also have rights management requirements between users of different levels. An answer generation model, especially one obtained by fine-tuning a large model, retains the semantic recognition capability of the large model, so the answers it generates are difficult to control.

Correspondingly, the embodiment of the present application can push rights control down to the search engine, ensuring that the target document input to the answer generation model is a document that satisfies the rights constraints. That is, during document retrieval, rights information associated with the user under the target organization may be obtained, so as to determine from the knowledge base a target document that matches the rights information.

As an example, after the first document and the second document are matched, a target document matched with the authority information can be selected from the first document and the second document according to the authority information; or, when the document is searched, the first document matched with the authority information and the second document matched with the authority information are searched according to the authority information, and the determined target document is the target document matched with the authority information.
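The two alternatives above (filtering after matching, or restricting the search by rights during matching) can be sketched as follows. The `Doc` and `UserRights` types, their fields, and the rule in `is_permitted` are hypothetical; the application does not prescribe a particular rights model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    departments: frozenset  # departments allowed to retrieve the document
    min_level: int          # lowest user level allowed to retrieve it


@dataclass(frozen=True)
class UserRights:
    department: str
    level: int


def is_permitted(doc, rights):
    # Assumed rule: the department must be listed and the level high enough.
    return rights.department in doc.departments and rights.level >= doc.min_level


def filter_after_matching(matched_docs, rights):
    """Alternative 1: match first, then keep only the permitted documents."""
    return [d for d in matched_docs if is_permitted(d, rights)]


def search_with_rights(corpus, matcher, rights):
    """Alternative 2: restrict the search scope by rights, then match."""
    visible = [d for d in corpus if is_permitted(d, rights)]
    return [d for d in visible if matcher(d)]
```

Either way, only permitted documents reach the answer generation model; the second alternative additionally avoids matching documents the user may not see.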

As an example, rights may be set uniformly for the documents to be saved to the knowledge base by submitting a configuration file that represents the rights of those documents. For example, a document rights configuration page may be provided that presents relevant information about each document to be saved to the knowledge base, such as the identification information, type information, and confidentiality level of the document. The associated rights information, which may be expressed as the user types, user levels, and the like that have retrieval rights on the document, can then be set according to that relevant information. The relevant information and the rights information of a document may be determined according to actual use requirements.

As another example, a document may already carry rights information when it is published within the target organization. For example, when publishing a document, the publisher may specify information such as the user types and user levels permitted to view it, so that the document itself has a rights isolation attribute. Correspondingly, rights can be configured for a document to be saved to the knowledge base by reading the rights information associated with the document at the time of its publication.

From the above description, it can be known that the target document can be used as an input of an answer generation model, and an answer corresponding to the target question can be generated according to the model; or after matching the target document associated with the target question, the target document can be segmented, namely the target document can be input into a document segmentation model, knowledge blocks which are segmented from the target document and related to the target question are obtained, and the knowledge blocks are input into an answer generation sub-model to generate answers.

Optionally, in order to improve the matching degree between the model generation answer and the user search requirement, context information associated with the search request can be obtained, and the context information is taken as a constraint condition, so that the answer generation is performed. That is, the target document and the context information may be input to the answer generation model.

Wherein the context information may include at least one of the following information:

1. Identity information associated with the user under the target organization

In the embodiment of the present application, the identity information of the user may be expressed as: identification information of the user, such as a mobile phone number, an email address, or an employee number associated with the user under the target organization; professional information of the user, such as the department of the target organization to which the user belongs; location information of the user when the search request is submitted; and so on. The embodiment of the present application does not limit the identity information of the user, which can be determined according to actual use requirements.

It will be appreciated that the identity of the user can be recognized from the identification information, so that answers meeting the user's preferences can be generated. Answers matched to the user's professional ability can be generated from the professional information. For example, when generating an answer to the target question of how to install APP1, a more concise answer can be generated for a user engaged in research and development work, who is better able to install application programs, while a more detailed answer that is easy to understand and follow can be generated for a user who is not. Answers matched to the user's current location can be generated from the location information. For example, when generating an answer to the target question of where the barbershop in the campus is located, the campus where the user currently is can be determined, from among the multiple campuses associated with the target organization, according to the user's current location; the barbershop in that campus is then located, and the corresponding answer is generated and pushed to the user.

2. Attribute information associated with target object aimed at by search request under target organization

In practical application, the target object may have different meanings inside the target organization and outside the target organization, that is, when the target object has a special meaning in the target organization, attribute information representing the special meaning may be obtained as context information, so as to constrain the answer generation process of the model and ensure the accuracy of the generated answer.
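One simple way to pass the context information to an answer generation model as a constraint is to place it alongside the target documents in the model input. The sketch below assumes a text-based model input; the function name `build_generation_input` and the bracketed labels are hypothetical, and other ways of applying the constraint are equally possible.

```python
def build_generation_input(question, documents, identity=None, object_attributes=None):
    """Assemble one text input in which context entries appear as explicit constraints."""
    lines = []
    for label, ctx in (("User", identity), ("Object", object_attributes)):
        for key, value in sorted((ctx or {}).items()):
            lines.append(f"[Constraint] {label} {key}: {value}")
    for i, doc in enumerate(documents, start=1):
        lines.append(f"[Document {i}] {doc}")
    lines.append(f"[Question] {question}")
    return "\n".join(lines)


prompt = build_generation_input(
    "How do I install APP1?",
    ["APP1 installation guide text"],
    identity={"department": "R&D", "location": "Campus 2"},
    object_attributes={"APP1": "internal build tool"},
)
```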

S203: generating an answer corresponding to the target question according to the target document.

In practical application, a target document can be used as input of an answer generation model, and an answer corresponding to a target question generated by the model according to the target document is obtained; or, in order to improve the answer generation efficiency and reduce the answer generation difficulty, the document segmentation model may be used to segment the target document, segment the knowledge block related to the target question from the target document, and use the knowledge block as the input of the answer generation sub-model to obtain the answer corresponding to the target question generated by the sub-model according to the knowledge block.

As an example, the document segmentation may be performed on the target document according to paragraphs, adjacent paragraphs with relevance may be combined into one knowledge block, and then the scoring value of each knowledge block for the target problem may be obtained by calculation based on the similarity between the target problem and the target document and the similarity between the target problem and each knowledge block, and the knowledge block with the highest scoring value and the knowledge block with the scoring value not lower than the preset scoring value may be determined as the knowledge block related to the target problem.
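The paragraph-based segmentation and scoring can be sketched as below. Several choices here are assumptions made only for illustration: Jaccard token overlap stands in for both the "relevance" between adjacent paragraphs and the similarity measures, the score is a weighted sum of the question-document and question-block similarities, and the selection keeps the highest-scoring block together with any block that reaches the preset score. In the application itself, segmentation is performed by a trained document segmentation model.

```python
import re


def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def split_into_blocks(document, merge_threshold=0.2):
    """Split on blank lines, merging a paragraph into the preceding block if related."""
    paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]
    blocks = []
    for p in paragraphs:
        if blocks and jaccard(blocks[-1], p) >= merge_threshold:
            blocks[-1] = blocks[-1] + "\n" + p
        else:
            blocks.append(p)
    return blocks


def select_blocks(question, document, min_score=0.3, doc_weight=0.3):
    """Score each block and keep the best one plus any block reaching min_score."""
    doc_sim = jaccard(question, document)
    scored = [
        (doc_weight * doc_sim + (1 - doc_weight) * jaccard(question, b), b)
        for b in split_into_blocks(document)
    ]
    if not scored:
        return []
    best = max(s for s, _ in scored)
    return [b for s, b in scored if s == best or s >= min_score]
```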

As described above, the documents associated with the target organization can be stored in the knowledge base in structured form and used as the retrieval data source when matching the target documents associated with the target question. In practical applications, the documents of the target organization are updated quickly and in large numbers per unit time. To improve the accuracy of answer generation, in addition to retrieving from the documents stored in the knowledge base, a database for storing newly added documents can be created. For reasons of data processing efficiency and processing cost, the newly added documents stored in the database need not be structured.

Correspondingly, after the target question is obtained, both the knowledge base and the database may be used as retrieval data sources from which documents associated with the target question are matched. Specifically, when it is determined that the target question is associated with a newly added document, that is, when a document matching the target question is retrieved from the database, the newly added document and the target document retrieved from the knowledge base may be input into the answer generation model, so that the model generates an answer corresponding to the target question according to the newly added document and the target document.

It can be appreciated that, for the newly added document stored in the database, the newly added document can be structured according to the use requirement and updated to the knowledge base. For example, the search frequency of the problems associated with the newly added document can be counted, and the knowledge base update processing is performed on the newly added document with the search frequency not lower than the preset frequency; or, the knowledge base updating process can be performed on the newly-added document which needs to be updated immediately according to the importance degree of the newly-added document; etc. The embodiment of the application does not limit the newly added document which can be updated to the knowledge base.
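A minimal sketch of such a store for newly added documents is given below. The class `NewDocumentStore`, its whitespace-based matching, the `urgent` flag standing in for "importance", and the representation of the knowledge base as a plain dictionary are all hypothetical simplifications; in particular, the structuring step that a real update would perform is omitted.

```python
from collections import Counter


class NewDocumentStore:
    """Holds unstructured new documents until they qualify for the knowledge base."""

    def __init__(self, promote_after=3):
        self.raw_docs = {}
        self.hits = Counter()   # how often each document was retrieved
        self.urgent = set()     # documents flagged for immediate update
        self.promote_after = promote_after

    def add(self, doc_id, text, urgent=False):
        self.raw_docs[doc_id] = text
        if urgent:
            self.urgent.add(doc_id)

    def search(self, question):
        words = set(question.lower().split())
        found = [d for d, t in self.raw_docs.items() if words & set(t.lower().split())]
        for d in found:
            self.hits[d] += 1
        return found

    def promote(self, knowledge_base):
        """Move urgent or frequently retrieved documents into knowledge_base."""
        ready = [
            d for d in self.raw_docs
            if d in self.urgent or self.hits[d] >= self.promote_after
        ]
        for d in ready:
            knowledge_base[d] = self.raw_docs.pop(d)
            self.urgent.discard(d)
            self.hits.pop(d, None)
        return ready
```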

In addition, in practical applications, the knowledge base may be updated whenever a newly added document is determined to exist, so that the documents stored in the knowledge base remain relatively comprehensive. For example, the update may be performed in real time when the newly added document is published, or it may be performed when few users are using the search question-answering service, for example between 0:00 and 4:00 a.m., which is not limited in the embodiment of the present application.

In summary, in the prior art a user performing a search question-answering operation needs to know the multiple platforms associated with the target organization, the document types stored on each platform, and the corresponding login entries. In contrast, the embodiment of the present application can provide a centralized question consultation service through a single access entry, thereby simplifying the user's search question-answering operations and improving search question-answering efficiency. In addition, the prior art can only provide the user with documents associated with the target question, so the user has to read through them manually for finer-grained queries. The embodiment of the present application can instead generate a targeted answer to the target question and directly provide the user with an answer that meets the search requirement, optimizing the user's search question-answering experience.

As an example, in order to improve the processing efficiency of the search question-answering service, the following preferable processing may be performed:

1. Storing answers generated by the question-answer model for target questions

Specifically, an answer generated by a question-answer model for a target question may be obtained, the answer being generated by the question-answer model from target documents associated with the target question that are matched from a knowledge base associated with a target organization; and establishing an association relation between the target question and the answer so as to provide the answer associated with the target question when a search request aiming at the target question is obtained.

The answers to be saved can be determined according to actual use requirements, for example according to the search frequency of the target question, the importance of the target question, and the like. That is, in the embodiment of the present application, the association between the target question and the answer can be established in the form of an FAQ, so that the answer generated by the model is persisted. When a search request submitted by a user for the target question is obtained, an FAQ query can be performed first. If a corresponding answer is found, it can be provided to the user directly, which improves search question-answering efficiency. If no corresponding answer is found, the answer can be generated according to the scheme provided by the embodiment of the present application and then provided to the user.

Optionally, the present exemplary embodiment may also implement a rights management function. Specifically, when the answer corresponding to the target question is queried, whether the user has the authority to view the corresponding answer or not can be determined according to the authority information associated with the user under the target organization, so that authority management and control are performed, and the search safety of the documents in the target organization is ensured.

2. Storing knowledge blocks related to target problems

Specifically, knowledge blocks related to target problems, which are segmented from target documents by a document segmentation model, can be obtained, and the target documents are matched from a knowledge base associated with target organizations by a document retrieval model according to the target problems; and establishing an association relation between the target question and the knowledge block, and taking the knowledge block associated with the target question as input of an answer generation model for generating an answer corresponding to the target question when a search request for the target question is obtained.

The knowledge blocks to be saved can be determined according to actual use requirements, for example according to the search frequency of the target question, the importance of the target question, and the like. That is, the embodiment of the present application can establish and persist the association between the target question and the corresponding knowledge blocks. When a search request submitted by a user for the target question is obtained, it can be determined by query whether a knowledge block associated with the target question has been stored. If so, the knowledge block can be input into the answer generation model to generate the answer corresponding to the target question, which improves search question-answering efficiency. If not, the answer can be generated according to the scheme provided by the embodiment of the present application.

Optionally, the present exemplary embodiment may also implement a rights management function. Specifically, when the knowledge block associated with the target problem is determined to be stored, whether the user has the authority of the knowledge block (i.e. whether the user has the authority to view the answer generated based on the knowledge block) can be determined according to the authority information associated with the user under the target organization, so that authority management and control can be performed, and the search security of the document in the target organization can be ensured.
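The two optimizations above, together with their optional rights checks, can be combined into a single lookup order: stored answer first, stored knowledge blocks second, full pipeline last. The sketch below is an assumption-laden illustration: the function name, the dictionary layout of `faq` and `block_cache`, the numeric `min_level` rights model, and the choice to return `None` when the user lacks rights are all hypothetical.

```python
def answer_question(question, user_level, faq, block_cache, run_full_pipeline, generate):
    """Serve a question from the FAQ, then from cached blocks, then from the pipeline."""
    entry = faq.get(question)
    if entry is not None:
        # Rights check on the stored answer (assumed rule: minimum user level).
        return entry["answer"] if user_level >= entry["min_level"] else None

    cached = block_cache.get(question)
    if cached is not None:
        if user_level < cached["min_level"]:
            return None
        # Retrieval and segmentation are skipped; only generation runs.
        return generate(question, cached["blocks"])

    # Nothing stored: retrieve documents, segment them, and generate as usual.
    return run_full_pipeline(question)
```

Here `generate` and `run_full_pipeline` are placeholders for the answer generation model and the complete retrieval-and-generation flow, supplied by the caller.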

As an example, the search question and answer function described above may be implemented by a smart dialog system, i.e., the client may be embodied as a smart dialog client. Specifically, a session between the intelligent dialogue client and the question-answer model can be established through the intelligent dialogue client, and a session interface is provided; obtaining a search request submitted by a user in a dialogue mode through the session interface, and sending the search request to the question-answer model so that the question-answer model obtains a target problem to be consulted, which is determined according to the search request, and matching a target document associated with the target problem from a knowledge base associated with a target organization to which the user belongs, and generating an answer corresponding to the target problem according to the target document; and obtaining an answer corresponding to the target question, and displaying the answer on the session interface.

That is, the user can submit the search request in the session interface in a dialogue chat mode, so as to trigger the question-answering model to provide the search question-answering service, and can also view the answer generated by the model aiming at the target problem in the session interface, thereby being beneficial to improving the interactive experience of the user.

As an example, at least two target organizations that are trusted with each other may be bound according to usage requirements, so as to implement document sharing between the two associated knowledge bases. Specifically, at least two target organizations with binding relations can be obtained, and the knowledge bases associated with the target organizations are associated; when obtaining search requests submitted by users belonging to the at least two target organizations, document matching is carried out from the knowledge base associated with the at least two target organizations, and a question-answer model is input for answer generation.

The embodiment of the application does not limit the types of at least two target organizations with binding relationship, takes the target organization as an enterprise as an example, can bind subsidiary companies belonging to the same main body or can bind enterprises with cooperative relationship, and can be determined according to the use requirement.

Thus, when the search request submitted by the user is obtained and the user is determined to belong to at least two target organizations, document retrieval can be performed in the associated knowledge base, the retrieval range is enlarged, and document sharing between the knowledge bases associated with the at least two target organizations is realized. The process of answer generation by question-answer model is described above and is not illustrated here.
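Determining which knowledge bases to search for a user of a bound organization can be sketched as follows. The function name `searchable_knowledge_bases` and the representation of bindings as a list of sets of organization names are assumptions made for illustration.

```python
def searchable_knowledge_bases(user_org, bindings):
    """Return the user's own organization plus every organization bound to it."""
    scope = {user_org}
    for group in bindings:
        if user_org in group:
            scope |= set(group)
    return sorted(scope)


bindings = [{"OrgA", "OrgB"}, {"OrgC", "OrgD"}]
scope = searchable_knowledge_bases("OrgA", bindings)
```

An organization that appears in no binding group searches only its own knowledge base.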

It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or fully authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region, and provide corresponding operation entries for the user to select authorization or rejection.

Corresponding to the foregoing method embodiment, the embodiment of the present application further provides a question-answer model building apparatus, referring to fig. 4, where the apparatus may include:

a training sample obtaining unit 401, configured to obtain a training sample, where the training sample includes a plurality of sample questions, and sample documents and sample answers respectively associated with different sample questions, where the sample documents belong to a target organization;

an initial model obtaining unit 402, configured to obtain an initial model for generating answers, the initial model including a document retrieval model and an answer generation model, with the output of the document retrieval model serving as the input of the answer generation model;

a model training unit 403, configured to perform model training on the document retrieval model through the sample documents respectively associated with the different sample questions, and to perform model training on the answer generation model through the sample documents and sample answers respectively associated with the different sample questions, so as to obtain a question-answer model that has the capability of retrieving the documents associated with a question and the capability of generating an answer corresponding to the question according to the retrieved documents.

Wherein the training sample further comprises: sample knowledge blocks respectively associated with the different sample questions, the sample knowledge blocks being segmented from the sample documents associated with the sample questions; and the answer generation model comprises: a document segmentation model and an answer generation sub-model;

the model training unit is specifically configured to: perform model training on the document segmentation model through the sample documents respectively associated with the different sample questions and the sample knowledge blocks segmented from the sample documents, and perform model training on the answer generation sub-model through the sample knowledge blocks respectively associated with the different sample questions and the sample answers, to obtain the question-answer model, so that the question-answer model has the capability of segmenting knowledge blocks from the retrieved documents and the capability of generating answers corresponding to the questions according to the segmented knowledge blocks.

Corresponding to the foregoing method embodiment, the embodiment of the present application further provides a knowledge base creation apparatus, referring to fig. 5, where the apparatus may include:

a search engine construction unit 501, configured to create a knowledge base associated with a target organization, and construct a keyword search engine and a vector search engine of the knowledge base;

an index creating unit 502, configured to obtain a document associated with the target organization, save the document to the knowledge base, and create a keyword index and a vector index of the document.

Corresponding to the foregoing method embodiment, the embodiment of the present application further provides a search question-answering device, referring to fig. 6, where the device may include:

a target question obtaining unit 601, configured to obtain a target question to be consulted determined according to a search request submitted by a user when the question-answering model accepts a call;

a target document matching unit 602, configured to match, from a knowledge base associated with a target organization to which the user belongs, a target document associated with the target question;

and an answer generating unit 603, configured to generate an answer corresponding to the target question according to the target document.

The target question obtaining unit is specifically configured to: determine the target object at which the search request is aimed; and determine the target question from the questions associated with the target object under the target organization, the questions associated with the target object being determined when the question-answering model is trained.

The target document matching unit is specifically configured to: matching a first document associated with the target problem in the knowledge base through a keyword search engine of the knowledge base, and matching a second document associated with the target problem in the knowledge base through a vector search engine of the knowledge base; the target document is determined from the first document and the second document.

Wherein the apparatus further comprises:

a rights information obtaining unit, configured to obtain the rights information associated with the user under the target organization, so as to determine from the knowledge base a target document that matches the rights information.

The answer generation unit is specifically configured to: segment a knowledge block related to the target question from the target document, and generate an answer corresponding to the target question according to the knowledge block.

Wherein the apparatus further comprises:

a context information obtaining unit, configured to obtain context information associated with the search request, where the context information includes at least one of the following information: identity information associated with the user under the target organization and attribute information associated with the target object aimed at by the search request under the target organization;

the answer generation unit is specifically configured to: generate the answer with the context information as a constraint condition.

Wherein the apparatus further comprises:

a new document obtaining unit, configured to obtain a new document associated with the target problem;

the answer generation unit is specifically configured to: generate the answer according to the newly added document and the target document.

Corresponding to the foregoing method embodiment, the embodiment of the present application further provides a search question-answering device, where the device may include:

In addition, the embodiment of the application also provides a computer readable storage medium, on which a computer program is stored, which when being executed by a processor, implements the steps of the method of any one of the previous method embodiments.

The embodiment of the present application further provides an electronic device, comprising:

one or more processors; and

a memory associated with the one or more processors for storing program instructions that, when read for execution by the one or more processors, perform the steps of the method of any of the preceding method embodiments.

Fig. 7 illustrates an exemplary architecture of the electronic device. For example, device 700 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, an aircraft, and so forth.

Referring to fig. 7, device 700 may include one or more of the following components: a processing component 702, a memory 704, a power component 706, a multimedia component 708, an audio component 710, an input/output (I/O) interface 712, a sensor component 714, and a communication component 716.

The processing component 702 generally controls overall operation of the device 700, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 702 may include one or more processors 720 to execute instructions to perform all or part of the steps of the methods provided by the disclosed subject matter. Further, the processing component 702 can include one or more modules that facilitate interaction between the processing component 702 and other components. For example, the processing component 702 may include a multimedia module to facilitate interaction between the multimedia component 708 and the processing component 702.

Memory 704 is configured to store various types of data to support operations at device 700. Examples of such data include instructions for any application or method operating on device 700, contact data, phonebook data, messages, pictures, videos, and the like. The memory 704 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.

The power component 706 provides power to the various components of the device 700. The power component 706 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 700.

The multimedia component 708 includes a screen that provides an output interface between the device 700 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or swipe action, but also the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 708 includes a front-facing camera and/or a rear-facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 700 is in an operational mode, such as a shooting mode or a video mode. Each front-facing camera and rear-facing camera may be a fixed optical lens system or have focal length and optical zoom capabilities.

The audio component 710 is configured to output and/or input audio signals. For example, the audio component 710 includes a Microphone (MIC) configured to receive external audio signals when the device 700 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the memory 704 or transmitted via the communication component 716. In some embodiments, the audio component 710 further includes a speaker for outputting audio signals.

The input/output (I/O) interface 712 provides an interface between the processing component 702 and peripheral interface modules, which may be keyboards, click wheels, buttons, and the like. These buttons may include, but are not limited to, a home button, a volume button, a start button, and a lock button.

The sensor assembly 714 includes one or more sensors for providing status assessment of various aspects of the device 700. For example, the sensor assembly 714 may detect an on/off state of the device 700, a relative positioning of the components, such as a display and keypad of the device 700, a change in position of the device 700 or a component of the device 700, the presence or absence of user contact with the device 700, an orientation or acceleration/deceleration of the device 700, and a change in temperature of the device 700. The sensor assembly 714 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 714 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 714 may also include an acceleration sensor, a gyroscopic sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 716 is configured to facilitate wired or wireless communication between the device 700 and other devices. The device 700 may access a wireless network based on a communication standard, such as WiFi or a 2G, 3G, 4G/LTE, or 5G mobile communication network. In one exemplary embodiment, the communication component 716 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 716 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra-Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the device 700 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for executing the methods described above.

In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as the memory 704 including instructions executable by the processor 720 of the device 700 to perform the methods provided by the disclosed subject matter. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, etc.

From the above description of the embodiments, it will be apparent to those skilled in the art that the present application may be implemented by software together with a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product. The software product may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the embodiments, or in some parts of the embodiments, of the present application.
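As one illustration of such a software embodiment, the search question-and-answer flow summarized in this application (obtain the target question, match target documents from the knowledge base associated with the user's organization, generate an answer from them) might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the application: the names `Document`, `KnowledgeBase`, `match_documents`, `generate_answer`, and `search_qa` are hypothetical, the word-overlap matcher is a stand-in for whatever matching the question-answer model performs, and the answer generator is a placeholder for the model's generation step.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    text: str


@dataclass
class KnowledgeBase:
    """Hypothetical per-organization document store."""
    organization: str
    documents: list[Document] = field(default_factory=list)


def tokenize(text: str) -> set[str]:
    # Lowercased set of words with edge punctuation stripped.
    return {w.strip(".,?!;:").lower() for w in text.split()} - {""}


def match_documents(question: str, kb: KnowledgeBase, top_k: int = 1) -> list[Document]:
    """Rank documents by word overlap with the question (assumed matcher)."""
    q_tokens = tokenize(question)
    scored = [(len(q_tokens & tokenize(d.text)), d) for d in kb.documents]
    scored = [(score, d) for score, d in scored if score > 0]
    # Sort on the score only, highest first; ties keep insertion order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:top_k]]


def generate_answer(question: str, docs: list[Document]) -> str:
    """Placeholder for the answer-generation step of the question-answer model."""
    if not docs:
        return "No matching document found."
    return f"Based on {docs[0].doc_id}: {docs[0].text}"


def search_qa(question: str, user_org: str,
              knowledge_bases: dict[str, KnowledgeBase]) -> str:
    # Only the knowledge base of the user's own organization is searched.
    kb = knowledge_bases[user_org]
    docs = match_documents(question, kb)
    return generate_answer(question, docs)
```

In this sketch, restricting the lookup to `knowledge_bases[user_org]` is what keeps an answer grounded in documents of the target organization to which the user belongs; the same question submitted by users of different organizations may therefore yield different answers.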

In this specification, the embodiments are described in a progressive manner. For parts that are identical or similar across embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the system embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the corresponding parts of the description of the method embodiments. The systems and system embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of a given embodiment. Those of ordinary skill in the art can understand and implement the present application without creative effort.

The application has been described in detail above so that its principles and embodiments may be better understood. Modifications made by those of ordinary skill in the art in light of the present teachings also fall within the scope of the present application. In view of the foregoing, this description should not be construed as limiting the application.

Claims

1. A method for constructing a question-answer model, comprising:

2. The method of claim 1, wherein the training sample further comprises: sample knowledge blocks respectively associated with different sample questions, wherein the sample knowledge blocks are obtained by segmenting sample documents associated with the sample questions, and then

3. A knowledge base creation method, comprising:

4. A search question and answer method, comprising:

5. The method of claim 4, wherein the obtaining the target questions to be consulted determined from the search request submitted by the user comprises:

determining a target object at which the search request is directed;

6. The method of claim 4, wherein said matching the target document associated with the target question from the knowledge base associated with the target organization to which the user belongs comprises:

7. The method as recited in claim 6, further comprising:

8. The method of claim 4, wherein generating an answer corresponding to the target question from the target document comprises:

9. The method according to any one of claims 4 to 8, further comprising:

10. The method according to any one of claims 4 to 8, further comprising:

obtaining a new document associated with the target question;

11. A search question and answer method, comprising:

obtaining a search request submitted by a user in a dialogue mode through the session interface, and sending the search request to the question-answer model, so that the question-answer model obtains a target question to be consulted that is determined according to the search request, matches a target document associated with the target question from a knowledge base associated with a target organization to which the user belongs, and generates an answer corresponding to the target question according to the target document;

12. A search question and answer method, comprising:

13. A search question and answer method, comprising:

14. An electronic device, comprising:

one or more processors; and

a memory associated with the one or more processors for storing program instructions that, when read and executed by the one or more processors, perform the steps of the method of any one of claims 1 to 13.