CN114528381A - Question-answer recognition method and related equipment - Google Patents


Info

Publication number
CN114528381A
CN114528381A (application CN202011199067.4A)
Authority
CN
China
Prior art keywords
answer
question
loss function
data set
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011199067.4A
Other languages
Chinese (zh)
Inventor
胡啸
李祺欣
邓东
车万翔
刘挺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Harbin Institute of Technology
Original Assignee
Huawei Technologies Co Ltd
Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd, Harbin Institute of Technology filed Critical Huawei Technologies Co Ltd
Priority to CN202011199067.4A priority Critical patent/CN114528381A/en
Publication of CN114528381A publication Critical patent/CN114528381A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G06F16/3344 Query execution using natural language analysis
    • G06F16/35 Clustering; Classification
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/35 Discourse or dialogue representation
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a question-answer recognition method and related devices. First, a target question-answer pair is obtained and input into a target question-answering model, whose loss function is derived from a first loss function of a classification module, a second loss function of an answer extraction module, and a third loss function of an answer recognition module. Finally, the model outputs whether the answer base includes the target answer corresponding to the target question. When answering questions, the target question-answering model thus combines the characteristics of the classification, answer extraction, and answer recognition modules, and improves the accuracy of dialogue question answering by coordinating semantics of different granularities.

Description

Question-answer recognition method and related equipment
Technical Field
The embodiments of the present application relate to the computer field, and in particular to the application of artificial intelligence (AI) technology in the computer field, and specifically to a question-answer recognition method and related devices.
Background
Artificial intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of such machines, so that they can perceive, reason, and make decisions. Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision and reasoning, human-computer interaction, recommendation and search, basic AI theory, and the like.
In these scenarios, given a user's input (a question asked by the user), the question-answering system needs to find the answer corresponding to the question in an answer base and return it to the user, thereby serving the user's need to obtain information. For example, in intelligent after-sales customer service for mobile phones, the question-answering system can find a method for repairing the fault in the user's phone based on the user's question. In a dialogue system, the question-answering system can retrieve a memo event that the user previously asked the system to remember. In an encyclopedia, the question-answering system can find the articles that answer the user's question. The question-answering system has therefore become a very important part of a new mode of information interaction.
In practical applications, first, the answers in the question-answer base may be highly semantically correlated, moderately correlated, or entirely unrelated, which requires the question-answering system to handle data with different degrees of semantic correlation effectively. Second, the questions users ask may be very fine-grained or very coarse-grained, which requires the system to understand questions at different granularities. Finally, the content users ask about is open-domain and unrestricted, which requires the system to handle semantic conflicts between different domains well. In summary, in real application scenarios the dialogue question-answer data, from the question to the answer base, is more complex, and how to process it effectively is an open challenge.
Disclosure of Invention
The embodiments of the present application provide a question-answer recognition method and related devices. The target question-answering model provided herein combines the characteristics of a classification module, an answer extraction module, and an answer recognition module when answering a question, and improves the accuracy of dialogue question answering by coordinating semantics of different granularities.
A first aspect of the embodiments of the present application provides a question-answer recognition method, including: acquiring a target question-answer pair, where the target question-answer pair consists of a target question and a candidate answer corresponding to the target question; inputting the target question-answer pair into a target question-answering model, where the loss function of the target question-answering model is derived from a first loss function of a classification module, a second loss function of an answer extraction module, and a third loss function of an answer recognition module; and outputting whether the answer base includes a target answer corresponding to the target question, where the target answer indicates the correct answer to the target question.
In this method, a target question-answer pair is first obtained and then input into a target question-answering model whose loss function is derived from a first loss function of a classification module, a second loss function of an answer extraction module, and a third loss function of an answer recognition module. Finally, whether the answer base includes the target answer corresponding to the target question is output. When answering questions, the target question-answering model thus combines the characteristics of the three modules and improves the accuracy of dialogue question answering by coordinating semantics of different granularities.
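The combined objective described above can be sketched as a weighted sum of the three module losses. The per-module binary cross-entropy losses and the equal weights below are assumptions for illustration; the patent does not specify the exact form of each loss or how they are weighted.

```python
import math

def cross_entropy(p, y):
    """Binary cross-entropy for a single prediction p with label y in {0, 1}."""
    eps = 1e-12  # numerical guard against log(0)
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))

def joint_loss(cls_pred, cls_y, ext_pred, ext_y, rec_pred, rec_y,
               weights=(1.0, 1.0, 1.0)):
    """Combine the classification (first), answer-extraction (second) and
    answer-recognition (third) losses into one training objective.
    The loss forms and equal weights are illustrative assumptions."""
    l1 = cross_entropy(cls_pred, cls_y)   # first loss function
    l2 = cross_entropy(ext_pred, ext_y)   # second loss function
    l3 = cross_entropy(rec_pred, rec_y)   # third loss function
    return weights[0] * l1 + weights[1] * l2 + weights[2] * l3
```

During training all three modules would be optimized against this single objective, so gradients from coarse-grained domain classification and fine-grained answer extraction both shape the shared representation.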
First, a training sample data set is obtained; then an initial question-answering model is trained on this data set to obtain the target question-answering model, whose loss function is derived from the first loss function of the classification module, the second loss function of the answer extraction module, and the third loss function of the answer recognition module. When answering questions, the target question-answering model thus combines the characteristics of the three modules and improves the accuracy of dialogue question answering by coordinating semantics of different granularities.
In a possible implementation of the first aspect, the method further includes: acquiring a target question; and preprocessing the target question. Obtaining the target question-answer pair then includes: acquiring the target question-answer pair according to the preprocessed target question.
In the question-answer recognition method provided by the embodiments of the application, after the target question is acquired, the questions collected from the user side can be given basic preprocessing in several ways. Optionally, the preprocessing may be stop-word removal. Stop words are mostly function words, such as conjunctions and particles, that carry little meaning and negatively affect subsequent keyword extraction and candidate-answer pre-screening; removing stop words (for example, the particle '的' and the pronoun '我' in Chinese) makes the target question easier for the subsequent model to process. Optionally, the preprocessing may be word segmentation. Segmentation formalizes the user's question and highlights the keyword information that represents its semantics; these keywords have a positive effect on candidate-answer pre-screening. Optionally, the preprocessing may be question-type determination. Chinese interrogative sentences are usually complex and come in many question types. Different application scenarios may support different question types, and the question type can also influence the answer output. Determining the type of the questions collected from the user side therefore facilitates subsequent processing of the target question by the model.
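A minimal sketch of this preprocessing, assuming the question has already been segmented into tokens (for example by a segmenter such as jieba). The stop-word list and question-type keywords below are small illustrative assumptions, not the patent's actual lexicons:

```python
# Hypothetical stop-word list and question-type keywords (assumptions).
STOP_WORDS = {"的", "地", "得", "了", "我", "吗"}
QUESTION_TYPE_KEYWORDS = {
    "为什么": "why",
    "怎么": "how",
    "哪里": "where",
    "什么": "what",
}

def preprocess(tokens):
    """Remove stop words and guess a coarse question type from keywords."""
    keywords = [t for t in tokens if t not in STOP_WORDS]
    qtype = "other"
    for kw, t in QUESTION_TYPE_KEYWORDS.items():
        if kw in tokens:
            qtype = t
            break
    return keywords, qtype
```

For example, `preprocess(["我", "的", "手机", "为什么", "无法", "开机"])` yields `(["手机", "为什么", "无法", "开机"], "why")`: the stop words are dropped and the interrogative keyword determines the type.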
In a possible implementation of the first aspect, the method further includes: acquiring keywords in the target question; screening the candidate answers according to the keywords; and forming the target question-answer pair from the target question and the candidate answers that pass the screening. Inputting the target question-answer pair into the question-answering model then includes: forming the target question-answer pair from the target question and the screened candidate answers, and inputting it into the question-answering model.
The question-answer recognition method provided in the embodiments of the application can also preprocess the candidate answers. Candidate-answer preprocessing is mainly performed when the number of candidate answers for a user question is large. For example, in an intelligent customer-service scenario, users encounter different types of questions, and each type is subdivided into many specific questions with corresponding solutions; when a user asks a question, the answer space is therefore large and hard to screen. In this case, preprocessing the candidate answers increases the probability that an answer is successfully matched to the question.
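A minimal sketch of keyword-based pre-screening of the candidate answers; the overlap threshold and the substring-matching criterion are assumptions for illustration:

```python
def prescreen(keywords, answer_bank, min_overlap=1):
    """Keep only candidate answers that contain at least `min_overlap`
    of the question's keywords (threshold is an assumption)."""
    kept = []
    for ans in answer_bank:
        overlap = sum(1 for kw in keywords if kw in ans)
        if overlap >= min_overlap:
            kept.append(ans)
    return kept
```

Only the surviving candidates are then paired with the question and fed to the model, shrinking the answer space before the heavier neural scoring.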
A second aspect of the embodiments of the present application provides a question-answer recognition method, including: acquiring a training sample data set, where the training sample data set includes at least one target question-answer pair, the target question-answer pair consisting of a target question and a candidate answer corresponding to it; and training an initial question-answering model on the training sample data set to obtain a target question-answering model, where the loss function of the target question-answering model is derived from a first loss function of a classification module, a second loss function of an answer extraction module, and a third loss function of an answer recognition module.
In this method, a training sample data set is first obtained, and the initial question-answering model is then trained on it to obtain the target question-answering model, whose loss function is derived from the first loss function of the classification module, the second loss function of the answer extraction module, and the third loss function of the answer recognition module. When answering questions, the target question-answering model thus combines the characteristics of the three modules and improves the accuracy of dialogue question answering by coordinating semantics of different granularities.
In a possible implementation manner of the second aspect, the training sample data set includes a first sample data set, and the method further includes: and training an initial answer recognition module according to the first sample data set to obtain the answer recognition module.
In a possible implementation of the second aspect, training an initial answer recognition module according to the first sample data set to obtain the answer recognition module includes: acquiring a first semantic hidden representation according to the first sample data set, where the first semantic hidden representation is a semantic vector obtained by the answer recognition module from the first sample data set; cooperatively processing the first, second, and third semantic hidden representations through a factorization machine (FM) to obtain the third loss function, where the second semantic hidden representation is a semantic vector obtained by the classification module from the first sample data set, and the third semantic hidden representation is a semantic vector obtained by the answer extraction module from the first sample data set; and obtaining the answer recognition module according to the third loss function.
In this possible implementation, to handle complex question-answer dialogue scenarios and coordinate multi-granularity semantic associations, the answer recognition module plays the most important role. In the present application, multi-granularity semantics are fused in this module on the basis of an FM (factorization machine), which can model the semantic-association features among multiple factors. In the PW layer, the hidden semantic representations of the DC module and the QA module enter the computation, so that both the coarse-grained information of domain recognition and the fine-grained information of QA answer extraction are taken into account.
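A pure-Python sketch of the second-order factorization-machine scoring that such fusion relies on. The feature vector x here stands for the concatenation of the modules' hidden representations; the dimensions and parameters are illustrative assumptions:

```python
def fm_score(x, w0, w, V):
    """Second-order factorization machine:
    w0 + sum_i w[i]*x[i] + sum_{i<j} <V[i], V[j]> * x[i]*x[j],
    computed with the standard O(n*k) reformulation.
    V[i] is the k-dimensional latent factor vector of feature i."""
    k = len(V[0])
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pair = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        pair += 0.5 * (s * s - s_sq)  # sum over i<j of V[i][f]*V[j][f]*x[i]*x[j]
    return linear + pair
```

For x = [1, 2] with one latent factor per feature, V = [[1], [1]], zero bias and zero linear weights, the score reduces to the single pairwise interaction x[0]*x[1]*V[0][0]*V[1][0] = 2; it is these pairwise terms that let the model couple coarse- and fine-grained representations.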
In a possible implementation manner of the second aspect, the training sample data set further includes a second sample data set, and the method further includes: training an initial classification module according to the second sample data set to obtain the first loss function; and obtaining the classification module according to the first loss function.
In this possible implementation, the classification module uses the hidden representations h_a and h_q to judge whether the question and the answer belong to the same domain; if a candidate answer and the question are not in the same domain, it is easy to conclude that the candidate is probably not the answer to the question. The DC module is the classification module and is implemented by a neural network. The data in the second sample data set are input into the DC module to train the initial DC module; the loss function of the DC module is then adjusted according to the results of multiple rounds of training, and the loss function finally obtained after these adjustments is called the first loss function.
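A minimal sketch of such a same-domain classifier: a logistic layer over the element-wise product of the question and answer vectors. The feature construction is an assumption; the patent only states that the DC module is implemented by a neural network.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def same_domain_prob(h_q, h_a, w, b):
    """Probability that question vector h_q and answer vector h_a
    come from the same domain: a logistic layer over their
    element-wise product (feature choice is an assumption)."""
    feats = [q * a for q, a in zip(h_q, h_a)]
    z = sum(wi * f for wi, f in zip(w, feats)) + b
    return sigmoid(z)
```

Training would then minimize the binary cross-entropy between this probability and the same-domain label, yielding what the patent calls the first loss function.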
In a possible implementation of the second aspect, the training sample data set further includes a third sample data set, and the method further includes: training an initial answer extraction module according to the third sample data set to obtain the second loss function; and obtaining the answer extraction module according to the second loss function.
A third aspect of embodiments of the present application provides a question-answering identifying apparatus, including a processor and a computer-readable storage medium storing a computer program; the processor is coupled with the computer-readable storage medium, and the computer program, when executed by the processor, causes the question and answer recognition apparatus to perform the method as described in the first aspect above or any one of the possible implementations of the first aspect.
A fourth aspect of the embodiments of the present application provides a training apparatus, including a processor and a computer-readable storage medium storing a computer program; the processor is coupled with the computer-readable storage medium, and the computer program, when executed by the processor, causes the training apparatus to perform the method described in the second aspect or any one of its possible implementations.
A fifth aspect of embodiments of the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method as described in the foregoing first aspect or any one of the possible implementations of the first aspect, or where the computer program, when executed by a processor, implements the method as described in the foregoing second aspect or any one of the possible implementations of the second aspect.
A sixth aspect of the embodiments of the present application provides a chip system, where the chip system includes a processor configured to invoke and execute the method described in the first aspect or any one of its possible implementations, or the method described in the second aspect or any one of its possible implementations.
According to the technical scheme, the embodiment of the application has the following advantages:
in this method, a training sample data set is first obtained, and the initial question-answering model is then trained on it to obtain the target question-answering model, whose loss function is derived from the first loss function of the classification module, the second loss function of the answer extraction module, and the third loss function of the answer recognition module. When answering questions, the target question-answering model thus combines the characteristics of the three modules and improves the accuracy of dialogue question answering by coordinating semantics of different granularities.
Drawings
FIG. 1 is a schematic diagram of an artificial intelligence body framework provided by the present application;
FIG. 2 is a schematic diagram of an embodiment of a question answering identification method provided in the present application;
FIG. 3 is a schematic diagram of another embodiment of a question answering identification method provided in the present application;
FIG. 4 is a schematic diagram of another embodiment of a question answering identification method provided in the present application;
FIG. 5 is a schematic diagram of another embodiment of a question answering identification method provided in the present application;
FIG. 6 is a schematic diagram of another embodiment of a question answering identification method provided in the present application;
FIG. 7 is a schematic diagram of another embodiment of a question answering identification method provided in the present application;
FIG. 8 is a schematic diagram of another embodiment of a question answering identification method provided in the present application;
FIG. 9 is a schematic diagram of another embodiment of the question answering identification method provided in the present application;
FIG. 10 is a schematic diagram of another embodiment of a question answering identification method provided in the present application;
FIG. 11 is a schematic structural diagram of a question-answer identifying apparatus provided in the present application;
FIG. 12 is a schematic diagram of a configuration of a model training apparatus provided herein;
FIG. 13 is another schematic structural diagram of the question answering identification apparatus provided in the present application;
FIG. 14 is another schematic structural diagram of the model training apparatus provided in the present application.
Detailed Description
Embodiments of the present application will now be described with reference to the accompanying drawings. It is to be understood that the described embodiments are merely some, not all, of the embodiments of the present application. As those skilled in the art will appreciate, with the development of technology and the emergence of new scenarios, the technical solutions provided in the embodiments of the present application are also applicable to similar technical problems.
The terms "first," "second," and the like in the description and in the claims of the present application and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Artificial Intelligence (AI) is a comprehensive technique in computer science that attempts to understand the essence of intelligence and produces a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
Every direction of artificial intelligence involves network models, for example deep neural network (DNN) models and convolutional neural network (CNN) models. An initial network model is trained with labeled data from different service scenarios to obtain a target network model suited to each scenario. Common applications include smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, autonomous driving, unmanned aerial vehicles, robots, smart medical services, and smart customer service; as the technology develops, artificial intelligence will be applied in ever more fields and deliver ever greater value.
In these scenarios, given a user's input (a question asked by the user), the question-answering system needs to find the answer corresponding to the question in an answer base and return it to the user, thereby serving the user's need to obtain information. For example, in intelligent after-sales customer service for mobile phones, the question-answering system can find a method for repairing the fault in the user's phone based on the user's question. In a dialogue system, the question-answering system can retrieve a memo event that the user previously asked the system to remember. In an encyclopedia, the question-answering system can find the articles that answer the user's question. The question-answering system has therefore become a very important part of a new mode of information interaction.
The dialogue question-answer scenario is an important research direction in the field of interactive artificial intelligence: for a given question, the user hopes the system will accurately find an answer in the answer base and return it. Among conventional question-answer processing methods, QA (question-answer)-based and pipeline-based approaches can effectively extract the answer the user wants from a question-answer data set, but the data they process are generally semantically related to the question or the answer at a single level. In practical applications, the question-answer base is more complex than the data sets handled by conventional methods. First, the answers in the base may be highly semantically correlated, moderately correlated, or unrelated, which requires the question-answering system to process data with different semantic correlations effectively. Second, the user's questions may be very fine-grained or very coarse-grained, which requires the system to understand questions at different granularities. Finally, the content users ask about is open-domain and unrestricted, which requires the system to handle semantic conflicts between different domains well.
Existing question-answering systems are usually based on association relations: candidate answers are generated mainly from the association between the user's question and a question bank, or between the user's question and an answer base. When computing the association between the user's question and a question bank, the system must prepare in advance the questions users may ask together with their answers; when the user asks, it need only find the most relevant question-answer pair and return it. When computing the association between the user's question and the answer base, the system does not need to store possible user questions, but must be modeled to retrieve candidate answers from the answer base, that is, answer extraction based on semantic matching. However, current answer extraction methods cannot efficiently solve question answering in complex dialogue scenarios.
Existing question-answering systems have several shortcomings. First, methods based on keywords or probability statistics cannot fully capture the complex, abstract semantic information between texts, and keyword-expansion methods depend too heavily on an external dictionary base; when the system migrates between domains, the workload is large. In addition, end-to-end answer extraction based on deep neural networks has poor accuracy, because exactly extracting the answer string for a question from a given answer text is affected by many factors, including boundary identification of the answer string and discontinuous answer strings. Pipeline methods suffer from error accumulation: errors in upstream pre-screening strongly affect downstream answer extraction. In complex dialogue question-answer scenarios, both end-to-end answer extraction and pipeline-based methods therefore have their own disadvantages, and the problem can be solved well only by fully considering the semantic cooperation between question spaces of different granularity and answer spaces with different semantic correlations.
In summary, in real application scenes, dialogue question-answering data, from question sentences to the answer bank, are more complex, and how to process such data effectively is a challenging problem.
In order to solve the above problems, the present application provides a model training method and related devices. First, a training sample data set is obtained. Then, an initial question-answering model is trained on the training sample data set to obtain a target question-answering model, where the loss function of the target question-answering model is obtained from a first loss function of the classification module, a second loss function of the answer extraction module, and a third loss function of the answer recognition module. Therefore, when answering questions, the target question-answering model combines the characteristics of the classification module, the answer extraction module and the answer recognition module: it can comprehensively consider the semantic relevance among different answers in the answer bank, and can also take the granularity (coarseness or fineness) of the questions into account. By cooperating semantics of different granularities, it can be better applied to complex dialogue question-answering scenes and improves the accuracy of dialogue question answering.
FIG. 1 shows a schematic diagram of the main framework of artificial intelligence, which describes the overall workflow of an artificial intelligence system and is applicable to general requirements of the artificial intelligence field.

The artificial intelligence main framework described above is set forth below in terms of two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis).
The "intelligent information chain" reflects the series of processes from data acquisition onward, for example the general processes of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision making, and intelligent execution and output. In this process, the data undergo a refinement from data to information to knowledge to wisdom.

The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (the technologies for providing and processing it) up to the industrial ecology of the system.
(1) Infrastructure
The infrastructure provides computing power support for the artificial intelligence system, realizes communication with the outside world, and is supported by a basic platform. It communicates with the outside world through sensors; computing power is provided by intelligent chips (hardware acceleration chips such as CPUs, NPUs, GPUs, ASICs and FPGAs); the basic platform includes a distributed computing framework, networks, and other related platform guarantees and support, and may include cloud storage and computing, interconnection networks, and the like. For example, sensors communicate with the outside to acquire data, and the data are provided for computation to the intelligent chips in the distributed computing system provided by the basic platform.
(2) Data
Data at the upper level of the infrastructure is used to represent the data source for the field of artificial intelligence. The data relates to graphs, images, voice and texts, and also relates to the data of the Internet of things of traditional equipment, including service data of the existing system and sensing data such as force, displacement, liquid level, temperature, humidity and the like.
(3) Data processing
Data processing typically includes data training, machine learning, deep learning, searching, reasoning, decision making, and the like.
The machine learning and the deep learning can perform symbolized and formalized intelligent information modeling, extraction, preprocessing, training and the like on data.
Inference refers to the process of simulating the intelligent human inference mode in a computer or intelligent system: according to an inference control strategy, the machine uses formalized information to think about and solve problems, a typical function being searching and matching.
The decision-making refers to a process of making a decision after reasoning intelligent information, and generally provides functions of classification, sequencing, prediction and the like.
(4) General purpose capabilities
After the above-mentioned data processing, further based on the result of the data processing, some general capabilities may be formed, such as algorithms or a general system, e.g. translation, analysis of text, computer vision processing, speech recognition, recognition of images, etc.
(5) Intelligent product and industrial application
Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields; they are the encapsulation of the overall artificial intelligence solution, commercializing intelligent information decisions and realizing practical applications. The application fields mainly include intelligent manufacturing, intelligent transportation, smart home, intelligent medical treatment, intelligent security, automatic driving, safe city, intelligent terminals, and the like.
The question-answer recognition method provided by the application is described based on the main framework schematic diagram of artificial intelligence described in fig. 1.
Fig. 2 is a schematic diagram of an embodiment of a question answering identification method provided in the present application.
Referring to fig. 2, an embodiment of a question answering identification method provided in the present application includes steps 201 to 203.
201. And acquiring a target question-answer pair.
In the embodiment of the application, the target question-answer pair is a question-answer pair consisting of a target question and an alternative answer corresponding to the target question.
202. And inputting the target question-answer pairs into a target question-answer model.
In the embodiment of the present application, the loss function of the target question-answering model is obtained by a first loss function of the classification module, a second loss function of the answer extraction module, and a third loss function of the answer identification module.
203. And outputting whether the target answer corresponding to the target question is included in the answer library.
In the embodiment of the present application, the target answer is used to indicate a correct answer corresponding to the target question.
Fig. 3 is a schematic diagram of another embodiment of the question answering identification method provided in the present application.
Referring to fig. 3, in the embodiment of the present application, in addition to the above steps 201 to 203, the question answering identification method provided by the present application further executes steps 301 and 302, and a specific implementation manner will be described in detail in the following embodiment.
301. And acquiring a target problem.
302. And preprocessing the target problem.
In the question and answer recognition method provided by the embodiment of the application, after the target question is acquired, the questions collected from the user side can be subjected to basic preprocessing in various manners.
Optionally, the preprocessing may be stop-word removal. Since stop words are mostly function words with little meaning, such as conjunctions and particles, they have a negative effect on subsequent keyword extraction, alternative-answer pre-screening, and the like; stop words (for example, particles such as the Chinese "de" and pronouns such as "me") are therefore removed, which makes the target question easier for the subsequent model to process.

Optionally, the preprocessing may be word segmentation. Word segmentation mainly formalizes the user question and highlights the keyword information that represents its semantics; these keywords have a positive effect on the pre-screening of alternative answers.

Optionally, the preprocessing may be question-type determination. Chinese interrogative sentences are usually complex and have various question types. Different question types may be supported depending on the application scene, and the question type also has a certain influence on the answer output. Therefore, the type of the question collected from the user side can be determined first, which makes the target question easier for the subsequent model to process.
In this embodiment of the present application, optionally, the problem preprocessing may be to process the target problem by using one of the above manners, or may be to process the target problem by using multiple manners. Optionally, the problem preprocessing may also be to process the target problem in other ways, and is not limited herein.
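The stop-word removal and keyword extraction steps above can be sketched as follows. This is a minimal illustration, not the implementation in this application: the stop-word list is hypothetical, and a naive whitespace tokenizer stands in for a real word-segmentation tool.

```python
# Hypothetical stop-word list; a real system would load a domain dictionary.
STOP_WORDS = {"please", "the", "a", "of", "me"}

def preprocess_question(question: str) -> list:
    """Tokenize a question and drop stop words, keeping candidate keywords."""
    tokens = question.lower().split()  # naive segmentation, for illustration only
    return [t for t in tokens if t not in STOP_WORDS]

keywords = preprocess_question("Please tell me the price of the premium plan")
print(keywords)  # ['tell', 'price', 'premium', 'plan']
```

The surviving keywords can then feed the alternative-answer pre-screening described later.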
Fig. 4 is a schematic diagram of another embodiment of the question answering identification method provided in the present application.
Referring to fig. 4, in the embodiment of the present application, in addition to the above steps 201 to 203, the question answering identification method provided by the present application further executes steps 401 to 403, and a specific implementation manner will be described in detail in the following embodiment.
401. And acquiring keywords in the target question.
In the embodiment of the application, the target question is first passed through question preprocessing to obtain the keyword set it contains. Similarly, an answer can store its word-segmented form in addition to the complete sentence. It should be noted that introducing any external knowledge, such as a dictionary, can extend the keyword set with a set of synonyms.
402. And screening alternative answers according to the keywords.
In the embodiment of the application, the similarity between the one-hot keyword representations of the question and each answer can be calculated; answers with low similarity are filtered out, and the remaining answers are used as alternative answers to the user question for subsequent answer extraction.
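The one-hot pre-screening described above can be sketched as follows, assuming cosine similarity over keyword one-hot vectors; the threshold value is an illustrative assumption, not one taken from this application.

```python
import math

def one_hot_similarity(q_keywords, a_keywords):
    """Cosine similarity between the one-hot keyword vectors of question and answer."""
    q_set, a_set = set(q_keywords), set(a_keywords)
    overlap = len(q_set & a_set)                       # dot product of one-hot vectors
    norm = math.sqrt(len(q_set)) * math.sqrt(len(a_set))
    return overlap / norm if norm else 0.0

def prescreen(q_keywords, answers, threshold=0.2):
    """Keep only answers whose keyword similarity to the question passes the threshold."""
    return [a for a in answers if one_hot_similarity(q_keywords, a) >= threshold]

kept = prescreen(["price", "premium", "plan"],
                 [["premium", "plan", "cost"], ["weather", "today"]])
print(kept)  # [['premium', 'plan', 'cost']]
```

Answers below the threshold are dropped before the more expensive answer-extraction stage runs.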
403. And forming a target question-answer pair by the target question and the screened qualified alternative answers.
The question-answer recognition method provided in the embodiment of the application can preprocess the alternative answers. Alternative-answer preprocessing is mainly performed when the number of alternative answers to a user question is large. For example, in an intelligent customer-service scene, users encounter questions of many different types, and each type is subdivided into many different questions with corresponding solutions; when a user asks a question, the answer space is therefore large and hard to screen. In this case, the alternative answers can be preprocessed to increase the probability that an answer is successfully matched with the question.
The model training method provided by the present application is described based on the schematic diagram of the main framework of artificial intelligence described in fig. 1.
Fig. 5 is a schematic diagram of another embodiment of the question answering identification method provided in the present application.
Referring to fig. 5, an embodiment of a question answering identification method provided in the present application includes steps 501 to 502.
501. And acquiring a training sample data set.
In the embodiment of the application, the training sample data set comprises at least one target question-answer pair, and the target question-answer pair is a question-answer pair consisting of a target question and an alternative answer corresponding to the target question.
502. And training the initial question-answering model according to the training sample data set to obtain a target question-answering model.
In the embodiment of the present application, the loss function of the target question-answering model is obtained by a first loss function of the classification module, a second loss function of the answer extraction module, and a third loss function of the answer identification module.
In the embodiment of the present application, the output layer of the target question-answering model may comprehensively consider the loss functions of the DC, PW, and QA modules. Of course, if the cost of the amount of labeled data is a concern, the DC and QA modules may be pre-trained on partial data. Optionally, while the PW module is trained, the QA and DC modules may or may not be fine-tuned, depending on the data. In an alternative embodiment, the loss function of the target question-answering model satisfies the following relationship:
L = α·l_pw + β·l_qa + (1 − α − β)·l_dc

In the above formula, l_pw, l_qa and l_dc are the loss functions of the PW, QA and DC modules respectively, and α and β are hyper-parameters ranging from 0 to 1 that need to be preset manually.
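As a sketch, the weighted combination above can be written directly; the hyper-parameter values used below are arbitrary illustrations, not the ones used in this application.

```python
def combined_loss(l_pw: float, l_qa: float, l_dc: float,
                  alpha: float = 0.4, beta: float = 0.4) -> float:
    """L = alpha * l_pw + beta * l_qa + (1 - alpha - beta) * l_dc."""
    # Both hyper-parameters are manually preset in [0, 1]; their sum must not
    # exceed 1 so that the DC weight stays non-negative.
    assert 0.0 <= alpha <= 1.0 and 0.0 <= beta <= 1.0 and alpha + beta <= 1.0
    return alpha * l_pw + beta * l_qa + (1.0 - alpha - beta) * l_dc

print(combined_loss(1.0, 2.0, 3.0, alpha=0.5, beta=0.25))  # 0.5 + 0.5 + 0.75 = 1.75
```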
Fig. 6 is a schematic diagram of another embodiment of the question answering identification method provided in the present application.
Referring to fig. 6, in the embodiment of the present application, an overall framework of the model training method is shown in fig. 6, and there are 5 layers in the framework, which are an input layer, a hidden layer, a multi-granularity semantic layer, an FM semantic cooperation layer, and a prediction layer.
In the embodiment of the application, in the input layer, a denotes an alternative answer obtained after answer preprocessing and q denotes the target question posed by the user. The target question is paired with the alternative answer to form the input of the target question-answering model; note that the inputs a and q are not keyword representations but the target question-answer pair formed from the original, complete question and answer.
In the embodiment of the present application, the hidden layer is a deep neural network built from a Long Short-Term Memory network (LSTM). Its main purpose is to represent the question and the alternative answer as dense vectors, a semantic representation method that has been widely adopted in recent years with the rise of deep neural networks. We represent them as follows:

h_a = LSTM(a_1, ..., a_|a|)

h_q = LSTM(q_1, ..., q_|q|)

In the above formulas, a_1, ..., a_|a| is the word-string representation of the alternative answer and |a| is its length; likewise, q_1, ..., q_|q| is the word-string representation of the question and |q| is its length. After passing through the LSTM neural network computing module, the alternative answer a and the question q are represented as the low-dimensional dense semantic vectors h_a and h_q, respectively.
In the embodiment of the application, after the target question-answer pair is processed by the input layer and the hidden layer, the semantic representations of the target question and the alternative answer serve as the input of the multi-granularity semantic layer, which models the question-answer pair at several granularities. It comprises a classification (DC) module that distinguishes whether the question and the alternative answer belong to the same domain, an answer recognition (Pair-Wise, PW) module that judges whether the alternative answer contains the answer required by the question, and an answer extraction (QA) module that extracts the answer string from the alternative answer text. These three modules are designed mainly to handle the characteristics of complex question-answer data: user questions of uneven granularity and answers with different degrees of semantic relevance. In order to reply to the user efficiently, the target question-answering model provided by the embodiment of the application comprehensively utilizes the capabilities of all three modules, rather than answering the question with the capability of a single granularity as in existing methods.
Fig. 7 is a schematic diagram of another embodiment of the question answering identification method provided in the present application.
Referring to fig. 7, in the embodiment of the present application, in addition to the above steps 501 to 502, the model training method provided by the present application may further train the initial classification module according to the second sample data set to obtain the first loss function, and further obtain the classification module according to the first loss function, where this specific implementation manner will be described in detail in the following embodiments.
In the embodiment of the present application, the network structure of the DC module is shown in fig. 7. h_a and h_q are input into the module, which judges whether the question and the answer belong to the same domain: if the alternative answer and the question are not in the same domain, it is easy to judge that the alternative answer is most likely not the answer to the question. The DC module is a classification module implemented by a neural network. The data in the second sample data set are input into the DC module to train the initial DC module; the loss function of the DC module is then adjusted according to the results of multiple rounds of training, and the loss function finally obtained after these adjustments is called the first loss function.
Fig. 8 is a schematic diagram of another embodiment of the question answering identification method provided in the present application.
Referring to fig. 8, in the embodiment of the present application, in addition to the above steps 501 to 502, the model training method provided by the present application may further train the initial answer extraction module according to a third sample data set to obtain a second loss function, and obtain the answer extraction module according to the second loss function.
In the embodiment of the present application, the network structure of the QA module is shown in fig. 8. It is a QA model implemented with a deep neural network method. After the data in the third sample data set are fed in, the hidden layer processes them and outputs h_a and h_q. We concatenate the two hidden representations as the input of the subsequent operation:

h_qa = f_1(W_1 · [h_a, h_q])

In the above formula, [h_a, h_q] denotes the concatenation of the two low-dimensional dense vectors, f_1 denotes the activation function of the intermediate deep neural network of the QA module, and W_1 denotes the weights of the neural network. Alternatively, the QA module here may be implemented with other feed-forward neural network architectures. After obtaining the semantically fused representation h_qa of question and answer, we can add a neural network to predict which string in the answer a is the answer corresponding to the question, calculated as follows:

p_qa = f_2(W_2 · h_qa)

In the above formula, p_qa is a matrix of length |a| and width 2; the first dimension is used to determine the start of the answer string in the alternative answer text, and the second dimension is used to determine its end. f_2 and W_2 are respectively the activation function and the weights of the current neural network structure.
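Given the |a| x 2 start/end score matrix described above, the answer string can be decoded, for example, by picking the span whose start and end scores sum highest. This decoding rule is an illustrative assumption, not necessarily the one used in this application.

```python
def extract_span(scores):
    """scores[i] = [start_score, end_score] for token i of the alternative answer.
    Return the (start, end) index pair with start <= end maximizing the sum of
    the start score at `start` and the end score at `end`."""
    best, best_val = (0, 0), float("-inf")
    for s in range(len(scores)):
        for e in range(s, len(scores)):          # enforce start <= end
            val = scores[s][0] + scores[e][1]
            if val > best_val:
                best, best_val = (s, e), val
    return best

scores = [[0.1, 0.0], [0.9, 0.2], [0.3, 0.8], [0.0, 0.1]]
print(extract_span(scores))  # (1, 2): tokens 1 through 2 form the answer string
```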
Fig. 9 is a schematic diagram of another embodiment of the question answering identification method provided in the present application.
Referring to fig. 9, in the embodiment of the present application, in addition to the above steps 501 to 502, the initial answer recognition module may be trained according to the first sample data set to obtain the answer recognition module. One embodiment of the model training method provided in the present application includes steps 601 to 603, and a specific implementation manner will be described in the following embodiments.
Fig. 10 is a schematic diagram of another embodiment of the question answering identification method provided in the present application.
601. A first semantically hidden representation is obtained from the first sample dataset.
Referring to fig. 10, in the embodiment of the present application, the first semantic hidden representation is the semantic vector obtained by the answer recognition module according to the first sample data set. After the data in the first sample data set are fed in, the hidden layer processes them and outputs h_a and h_q; the output h_a and h_q are the first semantic hidden representation.
602. And cooperatively processing the first semantic hidden representation, the second semantic hidden representation and the third semantic hidden representation through a Factorization Machine (FM) to obtain a third loss function.

In the embodiment of the application, the second semantic hidden representation is the semantic vector obtained by the classification module according to the first sample data set, and the third semantic hidden representation is the semantic vector obtained by the answer extraction module according to the first sample data set. The process by which each module obtains its semantic hidden representation is similar to step 601 and is not described again here.
In the embodiment of the application, in order to handle complex question-answering dialogue scenes and achieve collaborative multi-granularity semantic association, the PW module plays the most important role. In this application, multiple granularities of semantics are fused in this module based on an FM. FM is short for Factorization Machine, which can model the semantic association features among multiple factors. In the PW layer, the semantic hidden representations h_dc and h_qa of the DC module and the QA module are used; the network structure of the PW module is as shown in fig. 5, and the calculation method is as follows:

h_1 = [h_dc, h_pw, h_qa]

y = FM(h_1) = w_0 + Σ_i w_i·h_1,i + Σ_i Σ_{j>i} ⟨v_i, v_j⟩·h_1,i·h_1,j

In the above formulas, h_1 is the concatenation of the hidden representations of the three modules DC, PW and QA, and y cooperates the semantic association information of DC, PW and QA through the FM (Factorization Machine) to judge whether the current answer matches the question. This calculation considers both the coarse-grained information of domain identification and the fine-grained information of QA answer extraction.
603. And obtaining an answer identification module according to the third loss function.
In this embodiment of the application, after obtaining the PW module according to the third loss function, we may perform the following formatting steps based on the answer output module before returning to the user: (1) Person references and time expressions are converted according to the characteristics of the domain. In some specific scenes, such as memo voice interaction, the user experience is greatly reduced if the retrieved content is returned without handling changes of time and person. For example, for the memo "help me remember that I am having a meal with Xiaoming tomorrow noon", if the user asks for this record the next day, returning it verbatim is not good; the person references and time should instead be converted to the correct form, such as "you are having a meal with Xiaoming at noon today". (2) Besides judging whether the given answer matches the question through the question-answering module based on FM multi-granularity semantic collaboration, the QA module can also determine the answer string corresponding to the question, so this information can be marked for the user in an appropriate way according to the requirements of the domain, improving the user experience.
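The person/time conversion in step (1) can be sketched with simple rule tables. The rules below are hypothetical stand-ins: a real system would need proper date arithmetic, grammar-aware rewriting, and coreference handling.

```python
# Hypothetical rewrite rules for replaying a memo one day after it was recorded.
PRONOUN_RULES = {"i": "you", "my": "your", "me": "you", "am": "are"}
TIME_RULES = {"tomorrow": "today"}

def format_reply(memo: str) -> str:
    """Convert first-person and relative-time tokens before returning a memo."""
    out = []
    for tok in memo.lower().split():
        # Time rules take precedence over pronoun rules; unknown tokens pass through.
        out.append(TIME_RULES.get(tok, PRONOUN_RULES.get(tok, tok)))
    return " ".join(out)

print(format_reply("I am having lunch with Xiaoming tomorrow"))
# you are having lunch with xiaoming today
```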
The application provides a question and answer identification method and related equipment. Firstly, a target question-answer pair is obtained, then the target question-answer pair is input into a target question-answer model, and a loss function of the target question-answer model is obtained by a first loss function of a classification module, a second loss function of an answer extraction module and a third loss function of an answer identification module. And finally, outputting whether the answer base comprises the target answer corresponding to the target question or not. Therefore, the target question-answering model combines the characteristics of the classification module, the answer extraction module and the answer identification module when answering questions, and improves the accuracy of dialogue question-answering by cooperating with semantics of different granularities.
The foregoing embodiments provide different implementations of the question-answer recognition method and the model training method. A question-answer recognition device 70 is provided below, as shown in fig. 11. The question-answer recognition device 70 is configured to execute the steps executed by the question-answer recognition device in the foregoing embodiments; the execution steps and corresponding beneficial effects are understood with reference to the foregoing corresponding embodiments and are not described again here. The question-answer recognition device 70 includes:
an obtaining unit 701, configured to obtain a target question-answer pair, where the target question-answer pair is a question-answer pair composed of a target question and an alternative answer corresponding to the target question;
an input unit 702, configured to input the target question-answer pair into a target question-answer model, where a loss function of the target question-answer model is obtained by a first loss function of the classification module, a second loss function of the answer extraction module, and a third loss function of the answer identification module;
the output unit 703 is configured to output whether a target answer corresponding to the target question is included in an answer library, where the target answer is used to indicate a correct answer corresponding to the target question.
In one possible implementation form of the method of the invention,
the acquiring unit 701 is further configured to acquire a target question;
and the processing unit is used for preprocessing the target problem.
In one possible implementation form of the method of the invention,
the obtaining unit 701 is further configured to obtain a keyword in the target question;
the screening unit is also used for screening the alternative answers according to the keywords;
and the combination unit is used for combining the target question and the screened qualified alternative answers into the target question-answer pair.
It should be noted that, because the contents of information interaction, execution process, and the like between the modules of the question-answering identifying device 70 are based on the same concept as the method embodiment of the present application, the technical effect brought by the contents is the same as the method embodiment of the present invention, and specific contents may refer to the description in the foregoing method embodiment of the present application, and are not described herein again.
The foregoing embodiments provide different implementations of a question-answering identifying device 70, and a model training device 80 is provided below, as shown in fig. 12, where the model training device 80 is configured to execute steps executed by the model training device in the foregoing embodiments, and the execution steps and corresponding beneficial effects are specifically understood with reference to the foregoing corresponding embodiments, which are not described herein again, and the model training device 80 includes:
an obtaining unit 801, configured to obtain a training sample data set, where the training sample data set includes at least one target question-answer pair, and the target question-answer pair is a question-answer pair composed of a target question and an alternative answer corresponding to the target question;
a training unit 802, configured to train an initial question-answer model according to the training sample data set to obtain a target question-answer model, where a loss function of the target question-answer model is obtained by a first loss function of the classification module, a second loss function of the answer extraction module, and a third loss function of the answer recognition module.
In one possible implementation form of the method of the invention,
the training unit 802 is further configured to train an initial answer recognition module according to the first sample data set to obtain the answer recognition module.
In one possible implementation form of the method of the invention,
an obtaining unit 801, configured to obtain a first semantic implicit representation according to the first sample data set, where the first semantic implicit representation is a semantic vector obtained by the answer identifying module according to the first sample data set;
the processing unit is used for:
cooperatively processing the first semantic hidden representation, the second semantic hidden representation and a third semantic hidden representation through a Factorization Machine (FM) to obtain the third loss function, where the second semantic hidden representation is a semantic vector obtained by the classification module according to the first sample data set, and the third semantic hidden representation is a semantic vector obtained by the answer extraction module according to the first sample data set;
and obtaining the answer identification module according to the third loss function.
In one possible implementation of the method according to the invention,
the training sample data set further comprises a second sample data set,
a training unit 802, further configured to train an initial classification module according to the second sample data set to obtain the first loss function;
and the processing unit is further used for obtaining the classification module according to the first loss function.
In one possible implementation of the method according to the invention,
the training sample data set further comprises a third sample data set, the method further comprising:
the training unit 802 is further configured to train an initial answer extraction module according to the third sample data set to obtain the second loss function;
and the processing unit is also used for obtaining the answer extraction module according to the second loss function.
It should be noted that, because the contents of information interaction, execution process, and the like between the modules of the model training device 80 are based on the same concept as the method embodiment of the present application, the technical effect brought by the contents is the same as the method embodiment of the present invention, and specific contents may refer to the description in the foregoing method embodiment of the present application, and are not described herein again.
Fig. 13 is a schematic diagram of a possible logical structure of the question and answer identifying device 90 according to the foregoing embodiments, which is provided for the embodiments of the present application. The question-answer identifying apparatus 90 includes: a processor 901, a communication interface 902, a memory 903, and a bus 904. The processor 901, the communication interface 902, and the memory 903 are connected to each other by a bus 904. In the embodiment of the present application, the processor 901 is configured to control and manage the actions of the question and answer recognizing device 90. Communication interface 902 is used to support communication by question and answer recognition device 90. A memory 903 for storing program codes and data of the question-answer recognizing apparatus 90.
Fig. 14 is a schematic diagram of a possible logical structure of the model training apparatus 100 according to the foregoing embodiments, provided by an embodiment of the present application. The model training apparatus 100 includes: a processor 1001, a communication interface 1002, a memory 1003, and a bus 1004. The processor 1001, the communication interface 1002, and the memory 1003 are connected to each other by the bus 1004. In this embodiment of the present application, the processor 1001 is configured to control and manage the actions of the model training apparatus 100, the communication interface 1002 is configured to support the model training apparatus 100 in communicating, and the memory 1003 is configured to store the program code and data of the model training apparatus 100.
The processors 901 and 1001 may each be a central processing unit, a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof, and may implement or execute the various illustrative logical blocks, modules, and circuits described in connection with this disclosure. A processor may also be a combination that implements a computing function, for example, a combination of one or more microprocessors, or a combination of a digital signal processor and a microprocessor. The buses 904 and 1004 may be Peripheral Component Interconnect (PCI) buses, Extended Industry Standard Architecture (EISA) buses, or the like, and may each be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in each of Figs. 13 and 14, but this does not mean that there is only one bus or one type of bus.
The present application further provides a chip system, which includes a processor and is configured to support the question-answer recognition apparatus or the model training apparatus in implementing the functions involved above, for example, acquiring or processing the data involved in the foregoing method embodiments. In one possible design, the chip system further includes a memory, configured to store the program instructions and data necessary for the question-answer recognition apparatus or the model training apparatus. The chip system may consist of a chip, or may include a chip and other discrete devices.
In another embodiment of the present application, a computer-readable storage medium is further provided, in which computer-executable instructions are stored; when at least one processor of a device executes the computer-executable instructions, the device performs the question-answer recognition method or the model training method described in the embodiments of Figs. 1 to 7 above.
In another embodiment of the present application, a computer program product is further provided, which includes computer-executable instructions stored in a computer-readable storage medium; at least one processor of a device may read the computer-executable instructions from the computer-readable storage medium, and execution of the computer-executable instructions by the at least one processor causes the device to perform the question-answer recognition method or the model training method described in the embodiments of Figs. 1 to 7 above.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the embodiments of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the embodiments of the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present application, in essence, or the part contributing to the prior art, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes: various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disc.
The above description is only a specific implementation of the embodiments of the present application, but the scope of the embodiments of the present application is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the embodiments of the present application, and all the modifications and substitutions should be covered by the scope of the embodiments of the present application. Therefore, the protection scope of the embodiments of the present application shall be subject to the protection scope of the claims.

Claims (20)

1. A question-answer recognition method is characterized by comprising the following steps:
obtaining a target question-answer pair, wherein the target question-answer pair is a question-answer pair consisting of a target question and an alternative answer corresponding to the target question;
inputting the target question-answer pair into a target question-answer model, wherein a loss function of the target question-answer model is obtained from a first loss function of a classification module, a second loss function of an answer extraction module and a third loss function of an answer recognition module;
and outputting whether an answer library includes a target answer corresponding to the target question, wherein the target answer is used for indicating a correct answer corresponding to the target question.
2. The question-answer recognition method according to claim 1, characterized in that it further comprises:
acquiring the target question;
preprocessing the target question;
the obtaining of the target question-answer pair comprises the following steps:
and acquiring the target question-answer pair according to the preprocessed target question.
3. The question-answer recognition method according to claim 1 or 2, characterized in that the method further comprises:
acquiring a keyword in the target question;
screening the alternative answers according to the keyword;
combining the target question and the alternative answers that pass the screening into the target question-answer pair;
wherein the inputting the target question-answer pair into the question-answer model comprises:
inputting the target question-answer pair formed by the target question and the alternative answers that pass the screening into the question-answer model.
4. A method of model training, comprising:
acquiring a training sample data set, wherein the training sample data set comprises at least one target question-answer pair, and the target question-answer pair is a question-answer pair consisting of a target question and an alternative answer corresponding to the target question;
and training an initial question-answer model according to the training sample data set to obtain a target question-answer model, wherein a loss function of the target question-answer model is obtained from a first loss function of a classification module, a second loss function of an answer extraction module and a third loss function of an answer recognition module.
5. The model training method of claim 4, wherein the training sample data set comprises a first sample data set, the method further comprising:
and training an initial answer recognition module according to the first sample data set to obtain the answer recognition module.
6. The model training method of claim 5, wherein the training an initial answer recognition module according to the first sample data set to obtain the answer recognition module comprises:
acquiring a first semantic hidden representation according to the first sample data set, wherein the first semantic hidden representation is a semantic vector obtained by the answer recognition module according to the first sample data set;
cooperatively processing the first semantic hidden representation, a second semantic hidden representation and a third semantic hidden representation through a factorization machine (FM) to obtain the third loss function, wherein the second semantic hidden representation is a semantic vector obtained by the classification module according to the first sample data set, and the third semantic hidden representation is a semantic vector obtained by the answer extraction module according to the first sample data set;
and obtaining the answer recognition module according to the third loss function.
7. The model training method of claim 4, wherein the training sample data set further comprises a second sample data set, the method further comprising:
training an initial classification module according to the second sample data set to obtain the first loss function;
and obtaining the classification module according to the first loss function.
8. The model training method of claim 4, wherein the training sample data set further comprises a third sample data set, the method further comprising:
training an initial answer extraction module according to the third sample data set to obtain the second loss function;
and obtaining the answer extraction module according to the second loss function.
9. A question-answer identifying apparatus characterized by comprising:
an acquisition unit, configured to acquire a target question-answer pair, wherein the target question-answer pair is a question-answer pair consisting of a target question and an alternative answer corresponding to the target question;
an input unit, configured to input the target question-answer pair into a target question-answer model, wherein a loss function of the target question-answer model is obtained from a first loss function of a classification module, a second loss function of an answer extraction module and a third loss function of an answer recognition module;
and an output unit, configured to output whether an answer library includes a target answer corresponding to the target question, wherein the target answer is used for indicating a correct answer corresponding to the target question.
10. The question-answer recognizing apparatus according to claim 9,
the acquisition unit is further configured to acquire the target question;
and the processing unit is further configured to preprocess the target question.
11. The question-answer recognizing apparatus according to claim 9 or 10,
the acquisition unit is further configured to acquire a keyword in the target question;
a screening unit is configured to screen the alternative answers according to the keyword;
and a combination unit is configured to combine the target question and the alternative answers that pass the screening into the target question-answer pair.
12. A model training apparatus, comprising:
an acquisition unit, configured to acquire a training sample data set, wherein the training sample data set comprises at least one target question-answer pair, and the target question-answer pair is a question-answer pair consisting of a target question and an alternative answer corresponding to the target question;
and a training unit, configured to train an initial question-answer model according to the training sample data set to obtain a target question-answer model, wherein a loss function of the target question-answer model is obtained from a first loss function of a classification module, a second loss function of an answer extraction module and a third loss function of an answer recognition module.
13. The model training apparatus of claim 12, wherein the training sample data set comprises a first sample data set,
and the training unit is further configured to train an initial answer recognition module according to the first sample data set to obtain the answer recognition module.
14. The model training apparatus of claim 13, wherein:
the acquisition unit is further configured to acquire a first semantic hidden representation according to the first sample data set, wherein the first semantic hidden representation is a semantic vector obtained by the answer recognition module according to the first sample data set;
the processing unit is configured to:
cooperatively process the first semantic hidden representation, a second semantic hidden representation and a third semantic hidden representation through a factorization machine (FM) to obtain the third loss function, wherein the second semantic hidden representation is a semantic vector obtained by the classification module according to the first sample data set, and the third semantic hidden representation is a semantic vector obtained by the answer extraction module according to the first sample data set;
and obtain the answer recognition module according to the third loss function.
15. The model training apparatus of claim 12, wherein the training sample data set further comprises a second sample data set,
the training unit is further used for training an initial classification module according to the second sample data set to obtain the first loss function;
and the processing unit is further used for obtaining the classification module according to the first loss function.
16. The model training device of claim 12, wherein the training sample data set further comprises a third sample data set,
the training unit is further configured to train an initial answer extraction module according to the third sample data set to obtain the second loss function;
and the processing unit is also used for obtaining the answer extraction module according to the second loss function.
17. A question-answer recognition apparatus, comprising a processor and a computer-readable storage medium storing a computer program;
wherein the processor is coupled to the computer-readable storage medium, and the computer program, when executed by the processor, implements the method of any one of claims 1 to 3.
18. A model training apparatus, comprising a processor and a computer-readable storage medium storing a computer program;
wherein the processor is coupled to the computer-readable storage medium, and the computer program, when executed by the processor, implements the method of any one of claims 4 to 8.
19. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method of any one of claims 1 to 3, or the method of any one of claims 4 to 8.
20. A chip system, comprising a processor, wherein the processor is configured to perform the method of any one of claims 1 to 3, or the method of any one of claims 4 to 8.
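Claims 6 and 14 above describe cooperatively processing three semantic hidden representations through a factorization machine (FM) to obtain the third loss function, but this application gives no formula for the FM here. The sketch below shows only the standard second-order FM score applied to a simple concatenation of the three vectors; the concatenation step, all dimensions, and all names are assumptions for illustration, not details from the claims.

```python
import numpy as np

# Illustrative sketch only: a standard second-order factorization machine
# (FM) scoring the concatenation of three semantic hidden representations.
# The combination scheme and dimensions are assumed, not from the claims.

def fm_score(x, w0, w, V):
    """Second-order FM: bias + linear term + factorized pairwise interactions.

    x:  input vector (concatenated semantic representations), shape (n,)
    w0: global bias
    w:  linear weights, shape (n,)
    V:  factor matrix, shape (n, k)
    """
    linear = w0 + w @ x
    # O(nk) identity for sum_{i<j} <V_i, V_j> * x_i * x_j
    pairwise = 0.5 * np.sum((V.T @ x) ** 2 - (V ** 2).T @ (x ** 2))
    return linear + pairwise

# Toy "semantic hidden representations" from the three modules.
rng = np.random.default_rng(0)
h_recognition = rng.normal(size=4)   # answer recognition module
h_classify = rng.normal(size=4)      # classification module
h_extract = rng.normal(size=4)       # answer extraction module

x = np.concatenate([h_recognition, h_classify, h_extract])
score = fm_score(x, w0=0.0, w=rng.normal(size=x.size),
                 V=rng.normal(size=(x.size, 2)))
print(float(score))
```

The factorized pairwise term is what lets the FM model interactions between components of the three representations without a full quadratic weight matrix, which is presumably why an FM is chosen for "cooperative" processing.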
CN202011199067.4A 2020-10-31 2020-10-31 Question-answer recognition method and related equipment Pending CN114528381A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011199067.4A CN114528381A (en) 2020-10-31 2020-10-31 Question-answer recognition method and related equipment


Publications (1)

Publication Number Publication Date
CN114528381A true CN114528381A (en) 2022-05-24

Family

ID=81619252


Country Status (1)

Country Link
CN (1) CN114528381A (en)

Similar Documents

Publication Publication Date Title
CN110427463B (en) Search statement response method and device, server and storage medium
WO2022007823A1 (en) Text data processing method and device
CN109344404B (en) Context-aware dual-attention natural language reasoning method
CN112131350B (en) Text label determining method, device, terminal and readable storage medium
CN109376222B (en) Question-answer matching degree calculation method, question-answer automatic matching method and device
CN108376132B (en) Method and system for judging similar test questions
CN109214006B (en) Natural language reasoning method for image enhanced hierarchical semantic representation
CN116541493A (en) Interactive response method, device, equipment and storage medium based on intention recognition
CN110297986A (en) A kind of Sentiment orientation analysis method of hot microblog topic
CN116595151A (en) Priori knowledge-based image reasoning question-answering method for inspiring large language model
CN116541492A (en) Data processing method and related equipment
Feng et al. Ontology semantic integration based on convolutional neural network
CN115878752A (en) Text emotion analysis method, device, equipment, medium and program product
WO2021129411A1 (en) Text processing method and device
CN114282528A (en) Keyword extraction method, device, equipment and storage medium
Chan et al. Optimization of language models by word computing
CN113705207A (en) Grammar error recognition method and device
Ermatita et al. Sentiment Analysis of COVID-19 using Multimodal Fusion Neural Networks.
CN116680386A (en) Answer prediction method and device based on multi-round dialogue, equipment and storage medium
Sun et al. Rumour detection technology based on the BiGRU_capsule network
CN113779244B (en) Document emotion classification method and device, storage medium and electronic equipment
US11501071B2 (en) Word and image relationships in combined vector space
CN115130461A (en) Text matching method and device, electronic equipment and storage medium
CN114627282A (en) Target detection model establishing method, target detection model application method, target detection model establishing device, target detection model application device and target detection model establishing medium
Huang et al. Learning emotion recognition and response generation for a service robot

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination