Disclosure of Invention
In view of this, the invention provides an intelligent question-answer matching method and device.
The embodiment of the invention provides an intelligent question-answer matching method, which is applied to a server communicatively connected to a client and comprises the following steps:
receiving question information sent by the client, and extracting real-time topic information from the question information;
constructing a convolutional neural network model, and inputting all historical topic information in a preset database into the convolutional neural network model for training;
searching the preset database for a plurality of pieces of historical topic information matching the real-time topic information, wherein a first similarity value between each of the plurality of pieces of historical topic information and the real-time topic information is smaller than a first threshold value; inputting the plurality of pieces of historical topic information into the trained convolutional neural network model to calculate a second similarity value between each of them and the real-time topic information; and selecting a preset number of pieces of historical topic information from the plurality of pieces of historical topic information in a set order based on the calculated second similarity values;
constructing and training a machine learning model according to the plurality of pieces of historical topic information, inputting the real-time topic information and the preset number of pieces of historical topic information into the machine learning model, and calculating a score value for each of the preset number of pieces of historical topic information;
determining whether the maximum of the calculated score values reaches a set value, and if it does, calculating an edit distance value between the historical topic information corresponding to the maximum value and the real-time topic information;
and determining whether the edit distance value is smaller than a second threshold value, and if so, acquiring the question-answer pair of the historical topic information corresponding to the maximum value and sending the question-answer pair to the client.
Optionally, the method further comprises:
if the maximum value does not reach the set value, searching for a client whose user profile matches the real-time topic information; and if such a client exists, sending the question information to the found client so that it can answer the question information.
Optionally, the step of selecting a preset number of pieces of historical topic information from the plurality of pieces of historical topic information in a set order based on the calculated second similarity values includes:
sorting the calculated second similarity values from high to low;
and obtaining the historical topic information corresponding to the preset number of top-ranked second similarity values.
Optionally, the step of constructing and training a machine learning model according to the plurality of pieces of historical topic information includes:
acquiring, for each of the plurality of pieces of historical topic information, the label feature value, classification feature value and similarity feature value included in that historical topic information;
constructing a machine learning model based on a gradient boosting tree and a regression analysis algorithm, and inputting the acquired label feature values, classification feature values and similarity feature values into the machine learning model for training.
Optionally, the step of inputting the real-time topic information and the preset number of pieces of historical topic information into the machine learning model to calculate a score value for each of the preset number of pieces of historical topic information includes:
acquiring the label feature value, classification feature value and similarity feature value of the real-time topic information;
acquiring the label feature value, classification feature value and similarity feature value of each of the preset number of pieces of historical topic information;
and inputting the label feature value, classification feature value and similarity feature value of the real-time topic information, together with those of each of the preset number of pieces of historical topic information, into the trained machine learning model to calculate the score value of each of the preset number of pieces of historical topic information.
The embodiment of the invention also provides an intelligent question-answer matching device, which is applied to a server communicatively connected to a client and comprises:
a real-time topic information extraction module, configured to receive the question information sent by the client and extract the real-time topic information of the question information;
a convolutional neural network model building module, configured to build a convolutional neural network model and input all historical topic information in a preset database into the convolutional neural network model for training;
a topic information screening module, configured to search the preset database for a plurality of pieces of historical topic information matching the real-time topic information, wherein a first similarity value between each of the plurality of pieces of historical topic information and the real-time topic information is smaller than a first threshold value; input the plurality of pieces of historical topic information into the trained convolutional neural network model to calculate a second similarity value between each of them and the real-time topic information; and select a preset number of pieces of historical topic information from the plurality of pieces of historical topic information in a set order based on the calculated second similarity values;
a score value calculation module, configured to construct and train a machine learning model according to the plurality of pieces of historical topic information, input the real-time topic information and the preset number of pieces of historical topic information into the machine learning model, and calculate a score value for each of the preset number of pieces of historical topic information;
a first judgment module, configured to determine whether the maximum of the calculated score values reaches a set value, and if it does, calculate an edit distance value between the historical topic information corresponding to the maximum value and the real-time topic information;
and a second judgment module, configured to determine whether the edit distance value is smaller than a second threshold value, and if so, acquire the question-answer pair of the historical topic information corresponding to the maximum value and send the question-answer pair to the client.
Optionally, the first judgment module is further configured to:
if the maximum value does not reach the set value, search for a client whose user profile matches the real-time topic information; and if such a client exists, send the question information to the found client so that it can answer the question information.
Optionally, the topic information screening module selects the preset number of pieces of historical topic information from the plurality of pieces of historical topic information in a set order based on the calculated second similarity values as follows:
sorting the calculated second similarity values from high to low;
and obtaining the historical topic information corresponding to the preset number of top-ranked second similarity values.
Optionally, the score value calculation module constructs and trains the machine learning model according to the plurality of pieces of historical topic information as follows:
acquiring, for each of the plurality of pieces of historical topic information, the label feature value, classification feature value and similarity feature value included in that historical topic information;
constructing a machine learning model based on a gradient boosting tree and a regression analysis algorithm, and inputting the acquired label feature values, classification feature values and similarity feature values into the machine learning model for training.
Optionally, the score value calculation module inputs the real-time topic information and the preset number of pieces of historical topic information into the machine learning model to calculate a score value for each of the preset number of pieces of historical topic information as follows:
acquiring the label feature value, classification feature value and similarity feature value of the real-time topic information;
acquiring the label feature value, classification feature value and similarity feature value of each of the preset number of pieces of historical topic information;
and inputting the label feature value, classification feature value and similarity feature value of the real-time topic information, together with those of each of the preset number of pieces of historical topic information, into the trained machine learning model to calculate the score value of each of the preset number of pieces of historical topic information.
The embodiment of the invention also provides a server, which comprises a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the intelligent question-answer matching method when executing the computer program.
The embodiment of the invention also provides a computer-readable storage medium comprising a computer program which, when run, controls the server on which the storage medium is located to execute the intelligent question-answer matching method.
Advantageous effects
In the intelligent question-answer matching method and device provided by the embodiments of the invention, the constructed convolutional neural network model is first trained on all historical topic information in the preset database. The trained model then calculates a second similarity value for each piece of historical topic information matching the real-time topic information, and a preset number of pieces of historical topic information are selected from the matched ones in a set order based on these second similarity values. A machine learning model then calculates the score values of the selected pieces, and finally a two-stage judgment based on the score values yields a question-answer pair that satisfies the judgment conditions. Combining the convolutional neural network model with the machine learning model allows the question information to be analyzed and processed from multiple angles, which improves the quality of the obtained question-answer pairs.
Furthermore, a client whose user profile matches the real-time topic information is searched for and the question information is sent to that client, which makes the handling of question information more flexible and helps ensure the quality of the answer the client gives to the question.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The inventor found that existing question-answering systems handle new questions raised by users in a single, inflexible way, so the quality of the matched answers is low.
The shortcomings of the above prior-art solutions were identified by the inventor through practice and careful study; therefore, both the discovery of these problems and the solutions proposed by the following embodiments should be regarded as contributions made by the inventor in the course of the invention.
Based on this research, the embodiments of the invention provide an intelligent question-answer matching method and device that analyze and process question information from multiple angles through a convolutional neural network model and a machine learning model, thereby ensuring the quality of the obtained question-answer pairs.
Fig. 1 is a block diagram illustrating a server 10 according to an embodiment of the present invention. The server 10 in the embodiment of the present invention has data storage, transmission, and processing functions, and as shown in fig. 1, the server 10 includes: memory 11, processor 12, network module 13 and intelligent question-answer matching device 20.
The memory 11, the processor 12 and the network module 13 are electrically connected directly or indirectly to realize data transmission or interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines. The memory 11 stores an intelligent question-answer matching device 20, the intelligent question-answer matching device 20 includes at least one software functional module which can be stored in the memory 11 in the form of software or firmware (firmware), and the processor 12 executes various functional applications and data processing by running software programs and modules stored in the memory 11, such as the intelligent question-answer matching device 20 in the embodiment of the present invention, so as to implement the intelligent question-answer matching method in the embodiment of the present invention.
The memory 11 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The memory 11 is used for storing a program, and the processor 12 executes the program after receiving an execution instruction.
The processor 12 may be an integrated circuit chip with data processing capability. The processor 12 may be a general-purpose processor, such as a Central Processing Unit (CPU) or a Network Processor (NP), and can implement or perform the various methods, steps and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor or any conventional processor.
The network module 13 is used for establishing communication connection between the server 10 and other communication terminal devices through a network, and implementing transceiving operation of network signals and data. The network signal may include a wireless signal or a wired signal.
It is understood that the configuration shown in fig. 1 is merely illustrative, and that the server 10 may include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1. The components shown in fig. 1 may be implemented in hardware, software, or a combination thereof.
An embodiment of the present invention also provides a computer-readable storage medium, which includes a computer program. The computer program controls the server 10 where the readable storage medium is located to execute the following intelligent question-answer matching method when running.
Fig. 2 shows a flowchart of an intelligent question-answer matching method according to an embodiment of the present invention. The steps of the method are applied to the server 10 and can be implemented by the processor 12. The specific process shown in Fig. 2 is described in detail below.
In this embodiment, the server 10 is communicatively connected to a client and includes a preset database in which a plurality of question-answer pairs are stored. The question-answer pairs are provided by users of the platform on which the server 10 runs, and each question-answer pair corresponds to a piece of historical topic information; for example, the historical topic information in the preset database is M_1 to M_n.
Step S21, receiving the question information sent by the client, and extracting the real-time topic information of the question information.
The server 10 extracts keywords from the received question information and thereby obtains the real-time topic information M_t corresponding to the question information, which is used for the subsequent question-answer matching.
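The keyword extraction in step S21 is not specified further. As a minimal sketch, assuming English text, a regular-expression tokenizer, a small hypothetical stop-word list and simple frequency ranking (Chinese question text would additionally need a word segmenter), the real-time topic information M_t could be derived as follows. The function name `extract_topic` and the parameter `top_n` are illustrative only.

```python
import re
from collections import Counter
from typing import List

# Hypothetical stop-word list; a deployed system would use a curated one.
STOPWORDS = {"a", "an", "the", "how", "do", "i", "my", "is", "to", "of",
             "and", "what", "why", "can"}

def extract_topic(question: str, top_n: int = 3) -> List[str]:
    """Return the top_n most frequent non-stop-word tokens as the topic M_t."""
    tokens = re.findall(r"\w+", question.lower())
    keywords = [t for t in tokens if t not in STOPWORDS]
    # most_common() orders equal counts by first occurrence
    return [word for word, _ in Counter(keywords).most_common(top_n)]

print(extract_topic("How do I reset my router password? "
                    "The router keeps rejecting the password."))
# ['router', 'password', 'reset']
```

Frequency ranking is only one option; weighting schemes such as TF-IDF over the historical topics would favour terms that are distinctive within the preset database.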
Step S22, constructing a convolutional neural network model, and inputting all historical topic information in the preset database into the convolutional neural network model for training.
A Convolutional Neural Network (CNN) model is constructed, and M_1 to M_n are input into the CNN model for training.
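The embodiment does not fix the CNN architecture or its training objective. The sketch below assumes a classic text CNN (word embeddings, one 1-D convolution, ReLU, max-pooling over positions) and takes the cosine similarity of two pooled vectors as the second similarity value. It shows only the forward pass with randomly initialized weights; training on M_1 to M_n (for example with a siamese or contrastive objective) is omitted. The class `TinyTextCNN` and all sizes are hypothetical.

```python
import math
import random
from typing import Dict, List

class TinyTextCNN:
    """Forward pass of a minimal text CNN: embeddings -> 1-D conv -> ReLU -> max-pool."""

    def __init__(self, vocab: List[str], emb_dim: int = 8, n_filters: int = 16,
                 window: int = 2, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.window = window
        # Randomly initialised weights; the training step on M_1..M_n is not shown.
        self.emb: Dict[str, List[float]] = {
            w: [rng.gauss(0.0, 1.0) for _ in range(emb_dim)] for w in vocab}
        self.unk = [0.0] * emb_dim                # unknown words map to a zero vector
        self.filters = [[rng.gauss(0.0, 1.0) for _ in range(window * emb_dim)]
                        for _ in range(n_filters)]
        self.bias = [0.0] * n_filters

    def encode(self, tokens: List[str]) -> List[float]:
        vecs = [self.emb.get(t, self.unk) for t in tokens]
        while len(vecs) < self.window:            # pad so at least one window fits
            vecs.append(self.unk)
        pooled = [0.0] * len(self.filters)        # ReLU outputs are never negative
        for start in range(len(vecs) - self.window + 1):
            patch = [x for v in vecs[start:start + self.window] for x in v]
            for k, (f, b) in enumerate(zip(self.filters, self.bias)):
                act = max(0.0, sum(fi * xi for fi, xi in zip(f, patch)) + b)  # conv + ReLU
                pooled[k] = max(pooled[k], act)   # max-pool over positions
        return pooled

def cosine(a: List[float], b: List[float]) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def second_similarity(model: TinyTextCNN, realtime: List[str],
                      candidate: List[str]) -> float:
    """Cosine similarity of the CNN encodings of M_t and one historical topic."""
    return cosine(model.encode(realtime), model.encode(candidate))
```

In practice a framework convolution layer would replace the hand-written loops; the pure-Python version only makes each stage of the encoder explicit.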
Step S23, finding a plurality of pieces of historical topic information matching the real-time topic information in the preset database, inputting them into the trained convolutional neural network model to calculate a second similarity value between each of them and the real-time topic information, and selecting a preset number of pieces of historical topic information from them in a set order based on the calculated second similarity values.
For convenience of description, assume that M_t matches the historical topic information M_1 to M_20, where the first similarity value between each of M_1 to M_20 and M_t is smaller than the first threshold. The first threshold can be set according to actual conditions, and the first similarity value is a cosine similarity value.
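Since the first similarity value is a cosine similarity, it can be sketched as follows, assuming each piece of topic information is a list of keyword tokens represented as a term-count (bag-of-words) vector. The filter keeps the comparison exactly as worded in the text ("smaller than a first threshold"). `cosine_similarity` and `first_stage` are hypothetical names.

```python
import math
from collections import Counter
from typing import Dict, List

def cosine_similarity(a: List[str], b: List[str]) -> float:
    """Cosine similarity of two token lists represented as term-count vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(count * cb[token] for token, count in ca.items())
    na = math.sqrt(sum(c * c for c in ca.values()))
    nb = math.sqrt(sum(c * c for c in cb.values()))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)

def first_stage(realtime: List[str], history: Dict[str, List[str]],
                first_threshold: float) -> List[str]:
    """Keep historical topics whose first similarity value is smaller than
    first_threshold, following the condition as stated in the text."""
    return [name for name, tokens in history.items()
            if cosine_similarity(realtime, tokens) < first_threshold]
```

For example, `cosine_similarity(["router", "password"], ["router", "wifi"])` evaluates to 0.5 (up to floating-point rounding), because the two vectors share one of two unit-count terms.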
Optionally, the server 10 may use a search engine to search the preset database for the plurality of pieces of historical topic information matching the real-time topic information.
In this embodiment, the preset database is used as the first database, the database queried by the search engine is used as the second database, and the trained CNN model is used as the third database.
M_1 to M_20 are input into the trained CNN model to calculate the second similarity value between each of them and M_t. The calculated second similarity values are sorted from high to low, and the historical topic information corresponding to the top preset number of second similarity values is obtained. For example, if the preset number is five, the top five pieces of historical topic information obtained are M_3, M_1, M_2, M_8 and M_13.
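The selection in step S23 reduces to sorting the candidates by second similarity value and keeping the first preset-number entries. A minimal sketch follows; `select_top` is a hypothetical name, and the similarity values in the usage line are made up solely to reproduce the ordering M_3, M_1, M_2, M_8, M_13 from the example.

```python
from typing import Dict, List

def select_top(second_similarities: Dict[str, float], preset_number: int) -> List[str]:
    """Sort candidates by second similarity (high to low) and keep the first preset_number."""
    ranked = sorted(second_similarities.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:preset_number]]

# Hypothetical second similarity values for some of M_1..M_20
sims = {"M1": 0.91, "M2": 0.88, "M3": 0.95, "M4": 0.42,
        "M8": 0.80, "M13": 0.77, "M17": 0.35}
print(select_top(sims, 5))  # ['M3', 'M1', 'M2', 'M8', 'M13']
```

For large candidate sets, `heapq.nlargest(preset_number, ...)` returns the same top entries without sorting the whole list.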
Step S24, constructing and training a machine learning model according to the plurality of pieces of historical topic information, inputting the real-time topic information and the preset number of pieces of historical topic information into the machine learning model, and calculating the score value of each of the preset number of pieces of historical topic information.
Referring to Fig. 3, in this embodiment one implementation of step S24 comprises steps S241, S242, S243 and S244.
Step S241, for each of the plurality of pieces of historical topic information matching the real-time topic information, acquiring the label feature value, classification feature value and similarity feature value included in that historical topic information.
The label feature value, classification feature value and similarity feature value of each of M_1 to M_20 are extracted.
Step S242, constructing a machine learning model based on a gradient boosting tree and a regression analysis algorithm, and inputting the acquired label feature values, classification feature values and similarity feature values into the machine learning model for training.
The Gradient Boosting Decision Tree (GBDT) has good nonlinear fitting ability and robustness, while the regression analysis algorithm (LR) can handle both continuous and categorical independent variables and is easy to use and interpret.
A machine learning model built on GBDT and LR can further analyze and score the correlation between the historical topic information and the real-time topic information, improving the accuracy and reliability of the analysis.
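Steps S241 and S242 leave the model details open. One common way to combine a gradient boosting tree with a regression model is to train the boosted trees first, one-hot encode the leaf each sample lands in, and fit a logistic regression on that encoding. The sketch below assumes that scheme and reads "LR" as logistic regression. To stay short it uses depth-1 trees (stumps) fitted with squared loss, assumes binary relevance labels for training (the text does not say how training targets are obtained), and scales the output to 0 to 100 so it can be compared with a set value such as 80 points. All names and the training rows are hypothetical.

```python
import math
from typing import List, Tuple

Stump = Tuple[int, float, float, float]  # (feature index, threshold, left value, right value)

def fit_stump(X: List[List[float]], residuals: List[float]) -> Stump:
    """Depth-1 regression tree that minimises the squared error of the residuals."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    if best is None:  # all rows identical: constant stump, every row goes left
        mean = sum(residuals) / len(residuals)
        return (0, math.inf, mean, mean)
    _, j, thr, lm, rm = best
    return (j, thr, lm, rm)

def fit_gbdt(X, y, n_trees=5, learning_rate=0.5) -> List[Stump]:
    """Gradient boosting with squared loss: each stump fits the current residuals."""
    pred = [sum(y) / len(y)] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        j, thr, lm, rm = fit_stump(X, residuals)
        stumps.append((j, thr, lm, rm))
        pred = [p + learning_rate * (lm if row[j] <= thr else rm) for p, row in zip(pred, X)]
    return stumps

def leaf_one_hot(stumps: List[Stump], row: List[float]) -> List[float]:
    """Encode which leaf of each stump the row falls into (two slots per stump)."""
    out: List[float] = []
    for j, thr, _, _ in stumps:
        out += [1.0, 0.0] if row[j] <= thr else [0.0, 1.0]
    return out

def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(F, y, epochs=300, step=0.1):
    """Plain stochastic gradient descent on the log loss."""
    w, b = [0.0] * len(F[0]), 0.0
    for _ in range(epochs):
        for f, yi in zip(F, y):
            g = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) - yi
            w = [wi - step * g * fi for wi, fi in zip(w, f)]
            b -= step * g
    return w, b

class GbdtLrScorer:
    """Boosted stumps -> one-hot leaf features -> logistic regression -> score in [0, 100]."""

    def __init__(self, n_trees: int = 5) -> None:
        self.n_trees = n_trees

    def fit(self, X: List[List[float]], y: List[int]) -> "GbdtLrScorer":
        self.stumps = fit_gbdt(X, y, self.n_trees)
        self.w, self.b = fit_logistic([leaf_one_hot(self.stumps, r) for r in X], y)
        return self

    def score(self, row: List[float]) -> float:
        f = leaf_one_hot(self.stumps, row)
        return 100.0 * sigmoid(sum(wi * fi for wi, fi in zip(self.w, f)) + self.b)

# Hypothetical rows of [label, classification, similarity] feature values
X = [[0.9, 1.0, 0.8], [0.8, 1.0, 0.9], [0.2, 0.0, 0.1], [0.1, 0.0, 0.3]]
y = [1, 1, 0, 0]  # assumed relevance labels
scorer = GbdtLrScorer().fit(X, y)
```

The trees handle nonlinear interactions between the feature values, while the logistic layer on top yields a calibrated, interpretable score; a production system would use a full GBDT library rather than stumps.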
Step S243, acquiring the label feature value, classification feature value and similarity feature value of the real-time topic information, and acquiring the label feature value, classification feature value and similarity feature value of each of the preset number of pieces of historical topic information.
The label feature value, classification feature value and similarity feature value of each of M_t, M_3, M_1, M_2, M_8 and M_13 are acquired.
Step S244, inputting the label feature value, classification feature value and similarity feature value of the real-time topic information, together with those of each of the preset number of pieces of historical topic information, into the trained machine learning model, and calculating the score value of each of the preset number of pieces of historical topic information.
The label feature values, classification feature values and similarity feature values of M_t, M_3, M_1, M_2, M_8 and M_13 are input into the trained machine learning model to calculate the respective score values of M_3, M_1, M_2, M_8 and M_13.
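Steps S243 and S244 feed the three feature values of M_t together with those of each candidate into the model, but how the two sets are combined is not specified. The sketch below assumes plain concatenation into one six-value vector per candidate and accepts any scoring callable, for example a trained model's scoring function. `TopicFeatures`, `pair_vector` and `score_candidates` are hypothetical names, and the numbers in the usage lines are made up.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

@dataclass
class TopicFeatures:
    label: float           # label feature value
    classification: float  # classification feature value
    similarity: float      # similarity feature value

    def as_list(self) -> List[float]:
        return [self.label, self.classification, self.similarity]

def pair_vector(realtime: TopicFeatures, candidate: TopicFeatures) -> List[float]:
    """Concatenate the features of M_t and of one candidate (assumed combination)."""
    return realtime.as_list() + candidate.as_list()

def score_candidates(realtime: TopicFeatures,
                     candidates: Dict[str, TopicFeatures],
                     model: Callable[[Sequence[float]], float]) -> Dict[str, float]:
    """Return a score value for each candidate, e.g. for M_3, M_1, M_2, M_8 and M_13."""
    return {name: model(pair_vector(realtime, feats)) for name, feats in candidates.items()}

# Usage with a stand-in model; a real system would call the trained model here.
m_t = TopicFeatures(0.8, 1.0, 0.9)
cands = {"M3": TopicFeatures(0.7, 1.0, 0.9), "M8": TopicFeatures(0.2, 0.0, 0.3)}
print(score_candidates(m_t, cands, model=lambda v: 100 * sum(v) / len(v)))
```

Alternatives to concatenation, such as element-wise differences between the two feature sets, are equally compatible with this interface because only `pair_vector` would change.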
Step S25, determining whether the maximum of the calculated score values reaches a set value.
For example, the score values of M_3, M_1, M_2, M_8 and M_13 are grade_3, grade_1, grade_2, grade_8 and grade_13, respectively.
Assume the maximum value is grade_13. If grade_13 reaches the set value (assumed to be 80 points), the process proceeds to step S26.
If grade_13 does not reach the set value, the process proceeds to step S29.
Step S26, calculating the edit distance value between the historical topic information corresponding to the maximum value and the real-time topic information, and determining whether the edit distance value is smaller than a second threshold.
The edit distance value between M_13 and M_t is calculated.
If the edit distance value is smaller than the second threshold (set according to actual conditions), the process proceeds to step S27.
If the edit distance value is not less than the second threshold value, the process goes to step S28.
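The edit distance of step S26 is commonly taken to be the Levenshtein distance, the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other. Assuming that definition, and assuming it is computed over characters rather than words, it can be obtained with the standard dynamic-programming recurrence in O(|a|·|b|) time and O(|b|) memory. `edit_distance` is a hypothetical name.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b using a single rolling row."""
    prev = list(range(len(b) + 1))                 # distances from "" to b[:j]
    for i, ca in enumerate(a, start=1):
        cur = [i]                                  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]

print(edit_distance("kitten", "sitting"))  # 3
```

The branch of step S26 then reduces to checking `edit_distance(text_of_M13, text_of_Mt) < second_threshold`.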
Step S27, acquiring the question-answer pair of the historical topic information corresponding to the maximum value, and sending the question-answer pair to the client.
The question-answer pair corresponding to M_13 is acquired and sent to the client for the user to view.
Step S28, correcting the maximum value to obtain a correction result, and inputting the correction result into the machine learning model for secondary training.
It can be understood that the machine learning model after secondary training can be used to recalculate the score values of M_3, M_1, M_2, M_8 and M_13, after which steps S25 and S26 are repeated.
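Steps S25 to S29 together form a small decision procedure. The sketch below, assuming that "reaches the set value" means greater than or equal to it, returns which branch applies. The distance function is passed in as a parameter so that any edit distance (for example a Levenshtein implementation) can be plugged in. `decide` and the branch labels are hypothetical, and the secondary training of step S28 itself is left to the caller.

```python
from typing import Callable, Dict, Tuple

def decide(scores: Dict[str, float], texts: Dict[str, str], realtime_text: str,
           set_value: float, second_threshold: float,
           distance: Callable[[str, str], float]) -> Tuple[str, str]:
    """Return (branch, candidate) following steps S25 to S29."""
    best = max(scores, key=scores.get)                # candidate with the maximum score
    if scores[best] < set_value:                      # S25: set value not reached
        return ("route_to_expert", best)              # -> S29
    if distance(texts[best], realtime_text) < second_threshold:  # S26
        return ("send_pair", best)                    # -> S27
    return ("retrain", best)                          # -> S28, then repeat S25/S26
```

With a set value of 80 and grade_13 = 85 as the maximum, the procedure moves on to the edit-distance check; with grade_13 = 75 it routes the question to a matching client instead.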
Step S29, finding the client corresponding to the user profile that matches the real-time topic information, and sending the question information to the found client so that the found client can answer the question information.
If grade_13 does not reach the set value, the question-answer pairs corresponding to M_3, M_1, M_2, M_8 and M_13 are unlikely to match M_t well. In this case, the server 10 uses the label feature value, classification feature value and other features of M_t to find the client whose user profile matches M_t, and sends the question information to that client so that it can answer the question information.
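Step S29 does not define how a user profile is matched against M_t. As a hedged sketch, assuming each client's profile is a set of expertise tags and that M_t is represented by its label tags, the best client can be chosen by Jaccard overlap, returning None when no profile exceeds a minimum overlap. `find_expert`, the tag sets and `min_overlap` are hypothetical; characteristics such as the question's degree of specialization or regionality could be added as further tags.

```python
from typing import Dict, Optional, Set

def find_expert(topic_tags: Set[str], profiles: Dict[str, Set[str]],
                min_overlap: float = 0.0) -> Optional[str]:
    """Return the client id whose profile tags best overlap the topic tags (Jaccard)."""
    best_id, best_score = None, min_overlap
    for client_id, tags in profiles.items():
        union = topic_tags | tags
        score = len(topic_tags & tags) / len(union) if union else 0.0
        if score > best_score:                     # strictly better than the current best
            best_id, best_score = client_id, score
    return best_id

profiles = {"c1": {"cooking"}, "c2": {"network", "router", "linux"}, "c3": {"router"}}
print(find_expert({"router", "network"}, profiles))  # c2
```

Returning None lets the caller fall back to another strategy when no registered client is a reasonable fit.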
Thus, when the server 10 cannot find a question-answer pair that matches M_t with high precision, characteristics of the question information such as its degree of specialization and regionality can be analyzed to identify relatively suitable target users, who are then asked to answer the question information, thereby improving the professionalism and effectiveness of the answers.
Optionally, the question information and the answer form a question-answer pair, which is stored in the first database to update the first database.
It can be understood that the first, second and third databases support real-time updating, which improves the accuracy of the answers matched to subsequent questions and further ensures their quality.
On this basis, as shown in Fig. 4, an embodiment of the present invention provides an intelligent question-answer matching device 20, which includes: a real-time topic information extraction module 21, a convolutional neural network model building module 22, a topic information screening module 23, a score value calculation module 24, a first judgment module 25 and a second judgment module 26.
The real-time topic information extraction module 21 is configured to receive the question information sent by the client and extract the real-time topic information of the question information.
Since the implementation principle of the real-time topic information extraction module 21 is similar to that of step S21 in Fig. 2, it is not described further here.
The convolutional neural network model building module 22 is configured to build a convolutional neural network model and input all historical topic information in the preset database into the convolutional neural network model for training.
Since the implementation principle of the convolutional neural network model building module 22 is similar to that of step S22 in Fig. 2, it is not described further here.
The topic information screening module 23 is configured to search the preset database for a plurality of pieces of historical topic information matching the real-time topic information, where the first similarity value between each of them and the real-time topic information is smaller than a first threshold; input the plurality of pieces of historical topic information into the trained convolutional neural network model to calculate a second similarity value between each of them and the real-time topic information; and select a preset number of pieces of historical topic information from them in a set order based on the calculated second similarity values.
Since the implementation principle of the topic information screening module 23 is similar to that of step S23 in Fig. 2, it is not described further here.
The score value calculation module 24 is configured to construct and train a machine learning model according to the plurality of pieces of historical topic information, input the real-time topic information and the preset number of pieces of historical topic information into the machine learning model, and calculate the score value of each of the preset number of pieces of historical topic information.
Since the implementation principle of the score value calculation module 24 is similar to that of step S24 in Fig. 2, it is not described further here.
The first judgment module 25 is configured to determine whether the maximum of the calculated score values reaches a set value, and if it does, calculate the edit distance value between the historical topic information corresponding to the maximum value and the real-time topic information.
Since the implementation principle of the first judgment module 25 is similar to that of steps S25, S26 and S29 in Fig. 2, it is not described further here.
The second judgment module 26 is configured to determine whether the edit distance value is smaller than a second threshold, and if so, acquire the question-answer pair of the historical topic information corresponding to the maximum value and send the question-answer pair to the client.
Since the implementation principle of the second judgment module 26 is similar to that of steps S26, S27 and S28 in Fig. 2, it is not described further here.
In summary, the intelligent question-answer matching method and device provided by the embodiments of the invention analyze and process question information from multiple angles through the convolutional neural network model and the machine learning model, and make the handling of question information more flexible by finding the client whose user profile matches the real-time topic information and sending the question information to that client, thereby improving the quality of the obtained question-answer pairs.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus and method embodiments described above are illustrative only, as the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, the functional modules in the embodiments of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention or a part of the technical solution that substantially contributes to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server 10, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes. It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.