WO2021056709A1 - Similar question recognition method and apparatus, computer device, and storage medium - Google Patents

Similar question recognition method and apparatus, computer device, and storage medium

Info

Publication number
WO2021056709A1
Authority
WO
WIPO (PCT)
Prior art keywords
layer
question
input
matrix
similarity
Prior art date
Application number
PCT/CN2019/116922
Other languages
English (en)
French (fr)
Inventor
邓悦
金戈
徐亮
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021056709A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 - Details of database functions independent of the retrieved data types
    • G06F 16/95 - Retrieval from the web
    • G06F 16/955 - Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G06N 3/049 - Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N 3/08 - Learning methods
    • G06N 3/084 - Backpropagation, e.g. using gradient descent
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This application relates to the field of artificial intelligence technology, in particular to a similarity question recognition method, device, computer equipment and storage medium.
  • In the prior art, the standard question and the question sentence are usually input into a trained word vector model to obtain the word vector of the standard question and the word vector of the question sentence, and the two word vectors are then matched to determine the standard question to which the question sentence corresponds.
  • However, the existing method has at least the following problem: the semantic information contained in a trained word vector is fixed, whereas in natural language the current question often builds on the previous round of question and answer. For example, A asks "Did you buy a house?", B answers "I bought a house", and A then asks "Where?". How should the meaning of "Where?" be determined? Relying only on a trained word vector model to determine which standard question such a question corresponds to often fails to meet the accuracy requirements.
  • the embodiments of the present application provide a similarity question recognition method, device, computer equipment, and storage medium to solve the current problem of low accuracy of similarity question recognition.
  • an embodiment of the present application provides a method for identifying similar questions, including:
  • the target similarity question model includes a first input layer, a second input layer, a first coding layer, a second coding layer, a first transformer layer, a second transformer layer, and a target fully connected layer;
  • the standard question in the basic data is transferred to the first coding layer through the first input layer, a vector matrix is extracted from the standard question through the first coding layer, and the extracted vector matrix is input to the first transformer layer, where the first transformer layer is used to perform feature extraction on the extracted vector matrix to obtain the first feature matrix;
  • the question to be recognized in the basic data is transferred to the second coding layer through the second input layer, a vector matrix is extracted from the question to be recognized through the second coding layer, and the extracted vector matrix is input to the second transformer layer, where the second transformer layer is used to perform feature extraction on the extracted vector matrix to obtain the second feature matrix; the first feature matrix and the second feature matrix are input to the target fully connected layer, where the first feature matrix and the second feature matrix are transformed to obtain a transformation result, the recognition result corresponding to the basic data is determined according to the transformation result, and the standard question corresponding to the question to be recognized is determined according to the recognition result corresponding to each group of basic data.
  • an embodiment of the present application provides a similar question recognition device, including:
  • the question acquisition module is used to obtain the question to be recognized;
  • the question grouping module is used to obtain each standard question from a preset question library, and to form a set of basic data from the question to be recognized and each standard question separately;
  • a question input module, used to input each set of the basic data into a target similarity question model, wherein the target similarity question model includes a first input layer, a second input layer, a first coding layer, a second coding layer, a first transformer layer, a second transformer layer, and a target fully connected layer;
  • a first feature extraction module, used to pass the standard question in the basic data to the first coding layer through the first input layer, extract a vector matrix from the standard question through the first coding layer, input the extracted vector matrix to the first transformer layer, and use the first transformer layer to perform feature extraction on the extracted vector matrix to obtain a first feature matrix;
  • a second feature extraction module, used to pass the question to be recognized in the basic data to the second coding layer through the second input layer, extract a vector matrix from the question to be recognized through the second coding layer, input the extracted vector matrix to the second transformer layer, and use the second transformer layer to perform feature extraction on the extracted vector matrix to obtain a second feature matrix;
  • a feature input module, used to input the first feature matrix and the second feature matrix to the target fully connected layer;
  • a feature recognition module, used to perform transformation processing on the first feature matrix and the second feature matrix at the target fully connected layer to obtain a transformation result, and to determine, according to the transformation result, the recognition result corresponding to the basic data;
  • a result determination module, used to determine, according to the recognition result corresponding to each group of basic data, the standard question corresponding to the question to be recognized.
  • an embodiment of the present application provides a computer device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, where the processor implements the steps of the above-mentioned similar question recognition method when executing the computer-readable instructions.
  • an embodiment of the present application provides a computer non-volatile readable storage medium, the computer non-volatile readable storage medium storing computer-readable instructions which, when executed by a processor, implement the steps of the above-mentioned similar question recognition method.
  • In the similar question recognition method, apparatus, computer device, and storage medium provided above, the question to be recognized is obtained, each standard question is obtained from the preset question library, and the question to be recognized forms a set of basic data with each standard question; each set of basic data is then input into the target similarity question model.
  • The target similarity question model includes the first input layer, the second input layer, the first coding layer, the second coding layer, the first transformer layer, the second transformer layer, and the target fully connected layer.
  • The standard question in the basic data is passed to the first coding layer through the first input layer, a vector matrix is extracted from the standard question through the first coding layer, and the extracted vector matrix is input to the first transformer layer for feature extraction to obtain the first feature matrix; the question to be recognized in the basic data is passed to the second coding layer through the second input layer for vector matrix extraction, the extracted vector matrix is input to the second transformer layer, and the second transformer layer is used to perform feature extraction on the extracted vector matrix to obtain the second feature matrix.
  • The first feature matrix and the second feature matrix are input to the target fully connected layer and transformed to obtain a transformation result; according to the transformation result, the recognition result corresponding to the basic data is determined, and the standard question corresponding to the question to be recognized is determined according to the recognition result corresponding to each group of basic data, which improves the accuracy and efficiency of similar question recognition.
  • FIG. 1 is a schematic diagram of an application environment of the similarity question recognition method provided by an embodiment of the present application
  • FIG. 2 is a flowchart of the realization of the similarity question recognition method provided by the embodiment of the present application
  • FIG. 3 is a schematic structural diagram of a similarity question model in a similarity question recognition method provided by an embodiment of the present application
  • FIG. 4 is an implementation flowchart of a similarity question model training method provided by an embodiment of the present application.
  • FIG. 5 is an implementation flowchart of step S11 in the similarity question recognition method provided by the embodiment of the present application.
  • FIG. 6 is another implementation flowchart of step S11 in the similarity question recognition method provided by the embodiment of the present application.
  • FIG. 7 is an implementation flowchart of step S16 in the similarity question recognition method provided by the embodiment of the present application.
  • FIG. 8 is an implementation flowchart of step S164 in the similarity question recognition method provided by the embodiment of the present application.
  • FIG. 9 is a schematic diagram of a similar question recognition device provided by an embodiment of the present application.
  • Fig. 10 is a schematic diagram of a computer device provided by an embodiment of the present application.
  • FIG. 1 shows an application environment of the similar question recognition method provided by an embodiment of the present application.
  • the similar question recognition method is applied in the similar question recognition scene in the intelligent interview.
  • the recognition scenario includes the client and the server.
  • the client and the server are connected through the network.
  • the server trains the target similar question model.
  • When the client needs to have a question recognized, the question to be recognized is sent to the server.
  • the server uses the target similarity question model to identify the standard question corresponding to the question to be identified.
  • The client can be, but is not limited to, various personal computers, portable notebooks, tablets, mobile phones, and other devices with network data transmission functions.
  • the server can be implemented by an independent server or a server cluster composed of multiple servers.
  • FIG. 2 shows a similar question identification method provided by an embodiment of the present application. The method is applied to the server in FIG. 1 as an example for description. The details are as follows:
  • the question to be recognized is first obtained, so that the question to be recognized is subsequently compared with the standard question in the preset question library to determine the standard question corresponding to the question to be recognized. That is, it is judged which standard question corresponds to the similar question to which the question to be recognized belongs.
  • each standard question is obtained from a preset question library, and the question sentence to be recognized is combined with each standard question separately to obtain the basic data corresponding to each standard question.
  • For example, suppose there are 100 standard questions stored in the preset question library; the question sentence to be recognized and each standard question form a set of basic data, yielding 100 sets of basic data, as sketched below.
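  • The grouping step above amounts to simple pairing; the following minimal Python sketch illustrates it (function and variable names are illustrative assumptions, not part of the patent).

```python
# Minimal sketch: pair the question to be recognized with every standard question
# in the preset question library to form the groups of basic data.
def build_basic_data(question_to_recognize, standard_questions):
    """Return one (standard question, question to be recognized) pair per standard question."""
    return [(std, question_to_recognize) for std in standard_questions]

# Example: 100 standard questions yield 100 groups of basic data.
groups = build_basic_data("Where?", [f"standard question {i}" for i in range(100)])
print(len(groups))  # 100
```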
  • S23 Input each group of basic data into the target similarity question model, where the target similarity question model includes a first input layer, a second input layer, a first coding layer, a second coding layer, a first transformer layer, and a second transformer. Layer and target fully connected layer.
  • Specifically, each group of basic data is input into the target similarity question model, so that the target similarity question model can subsequently be used to determine which groups of basic data constitute a pair of similar questions.
  • The target similarity question model includes the first input layer, the second input layer, the first coding layer, the second coding layer, the first transformer layer, the second transformer layer, and the target fully connected layer; refer to FIG. 3 for details.
  • S24 Pass the standard question in the basic data to the first coding layer through the first input layer, extract the vector matrix of the standard question through the first coding layer, and input the extracted vector matrix to the first transformer layer, using the first A transformer layer performs feature extraction on the extracted vector matrix to obtain the first feature matrix.
  • Specifically, the standard question is transferred to the first coding layer through the first input layer, a vector matrix is extracted from the standard question through the first coding layer, the extracted vector matrix is input to the first transformer layer, and the first transformer layer is used to perform feature extraction on the extracted vector matrix to obtain the first feature matrix.
  • the transformer layer is constructed through the transformer framework.
  • The transformer framework is a classic natural language processing framework proposed by the Google team; the transformer can be stacked to a very deep depth and uses the attention mechanism to achieve fast parallelism. Compared with the usual convolutional neural network or recurrent neural network, the transformer framework therefore has the characteristics of fast training speed and high recognition rate.
  • S25 Pass the question to be recognized in the basic data to the second coding layer through the second input layer, extract the vector matrix of the question to be recognized through the second coding layer, input the extracted vector matrix to the second transformer layer, and use the second transformer layer to perform feature extraction on the extracted vector matrix to obtain the second feature matrix.
  • the question to be recognized is transferred to the second coding layer through the second input layer, and the vector matrix is extracted from the question to be recognized through the second coding layer, and the extracted vector matrix is input to the second transformer layer.
  • the second transformer layer performs feature extraction on the extracted vector matrix to obtain a second feature matrix.
  • It should be noted that there is no necessary sequence relationship between step S24 and step S25; they can also be executed in parallel, which is not specifically limited here.
  • the first feature matrix output by the first transformer layer and the second feature matrix output by the second transformer layer are respectively input to the target fully connected layer.
  • the first feature matrix and the second feature matrix are transformed to obtain the transformation result, and the recognition result is determined according to the transformation result.
  • For the specific process, refer to steps S1641 to S1643; to avoid repetition, the details are not repeated here.
  • Specifically, the standard question in the group of basic data whose recognition result is that the two questions are similar is taken as the standard question corresponding to the question to be recognized.
  • In this embodiment, the question to be recognized and each standard question are formed into a set of basic data, each set of basic data is input into the target similarity question model for recognition, and the recognition result corresponding to each group of basic data is obtained; according to the recognition result corresponding to each group of basic data, the standard question corresponding to the question to be recognized is determined, which improves the accuracy and efficiency of similar question recognition. A minimal end-to-end sketch of this dual-branch structure follows.
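  • The following is a minimal PyTorch sketch of the dual-branch structure described above (two coding layers feeding two transformer layers, whose pooled outputs are spliced and passed through the target fully connected layer to produce a two-dimensional comparison vector). The vocabulary size, dimensions, pooling, and tokenization are illustrative assumptions rather than the patent's actual implementation.

```python
import torch
import torch.nn as nn

class SimilarQuestionModel(nn.Module):
    """Sketch: first/second coding layers -> first/second transformer layers ->
    spliced features -> target fully connected layer (2-way output)."""
    def __init__(self, vocab_size=30000, d_model=128):
        super().__init__()
        self.embed_std = nn.Embedding(vocab_size, d_model)   # first coding layer
        self.embed_qry = nn.Embedding(vocab_size, d_model)   # second coding layer
        def make_encoder():
            layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            return nn.TransformerEncoder(layer, num_layers=2)
        self.transformer_std = make_encoder()                 # first transformer layer
        self.transformer_qry = make_encoder()                 # second transformer layer
        self.fc = nn.Linear(2 * d_model, 2)                   # target fully connected layer

    def forward(self, std_ids, qry_ids):
        a = self.transformer_std(self.embed_std(std_ids)).mean(dim=1)  # first feature matrix
        b = self.transformer_qry(self.embed_qry(qry_ids)).mean(dim=1)  # second feature matrix
        c = torch.cat([a, b], dim=-1)                                   # spliced target feature matrix
        return self.fc(c)                                               # two-dimensional comparison vector

# Usage: token-id tensors for one (standard question, question to be recognized) pair.
model = SimilarQuestionModel()
logits = model(torch.randint(0, 30000, (1, 12)), torch.randint(0, 30000, (1, 8)))
print(logits.shape)  # torch.Size([1, 2])
```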
  • the similarity question recognition method further includes training of the target similarity question model.
  • The following specific embodiment describes in detail the implementation of training the target similarity question model; please refer to FIG. 4.
  • the specific process is as follows:
  • S11 Obtain a preset corpus, and use the preset corpus to train the initial semantic recognition model to obtain a trained semantic recognition model, where the initial semantic recognition model is a multi-layer long short-term memory network, and the multi-layer long short-term memory network includes a coding layer, K long short-term memory layers, and a fully connected layer, K being a positive integer greater than 1.
  • the semantic recognition model used in this embodiment is a multi-layer long and short-term memory network model.
  • the preset corpus may specifically be content in hot topics such as Weibo, film and television lines, and the acquisition of the preset corpus may specifically be obtained through a web crawler.
  • the multi-layer long and short-term memory network is a neural network model that contains multiple long and short-term memory layers
  • the long and short-term memory network (Long Short-Term Memory, referred to as LSTM) is a time loop neural network, suitable for processing and predicting time series Important events with relatively long intervals and delays.
  • the preset condition may specifically be to reach a preset number of iterations, for example, 20 iterations, or it may mean that the fitting reaches a preset range during the training process.
  • the value of K can be set according to actual requirements.
  • the value of K is 3, that is, 3 long and short-term memory layers are used.
  • S12 Obtain an interview corpus from the preset question database, where the interview corpus includes standard questions and similar questions corresponding to the standard questions.
  • a preset question library is stored on the server, and the interview corpus is obtained from the preset question library.
  • the interview corpus includes standard questions and similar questions corresponding to standard questions.
  • the similar question in this embodiment refers to a question that has the same or similar semantics as the standard question in the preset question library.
  • S13 Input the standard question into the trained semantic recognition model for recognition, obtain the first vector matrix output by the coding layer and the K first output results output by the long short-term memory layers, and perform a weighted summary on the first vector matrix and the K first output results to obtain the first parameter information.
  • the trained semantic recognition model obtained is a general semantic recognition model.
  • Specifically, the standard questions in the interview corpus are input into the trained semantic recognition model to obtain the first vector matrix output by the coding layer and the digitized vector output by each long short-term memory layer; the first vector matrix and each digitized vector are taken as the first output results, and the first output results are weighted and summarized to obtain the first parameter information.
  • Optionally, performing a weighted summary on the first output results to obtain the first parameter information involves combining the outputs with weights, where a_0 is the first output result corresponding to the coding layer, a_i is the first output result corresponding to the i-th long short-term memory layer, and A is the first parameter information.
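  • The excerpt does not reproduce the weighting formula itself, so the sketch below shows only one plausible form of such a weighted summary (a normalized weighted sum of a_0 through a_K); the uniform default weights are an assumption for illustration, not the patent's actual weights.

```python
import numpy as np

def weighted_summary(outputs, weights=None):
    """outputs = [a_0, a_1, ..., a_K]: the coding-layer output followed by the K
    LSTM-layer outputs, all with the same shape. Returns A, their weighted sum.
    Uniform weights are assumed when none are supplied."""
    if weights is None:
        weights = [1.0] * len(outputs)
    total = sum(weights)
    return sum((w / total) * np.asarray(a) for w, a in zip(weights, outputs))

a_outputs = [np.full((1, 4), v) for v in (1.0, 2.0, 3.0, 4.0)]  # a_0 .. a_3 (K = 3)
A = weighted_summary(a_outputs)
print(A)  # [[2.5 2.5 2.5 2.5]]
```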
  • Likewise, the similar questions corresponding to the standard questions in the interview corpus are input into the trained semantic recognition model to obtain the second vector matrix output by the coding layer and the digitized vector output by each long short-term memory layer; the second vector matrix and each digitized vector are taken as the second output results, and the second output results are weighted and summarized to obtain the second parameter information.
  • For the specific process of weighting and summarizing the second output results to obtain the second parameter information, refer to the description of step S13; to avoid repetition, it is not repeated here.
  • It should be noted that there is no necessary sequence relationship between step S13 and step S14; they can also be executed in parallel, which is not specifically limited here.
  • S15 Input the standard question, the similarity question, the first parameter information and the second parameter information corresponding to the standard question into the similarity question model, where the similarity question model includes the first input layer, the second input layer, the first coding layer, The second coding layer, the first transformer layer, the second transformer layer and the target fully connected layer.
  • Specifically, the standard question, the similar question corresponding to the standard question, the first parameter information, and the second parameter information are input into the similarity question model, where the similarity question model includes a first input layer, a second input layer, a first coding layer, a second coding layer, a first transformer layer, a second transformer layer, and a target fully connected layer; for the specific structure of the similarity question model, refer to FIG. 3.
  • the first input layer and the second input layer are used to receive standard questions and similar questions corresponding to the standard questions.
  • the first coding layer and the second coding layer are used to extract vector feature data from the standard question and the corresponding similar question of the standard question;
  • the first transformer layer and the second transformer layer are used to process vector feature data to obtain a feature matrix with contextual semantics, and input the feature matrix to the target fully connected layer.
  • the target fully connected layer is used to identify the feature matrix, and adjust the parameters of the first coding layer, the second coding layer, the first transformer layer, and the second transformer layer according to the recognition result.
  • S16 Use the first input layer to receive the standard question, use the second input layer to receive the similar question corresponding to the standard question, use the first parameter information as the initial parameter information of the first transformer layer, use the second parameter information as the initial parameter information of the second transformer layer, and train the similarity question model to obtain the target similarity question model.
  • Specifically, the first input layer is used to receive the standard question, the second input layer is used to receive the similar question corresponding to the standard question, the first parameter information is used as the initial parameter information of the first transformer layer, the second parameter information is used as the initial parameter information of the second transformer layer, and the similarity question model is trained to obtain the target similarity question model.
  • the similarity question model is trained, and the specific process of obtaining the target similarity question model can refer to the description of step S161 to step S166. To avoid repetition, it will not be repeated here.
  • The first parameter information obtained by recognizing the standard question with the trained semantic recognition model is used as the initial parameter information of the first transformer layer, and the second parameter information obtained by recognizing the similar question with the trained semantic recognition model is used as the initial parameter information of the second transformer layer, which is beneficial to subsequently improving the training speed of the similarity question model.
  • In this embodiment, the initial semantic recognition model is trained with the preset corpus to obtain a trained semantic recognition model, so that the semantic recognition model can understand the contextual semantics of similar questions and standard questions, which improves the accuracy of subsequent recognition using the semantic recognition model.
  • The interview corpus is then obtained from the preset question library, and the standard questions and similar questions in the interview corpus are input into the trained semantic recognition model for recognition to obtain the first parameter information and the second parameter information, so that the obtained first and second parameter information have a high degree of recognition for the standard and similar questions in the preset question library.
  • The first parameter information, the second parameter information, the standard question, and the similar question corresponding to the standard question are then used as training data and input into the similarity question model for training to obtain the target similarity question model. Using parameter information with a high degree of recognition of the standard and similar questions in the preset question library as the initial parameters of the similarity question model is beneficial to improving the training speed of the similarity question model and the recognition accuracy of the target similarity question model.
  • S1111 Crawl the preset domain name by means of a web crawler to obtain the uniform resource locator in the page information corresponding to the preset domain name, where the page information includes at least one uniform resource locator.
  • the preset domain name is crawled by means of a web crawler, and each uniform resource locator contained in the web page corresponding to the preset domain name is obtained.
  • the Uniform Resource Locator is a concise representation of the location and access method of resources available on the Internet, and is the address of a standard resource on the Internet. Every file on the Internet has a unique URL, which contains information that indicates the location of the file and how the browser should handle it.
  • the basic method of the depth-first strategy is to follow the order of depth from low to high, and visit the links of the next level of web pages in turn, until it can no longer go deep. After completing a crawling branch, the crawler returns to the previous link node to further search for other links. When all the links are traversed, the crawling task ends.
  • the breadth-first strategy is to crawl pages according to the depth of the web content directory level, and the pages at the shallower directory level are crawled first. When the page in the same level is crawled, the crawler will go to the next level to continue crawling.
  • This strategy can effectively control the crawling depth of the page, avoid the problem that crawling cannot be ended when encountering an infinitely deep branch, and is convenient to implement without storing a large number of intermediate nodes.
  • the crawling strategy adopted in this embodiment of the application is a breadth-first strategy.
  • That is, the preset domain name is first crawled to obtain each application channel, and each application channel is then crawled to obtain its basic information; this avoids the extra time overhead caused by crawling too much useless information and improves crawling efficiency. A breadth-first crawling sketch follows.
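  • The sketch below illustrates the breadth-first crawling order described above; `fetch_links` is a hypothetical helper standing in for whatever HTTP/HTML library actually downloads a page and returns the uniform resource locators it contains.

```python
from collections import deque

def crawl_breadth_first(seed_url, fetch_links, max_depth=2):
    """Breadth-first crawl: pages at shallower directory levels are visited first,
    and the crawling depth is capped so infinitely deep branches cannot stall it."""
    visited = {seed_url}
    queue = deque([(seed_url, 0)])
    collected = []
    while queue:
        url, depth = queue.popleft()
        collected.append(url)               # record the URL (or its page content)
        if depth >= max_depth:
            continue
        for link in fetch_links(url):       # hypothetical page-download helper
            if link not in visited:
                visited.add(link)
                queue.append((link, depth + 1))
    return collected
```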
  • S1112 Crawl each uniform resource locator to obtain the basic information corresponding to each uniform resource locator.
  • S1114 Perform data cleaning on the corpus information, and generate a preset corpus according to the corpus information after the data cleaning.
  • data cleaning is performed on the corpus information, and a preset corpus is generated according to the corpus information after the data cleaning.
  • Data cleaning refers to the last process of discovering and correcting identifiable errors in data files, including checking data consistency, handling invalid and missing values, etc.
  • In this embodiment, the preset domain name is crawled by a web crawler to obtain the uniform resource locators in the page information corresponding to the preset domain name; each uniform resource locator is then crawled to obtain the basic information corresponding to it; regular matching is performed on each piece of basic information to obtain the corpus information it contains; and data cleaning is performed on the corpus information, with the preset corpus generated from the cleaned corpus information. Obtaining a preset corpus with rich samples in this way is conducive to the accuracy of subsequently training the semantic recognition model with the preset corpus.
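  • A minimal sketch of the regular-matching and data-cleaning steps is given below; the `<p>...</p>` pattern and the specific cleaning rules (whitespace normalization, dropping empty and duplicate entries) are illustrative assumptions.

```python
import re

def extract_corpus(basic_info_pages, pattern=r"<p>(.*?)</p>"):
    """Regular matching: pull candidate corpus snippets out of each piece of basic information."""
    snippets = []
    for page in basic_info_pages:
        snippets.extend(re.findall(pattern, page, flags=re.S))
    return snippets

def clean_corpus(snippets):
    """Simple data cleaning: normalize whitespace, drop empty and duplicate entries."""
    seen, cleaned = set(), []
    for text in (re.sub(r"\s+", " ", s).strip() for s in snippets):
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned

print(clean_corpus(extract_corpus(["<p> Did you buy a house? </p><p></p>"])))
# ['Did you buy a house?']
```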
  • S1121 Use each corpus in the preset corpus as a training set, and input the training set to the coding layer.
  • the preset corpus contains multiple corpora, each corpus is used as a training set, and the training set is input to the coding layer, so that the subsequent coding layer performs vector feature extraction on the training set.
  • S1122 Perform vectorization processing on the training set through the coding layer to obtain a word vector corresponding to the training set, and obtain a position vector corresponding to each word vector through a preset method.
  • the training set is vectorized through the coding layer to obtain the word vector corresponding to the training set, and at the same time, the position vector corresponding to each word vector is obtained through a preset method.
  • the vectorization processing refers to the form of transforming the training set into a word vector, and subsequently, the feature extraction is performed by transforming the word vector into a vector matrix.
  • The position vector refers to the positional relationship between a word vector and the other word vectors. Specifically, a word vector can be designated as the entity word, and the distance between each other word vector and the entity word is then calculated to obtain the position vector of each word vector.
  • this embodiment designates the first word vector as the entity word.
  • the skip-gram model is adopted, the window size is set to 8, the iteration period is set to 15, and the dimension of the word vector is set to 400 dimensions.
  • Through training, a word vector mapping table is obtained, and the word vector corresponding to each word in the training set is then obtained according to the word vector mapping table.
  • Specifically, the words in the dictionary are mapped one by one to the words in the data set, redundant word vectors are discarded, and the position vector feature of each word in the training set is extracted; the position vector feature includes the relative distance between each word in the sentence and the entity word, so that the position of each word vector in the sentence relative to the entity word is obtained, and these relative positions constitute the position vector feature of the word.
  • S1123 Perform vector concatenation on the word vector and the position vector corresponding to the word vector to obtain a vector matrix.
  • vector concatenation is performed on the word vector and the position vector corresponding to the word vector to obtain a vector matrix.
  • cascade refers to the establishment of a mapping relationship between multiple objects and the establishment of a cascade relationship between data to improve execution or management efficiency.
  • The vector concatenation in this embodiment specifically refers to concatenating each word vector with the position vector corresponding to that word vector to obtain the vector matrix, as sketched below.
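  • The following sketch trains skip-gram word vectors with the parameters mentioned above (window 8, 15 iterations, 400 dimensions) and concatenates each word vector with a one-dimensional position vector, taken here as the signed distance to the first word designated as the entity word. The use of gensim and the scalar distance encoding are illustrative assumptions.

```python
import numpy as np
from gensim.models import Word2Vec  # assumed tooling; any skip-gram trainer would do

sentences = [["did", "you", "buy", "a", "house"], ["where", "did", "you", "buy", "it"]]

# Skip-gram (sg=1), window size 8, 15 iterations, 400-dimensional word vectors.
w2v = Word2Vec(sentences, vector_size=400, window=8, sg=1, epochs=15, min_count=1)

def sentence_matrix(words, entity_index=0):
    """Concatenate each word vector with its position vector (distance to the word
    designated as the entity word) to build the vector matrix of the sentence."""
    rows = []
    for i, w in enumerate(words):
        position = np.array([i - entity_index], dtype=np.float32)
        rows.append(np.concatenate([w2v.wv[w], position]))
    return np.stack(rows)

print(sentence_matrix(sentences[0]).shape)  # (5, 401)
```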
  • S1124 Input the vector matrix into the long- and short-term memory layer, and extract the contextual semantic information contained in the vector matrix through the long- and short-term memory layer.
  • the vector matrix is input to the multi-layer long and short-term memory layer, and the contextual semantic information contained in the vector matrix is extracted through the multi-layer long and short-term memory layer.
  • the Long Short-Term Memory is a time cyclic neural network, which is suitable for processing and predicting important events with relatively long intervals and delays in a time series.
  • the one-way LSTM can memorize the first word to the last word of a sentence according to the human reading order.
  • This LSTM structure can only capture the preceding information but not the following information, whereas the bidirectional LSTM is composed of two LSTMs with different directions.
  • One LSTM reads data from front to back according to the order of words in the sentence, and the other LSTM reads data from back to front in the opposite direction of the sentence word order, so that the first LSTM obtains the above information.
  • Another LSTM obtains the following information.
  • The joint representation produced by the two LSTMs is the context information of the entire sentence; because the context is provided by the whole sentence, it naturally contains more abstract semantic information (the meaning of the sentence). The advantage of this method is that it makes full use of the strength of LSTMs in processing sequence data with temporal characteristics, and because position features are part of the input, the entity direction information contained in the position features can be extracted after bidirectional LSTM encoding, which other methods cannot do.
  • this embodiment adopts a bidirectional LSTM to construct a long and short-term memory layer.
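  • A minimal sketch of such a semantic recognition branch is shown below: a vector matrix is fed into K (here 3) stacked bidirectional LSTM layers and a fully connected layer classifies the pooled contextual representation. The input dimension of 401 matches the concatenated word-plus-position vectors from the earlier sketch; all dimensions and the number of labels are assumptions.

```python
import torch
import torch.nn as nn

class SemanticRecognizer(nn.Module):
    """Sketch: coding output (vector matrix) -> 3 stacked bidirectional LSTM layers
    -> fully connected layer over the pooled contextual semantic information."""
    def __init__(self, input_dim=401, hidden_dim=128, num_layers=3, num_labels=10):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, num_layers=num_layers,
                            bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden_dim, num_labels)  # forward + backward states

    def forward(self, vector_matrix):
        outputs, _ = self.lstm(vector_matrix)   # (batch, seq_len, 2*hidden): context per word
        return self.fc(outputs.mean(dim=1))     # pooled contextual semantic information

logits = SemanticRecognizer()(torch.randn(2, 5, 401))
print(logits.shape)  # torch.Size([2, 10])
```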
  • S1125 Input the contextual semantic information into the fully connected layer, recognize the contextual semantic information through the fully connected layer, obtain the recognized semantic information, and compare the recognized semantic information with the preset annotation information to obtain the comparison result.
  • the contextual semantic information is input to the fully connected layer, the contextual semantic information is recognized through the fully connected layer to obtain the recognized semantic information, and the recognized semantic information is compared with the preset annotation information to obtain the comparison result.
  • the comparison results include correct recognition and incorrect recognition.
  • If the comparison result does not meet the preset conditions, the parameters of the long short-term memory layer are adjusted through backpropagation, and the method returns to the step of inputting the vector matrix into the long short-term memory layer and extracting the contextual semantic information contained in the vector matrix through the long short-term memory layer; this continues until the comparison result meets the preset conditions, and the initial semantic recognition model obtained at this time is used as the trained semantic recognition model.
  • The preset condition may be a preset number of iterations, for example 50, or may be that the recognition accuracy reaches a preset accuracy threshold, for example that the recognition accuracy exceeds 90%; it can also be set according to actual conditions and is not limited here.
  • In this embodiment, each corpus in the preset corpus is used as a training set and input to the coding layer; the training set is vectorized through the coding layer to obtain the word vectors corresponding to the training set, and the position vector corresponding to each word vector is obtained through the preset method; each word vector is then concatenated with its corresponding position vector to obtain the vector matrix, which is input into the long short-term memory layer to extract the contextual semantic information it contains; the contextual semantic information is input into the fully connected layer for recognition, and the recognized semantic information is compared with the preset annotation information to obtain the comparison result.
  • If the comparison result does not meet the preset condition, the parameters of the long short-term memory layer are adjusted through backpropagation and the method returns to step S1124 until the comparison result meets the preset condition; the initial semantic recognition model obtained at this time is used as the trained semantic recognition model. By virtue of the massive corpora in the preset corpus and the multiple long short-term memory layers, the recognition accuracy of the trained semantic recognition model is higher.
  • S161 Pass the standard question to the first coding layer through the first input layer, extract the vector matrix of the standard question through the first coding layer, input the extracted vector matrix to the first transformer layer, and use the first transformer layer to perform feature extraction on the extracted vector matrix to obtain the first feature matrix.
  • Specifically, the standard question is transferred to the first coding layer through the first input layer, a vector matrix is extracted from the standard question through the first coding layer, the extracted vector matrix is input to the first transformer layer, and the first transformer layer is used to perform feature extraction on the extracted vector matrix to obtain the first feature matrix.
  • S162 Pass the similarity question to the second coding layer through the second input layer, extract the vector matrix of the similarity question through the second coding layer, input the extracted vector matrix to the second transformer layer, and use the second transformer layer to perform feature extraction on the extracted vector matrix to obtain the second feature matrix.
  • Specifically, the similarity question is transferred to the second coding layer through the second input layer, a vector matrix is extracted from the similarity question through the second coding layer, the extracted vector matrix is input to the second transformer layer, and the second transformer layer is used to perform feature extraction on the extracted vector matrix to obtain the second feature matrix.
  • It should be noted that there is no necessary sequence relationship between step S161 and step S162; they can also be executed in parallel, which is not specifically limited here.
  • the first feature matrix output by the first transformer layer and the second feature matrix output by the second transformer layer are respectively input to the target fully connected layer.
  • the first feature matrix and the second feature matrix are transformed to obtain the transformation result, and the recognition result is determined according to the transformation result.
  • For the specific process, refer to steps S1641 to S1643; to avoid repetition, the details are not repeated here.
  • S165 Calculate the accuracy of the recognition result according to the input standard question, the input similar question and the label information of the similar question.
  • Specifically, each similarity question is preset with labeling information; the correctness of each recognition result is judged through the input standard question, the input similarity question, and the labeling information of the similarity question, and the accuracy of all the recognition results is then calculated.
  • the labeling information refers to the standard question used to label the similar question. Through the labeling information, it can be judged whether the recognition result in step S164 is accurate.
  • If the accuracy of the recognition results is less than the preset threshold, the similarity question model is iteratively trained through backpropagation and the loss function until the accuracy of the recognition results is greater than or equal to the preset threshold; training is then stopped, and the similarity question model obtained at this time is used as the target similarity question model.
  • Backpropagation is a learning algorithm for multi-layer neuron networks based on the gradient descent method. The backpropagation algorithm mainly consists of two phases (excitation propagation and weight update), which are iterated repeatedly until the network's response to the input reaches the predetermined target range.
  • the loss function includes but is not limited to: Mean-Square Error (MSE) loss function, Hinge loss function, Cross Entropy loss function, Smooth L1 loss function, etc.
  • The loss function used in this embodiment is the cross-entropy loss function.
  • In this embodiment, the first feature matrix corresponding to the standard question and the second feature matrix corresponding to the similarity question are extracted; the first feature matrix and the second feature matrix are then transformed to obtain the transformation result, and the recognition result is determined according to the transformation result; the accuracy of the recognition results is then calculated according to the labeling information of the standard questions and similarity questions. If the accuracy of the recognition results is less than the preset threshold, the similarity question model is iteratively trained by means of backpropagation until the accuracy of the recognition results is greater than or equal to the preset threshold, and the similarity question model obtained at this time is used as the target similarity question model, thereby achieving rapid training of the similarity question model. A sketch of such a training loop is given below.
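  • The sketch below shows one way such an iterative loop could look, assuming the dual-branch model sketched earlier, a cross-entropy loss, and an accuracy threshold; the optimizer, learning rate, and threshold values are illustrative assumptions.

```python
import torch
import torch.nn as nn

def train_similarity_model(model, batches, threshold=0.9, max_epochs=50, lr=1e-3):
    """Iterate with backpropagation and a cross-entropy loss until the accuracy of
    the recognition results reaches the preset threshold (values are assumptions).
    Each batch is (standard-question ids, similar-question ids, labels),
    with label 1 meaning 'similar' and 0 meaning 'not similar'."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    for _ in range(max_epochs):
        correct = total = 0
        for std_ids, qry_ids, labels in batches:
            logits = model(std_ids, qry_ids)
            loss = criterion(logits, labels)
            optimizer.zero_grad()
            loss.backward()                    # backpropagation
            optimizer.step()
            correct += (logits.argmax(dim=-1) == labels).sum().item()
            total += labels.numel()
        if correct / total >= threshold:       # stop once accuracy reaches the threshold
            break
    return model
```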
  • The following uses a specific embodiment to describe in detail the specific implementation, mentioned in step S164, of transforming the first feature matrix and the second feature matrix at the target fully connected layer to obtain the transformation result and determining the recognition result according to the transformation result.
  • FIG. 8 shows a specific implementation process of step S164 provided by an embodiment of the present application, which is detailed as follows:
  • the first feature matrix and the second feature matrix are spliced to obtain the target feature matrix.
  • the first feature matrix output by the first transformer layer is a(1*m) and the second feature matrix output by the second transformer layer is b(1*m), and the first feature matrix a(1*m) After splicing with the second feature matrix b(1*m), the target feature matrix c(1*2m) is obtained.
  • S1642 Perform matrix multiplication processing on the target feature matrix and the preset parameter matrix of the target fully connected layer to obtain a two-dimensional comparison vector.
  • a parameter matrix is preset, and the target feature matrix and the preset parameter matrix of the target fully connected layer are subjected to matrix multiplication processing to obtain a two-dimensional comparison vector.
  • For example, the parameter matrix of the target fully connected layer is d(2m*2); the target feature matrix c(1*2m) and the parameter matrix d(2m*2) are multiplied to obtain the two-dimensional comparison vector.
  • S1643 Perform a numerical comparison between the first vector and the second vector in the two-dimensional comparison vector to obtain a comparison result, and determine the recognition result according to the comparison result.
  • Specifically, the first vector and the second vector in the two-dimensional comparison vector are compared numerically to obtain the comparison result. If the comparison result is that the value of the first vector is greater than that of the second vector, the recognition result is determined to be that the question input to the second input layer does not belong to the similar questions corresponding to the standard question input to the first input layer; otherwise, the recognition result is determined to be that the question input to the second input layer belongs to the similar questions corresponding to the standard question input to the first input layer.
  • In this embodiment, the first feature matrix and the second feature matrix are spliced to obtain the target feature matrix, matrix multiplication is then performed on the target feature matrix and the preset parameter matrix of the target fully connected layer to obtain a two-dimensional comparison vector, and the first vector and the second vector in the two-dimensional comparison vector are compared numerically to obtain the comparison result, with the recognition result determined according to the comparison result; this realizes rapid recognition based on the first feature matrix and the second feature matrix, as sketched below.
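  • The numerical example above maps directly onto a few array operations; the sketch below reproduces the splice, the matrix multiplication with the preset parameter matrix, and the comparison of the two entries of the comparison vector (the random parameter matrix is only a placeholder for the trained one).

```python
import numpy as np

def recognize(a, b, d):
    """a: first feature matrix (1 x m), b: second feature matrix (1 x m),
    d: preset parameter matrix of the target fully connected layer (2m x 2)."""
    c = np.concatenate([a, b], axis=1)   # target feature matrix, shape (1, 2m)
    comparison = c @ d                   # two-dimensional comparison vector, shape (1, 2)
    first, second = comparison[0, 0], comparison[0, 1]
    return "not similar" if first > second else "similar"

m = 4
print(recognize(np.ones((1, m)), np.ones((1, m)), np.random.randn(2 * m, 2)))
```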
  • Fig. 9 shows a schematic block diagram of the similar question recognition device corresponding one-to-one to the similar question recognition method in the above embodiments.
  • As shown in Fig. 9, the similar question recognition device includes a question acquisition module 21, a question grouping module 22, a question input module 23, a first feature extraction module 24, a second feature extraction module 25, a feature input module 26, a feature recognition module 27, and a result determination module 28.
  • the detailed description of each functional module is as follows:
  • the question acquisition module 21 is used to acquire the question to be recognized
  • the question grouping module 22 is used to obtain each standard question from a preset question library, and form a set of basic data from the question to be recognized and each standard question separately;
  • the question input module 23 is used to input each group of basic data into the target similarity question model, where the target similarity question model includes a first input layer, a second input layer, a first coding layer, a second coding layer, and a first The transformer layer, the second transformer layer and the target fully connected layer;
  • the first feature extraction module 24 is used to pass the standard question in the basic data to the first coding layer through the first input layer, extract the vector matrix of the standard question through the first coding layer, input the extracted vector matrix to the first transformer layer, and use the first transformer layer to perform feature extraction on the extracted vector matrix to obtain the first feature matrix;
  • the second feature extraction module 25 is used to pass the question to be recognized in the basic data to the second coding layer through the second input layer, extract the vector matrix of the question to be recognized through the second coding layer, input the extracted vector matrix to the second transformer layer, and use the second transformer layer to perform feature extraction on the extracted vector matrix to obtain the second feature matrix;
  • the feature input module 26 is used to input the first feature matrix and the second feature matrix to the target fully connected layer;
  • the feature recognition module 27 is configured to perform transformation processing on the first feature matrix and the second feature matrix at the target fully connected layer to obtain the transformation result, and according to the transformation result, determine the recognition result corresponding to the basic data;
  • the result determining module 28 is used to determine the standard question corresponding to the question to be recognized according to the recognition result corresponding to each group of basic data.
  • the similar question recognition device also includes:
  • the semantic recognition model training module is used to obtain a preset corpus and use the preset corpus to train the initial semantic recognition model to obtain a trained semantic recognition model.
  • the initial semantic recognition model is a multi-layer long and short-term memory network.
  • the multi-layer long short-term memory network includes a coding layer, K long short-term memory layers, and a fully connected layer, K being a positive integer greater than 1;
  • the interview corpus acquisition module is used to obtain the interview corpus from the preset question database, where the interview corpus includes standard questions and similar questions corresponding to the standard questions;
  • the first parameter information acquisition module is used to input the standard question into the trained semantic recognition model for recognition, obtain the first vector matrix output by the coding layer and the K first output results output by the long short-term memory layers, and perform a weighted summary on the first vector matrix and the K first output results to obtain the first parameter information;
  • the second parameter information acquisition module is used to input similar questions corresponding to the standard questions into the trained semantic recognition model for recognition, and obtain the second vector matrix output by the coding layer and the K second output results output by the long and short-term memory layer, and Perform a weighted summary on the second vector matrix and K second output results to obtain second parameter information;
  • the information input module is used to input the standard question, the similarity question corresponding to the standard question, the first parameter information, and the second parameter information into the similarity question model, where the similarity question model includes a first input layer, a second input layer, a first coding layer, a second coding layer, a first transformer layer, a second transformer layer, and a target fully connected layer;
  • the similarity question model training module is used to use the first input layer to receive the standard question, use the second input layer to receive the similar question corresponding to the standard question, use the first parameter information as the initial parameter information of the first transformer layer, use the second parameter information as the initial parameter information of the second transformer layer, and train the similarity question model to obtain the target similarity question model.
  • semantic recognition model training module includes:
  • the first crawling unit is used to crawl the preset domain name by means of a web crawler to obtain the uniform resource locator in the page information corresponding to the preset domain name, where the page information includes at least one uniform resource locator;
  • the second crawling unit is used to crawl each uniform resource locator to obtain the basic information corresponding to each uniform resource locator;
  • the regular matching unit is used to use regular matching to perform regular matching on each basic information to obtain the corpus information contained in each basic information;
  • the corpus generation unit is used to clean the corpus information, and generate a preset corpus according to the corpus information after data cleaning.
  • semantic recognition model training module also includes:
  • the training set input unit is used to use each corpus in the preset corpus as a training set, and input the training set to the coding layer;
  • the vectorization processing unit is used to vectorize the training set through the coding layer to obtain the word vector corresponding to the training set, and obtain the position vector corresponding to each word vector through a preset method;
  • the vector concatenation unit is used to vector concatenate the word vector and the position vector corresponding to the word vector to obtain a vector matrix
  • the semantic understanding unit is used to input the vector matrix into the long and short-term memory layer, and extract the contextual semantic information contained in the vector matrix through the long and short-term memory layer;
  • the comparison unit is used to input the contextual semantic information into the fully connected layer, recognize the contextual semantic information through the fully connected layer, obtain the recognized semantic information, and compare the recognized semantic information with the preset annotation information to obtain the comparison result;
  • the loop iteration unit is used to adjust the parameters of the long short-term memory layer through backpropagation if the comparison result does not meet the preset conditions, and return to the step of inputting the vector matrix into the long short-term memory layer and extracting the contextual semantic information contained in the vector matrix through the long short-term memory layer, until the comparison result meets the preset conditions, at which point the initial semantic recognition model obtained is used as the trained semantic recognition model.
  • the similarity question model training module includes:
  • the first feature matrix obtaining unit is used to pass the standard question to the first coding layer through the first input layer, extract the vector matrix of the standard question through the first coding layer, and input the extracted vector matrix to the first transformer layer , Using the first transformer layer to perform feature extraction on the extracted vector matrix to obtain the first feature matrix;
  • the second feature matrix obtaining unit is used to transfer the similarity question to the second coding layer through the second input layer, extract the vector matrix of the similarity question through the second coding layer, and input the extracted vector matrix to the second transformer layer , Using the second transformer layer to perform feature extraction on the extracted vector matrix to obtain the second feature matrix;
  • the matrix input unit is used to input the first feature matrix and the second feature matrix to the target fully connected layer
  • the transformation processing unit is used to perform transformation processing on the first feature matrix and the second feature matrix at the target fully connected layer to obtain the transformation result, and determine the recognition result according to the transformation result;
  • the iterative training unit is used to calculate the accuracy of the recognition result according to the input standard question, the input similarity question and the labeling information of the similarity question, and, if the accuracy of the recognition result is less than a preset threshold, to iteratively train the similarity question model by means of backpropagation until the accuracy of the recognition result is greater than or equal to the preset threshold, at which point the similarity question model obtained is used as the target similarity question model (an illustrative training sketch follows this list).
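An illustrative end-to-end sketch of this training flow is given below, using PyTorch purely as an example framework: two coding/transformer branches produce feature matrices for the standard question and the similarity question, a target fully connected layer maps their concatenation to a two-dimensional output, and training repeats until the recognition accuracy reaches a preset threshold. The mean-pooling step, vocabulary size, model dimensions and the 0.95 threshold are assumptions made only for this sketch.

```python
# Illustrative twin-branch similarity model (assumed sizes, tokenisation omitted):
# two coding + transformer branches, a target fully connected layer over the
# concatenated feature matrices, and iterative training until a preset accuracy.
import torch
import torch.nn as nn

class TwinTowerSimilarityModel(nn.Module):
    def __init__(self, vocab=10000, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        self.emb_a = nn.Embedding(vocab, d_model)  # first coding layer (standard question)
        self.emb_b = nn.Embedding(vocab, d_model)  # second coding layer (similarity question)
        enc = lambda: nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True), num_layers=n_layers)
        self.tower_a, self.tower_b = enc(), enc()  # first / second transformer layers
        self.fc = nn.Linear(2 * d_model, 2)        # target fully connected layer

    def forward(self, std_ids, cand_ids):
        feat_a = self.tower_a(self.emb_a(std_ids)).mean(dim=1)   # first feature matrix (pooled)
        feat_b = self.tower_b(self.emb_b(cand_ids)).mean(dim=1)  # second feature matrix (pooled)
        return self.fc(torch.cat([feat_a, feat_b], dim=-1))      # two-dimensional output

def train_until_threshold(model, batches, threshold=0.95, max_rounds=100):
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()   # cross-entropy, as the description elsewhere suggests
    for _ in range(max_rounds):
        correct = total = 0
        for std_ids, cand_ids, labels in batches:   # labels come from the annotation information
            logits = model(std_ids, cand_ids)
            loss = loss_fn(logits, labels)
            opt.zero_grad(); loss.backward(); opt.step()  # backpropagation-based iterative training
            correct += (logits.argmax(-1) == labels).sum().item()
            total += labels.numel()
        if correct / max(total, 1) >= threshold:    # stop once the preset threshold is reached
            break
    return model
```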
  • the transformation processing unit includes:
  • the matrix splicing subunit is used to splice the first feature matrix and the second feature matrix to obtain the target feature matrix;
  • the matrix multiplication subunit is used to perform matrix multiplication processing on the target feature matrix and the preset parameter matrix of the target fully connected layer to obtain a two-dimensional comparison vector;
  • the result determination subunit is used to perform a numerical comparison between the first vector and the second vector in the two-dimensional comparison vector to obtain the comparison result, and determine the recognition result according to the comparison result.
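The small numeric sketch below (with an assumed m = 4 and random values) walks through these three subunits: splicing the two 1×m feature matrices into a 1×2m target feature matrix, multiplying by a preset 2m×2 parameter matrix to obtain the two-dimensional comparison vector, and comparing its first and second values to decide the recognition result.

```python
# Numeric walk-through of the three subunits above, with an assumed m = 4
# and random values standing in for real feature matrices and parameters.
import numpy as np

m = 4
first_feature = np.random.rand(1, m)           # output of the first transformer layer
second_feature = np.random.rand(1, m)          # output of the second transformer layer

target_feature = np.concatenate([first_feature, second_feature], axis=1)   # 1 x 2m splice
preset_params = np.random.rand(2 * m, 2)       # preset parameter matrix of the fully connected layer
comparison = target_feature @ preset_params    # two-dimensional comparison vector, shape (1, 2)

# per the description, first value > second value means "not a similar question",
# otherwise the candidate is taken as a similar question of the standard question
is_similar = comparison[0, 0] <= comparison[0, 1]
print(comparison, is_similar)
```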
  • each module in the above-mentioned similarity question model training device and similar question recognition device may be implemented in whole or in part by software, by hardware, or by a combination thereof.
  • the above-mentioned modules may be embedded in hardware form in, or be independent of, the processor of the computer device, or may be stored in software form in the memory of the computer device, so that the processor can call and execute the operations corresponding to each of the above modules.
  • Fig. 10 is a schematic diagram of a computer device provided by an embodiment of the present application.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 10.
  • the computer equipment includes a processor, a memory, a network interface, and a database connected through a system bus. Among them, the processor of the computer device is used to provide calculation and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, computer readable instructions, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer-readable instructions in the non-volatile storage medium.
  • the database of the computer equipment is used to store the initial semantic recognition model and the preset question database.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • when the computer-readable instructions are executed by the processor, the above-mentioned similar question recognition method is implemented.
  • a computer device is provided, which includes a memory, a processor, and computer-readable instructions stored in the memory and capable of running on the processor.
  • when the processor executes the computer-readable instructions, the steps of the similar question recognition method in the above-mentioned embodiments are implemented, for example, steps S21 to S28 shown in FIG. 2; alternatively, when the processor executes the computer-readable instructions, the functions of the modules/units of the similar question recognition apparatus in the above embodiments are implemented, for example, the functions of modules 21 to 28 shown in FIG. 9. To avoid repetition, details are not repeated here.
  • a computer non-volatile readable storage medium is provided, and computer-readable instructions are stored on the computer non-volatile readable storage medium.
  • when the computer-readable instructions are executed by a processor, the steps of the similar question recognition method in the above-mentioned embodiments are implemented; alternatively, when the computer-readable instructions are executed by a processor, the functions of the modules/units in the similar question recognition apparatus in the foregoing embodiments are implemented. To avoid repetition, details are not repeated here.
  • the computer non-volatile readable storage medium may include: any entity or device capable of carrying the computer-readable instruction code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunication signal, and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

一种相似问识别方法、装置、计算机设备及存储介质,所述方法包括:通过获取待识别问句,并从预设的问题库中,获取每个标准问,将待识别问句与每个标准问分别组成一组基础数据,再将每组基础数据输入到目标相似问模型中,通过训练好的目标相似问模型进行识别,得到每组基础数据对应的识别结果,根据每组基础数据对应的识别结果,确定待识别问句对应的标准问,提高了相似问识别的准确率和效率。

Description

相似问识别方法、装置、计算机设备及存储介质
本申请以2019年9月24日提交的申请号为201910905566.1,名称为“相似问识别方法、装置、计算机设备及存储介质”的中国专利申请为基础,并要求其优先权。
技术领域
本申请涉及人工智能技术领域,尤其涉及一种相似问识别方法、装置、计算机设备及存储介质。
背景技术
在智能面试场景中,预先会在问题库中设置一些标准问题,这类问题被称为标准问,对于每个标准问,内置了一些对应回答的评分规则,在面试者参与面试后,根据面试者对每个标准问的回答和评分规则,即可确定面试者的面试评分,避免人工主观因素影响。
但是,面试官在进行面试的过程中,由于语言习惯等一些因素影响,使得提问的问题与标准问虽然语义上相同,但是字面上有所区别,这种现象,在自然语言处理领域被称为相似问问题,如何准确高效地判断一个句子是否为某个标准问的相似问,是智能面试场景中一个亟待解决的难题。
在当前,进行相似问判断时,通常是采用将标准问和提问问题训练成词向量模型,得到标准问的词向量和提问问题的词向量,再将得到的两个词向量进行匹配,来获取提问问题对应的标准问,但是,在实现本申请的过程中,发明人意识到现有方式至少存在如下问题:训练好的词向量蕴含的语义信息是固定的,在自然语言中,往往在后一轮的提问中,会根据前一轮的提问进行简单问答,例如:A问“你买房了吗”,B回答“我买房了”,A再问“在哪儿”,如何确定“在哪儿”这一提问问题代表什么意思,单单依靠训练好的词向量模型判断该提问问题对应哪个标准问,准确率往往达不到要求。
发明内容
本申请实施例提供一种相似问识别方法、装置、计算机设备和存储介质,以解决当前相似问识别准确率低的问题。
第一方面,本申请实施例提供一种相似问识别方法,包括:
获取待识别问句;从预设的问题库中,获取每个标准问,将所述待识别问句与每个所述标准问分别组成一组基础数据;将每组所述基础数据输入到目标相似问模型中,其中,所述目标相似问模型包括第一输入层、第二输入层、第一编码层、第二编码层、第一transformer层、第二transformer层和目标全连接层;通过所述第一输入层将所述基础数据中的标准问传递到所述第一编码层,通过所述第一编码层对所述标准问进行向量矩阵提取,并将提取到的向量矩阵输入到第一transformer层,采用所述第一transformer层对提取到的向量矩阵进行特征提取,得到第一特征矩阵;通过所述第二输入层将所述基础数据中的待识别问句传递到所述第二编码层,通过所述第二编码层对所述待识别问句进行向量矩阵提取,并将提取到的向量矩阵输入到第二transformer层,采用所述第二transformer层对提取到的向量矩阵进行特征提取,得到第二特征矩阵;将所述第一特征矩阵和所述第二特征矩阵输入到所述目标全连接层;在所述目标全连接层,对所述第一特征矩阵和所述第二特征矩阵进行变换处理,得到变换结果,并根据变换结果,确定所述基础数据对应的识别结果;根据每组基础数据对应的所述识别结果,确定所述待识别问句对应的标准问。
第二方面,本申请实施例提供一种相似问识别装置,包括:
问句获取模块,用于获取待识别问句;问句分组模块,用于从预设的问题库中,获取每个标准问,将所述待识别问句与每个所述标准问分别组成一组基础数据;问句输入模块,用于将每组所述基础数据输入到目标相似问模型中,其中,所述目标相似问模型包括第一输入层、第二输入层、第一编码层、第二编码层、第一transformer层、第二transformer层和目标全连接层;第一特征提取模块,用于通过所述第一输入层将所述基础数据中的标准问传递到所述第一编码层,通过所述第一编码层对所述标准问进行向量矩阵提取,并将提取到的向量矩阵输入到第一transformer层,采用所述第一transformer层对提取到的向量矩阵进行特征提取,得到第一特征矩阵;第二特征提取模块,用于通过所述第二输入层将所述基础数据中的待识别问句传递到所述第二编码层,通过所述第二编码层对所述待识别问句进行向量矩阵提取,并将提取到的向量矩阵输入到第二transformer层,采用所述第二transformer层对提取到的向量矩阵进行特征提取,得到第二特征矩阵;特征输入模块,用于将所述第一特征矩阵和所述第二特征矩阵输入到所述目标全连接层;特征识别模块,用于在所述目标全连接层,对所述第一特征矩阵和所述第二特征矩阵进行变换处理,得到变换结果,并根据变换结果,确定所述基础数据对应的识别结果;结果确定模块,用于根据每组基础数据对应的所述识别结果,确定所述待识别问句对应的标准问。
第三方面,本申请实施例提供一种计算机设备,包括存储器、处理器以及存储在所述存储器中并可在所述处理器上运行的计算机可读指令,所述处理器执行所述计算机可读指令时实现上述相似问识别方法的步骤。
第四方面,本申请实施例提供一种计算机非易失性可读存储介质,所述计算机非易失性可读存储介质存储有计算机可读指令,所述计算机可读指令被处理器执行时实现上述相似问识别方法的步骤。
本申请实施例提供的相似问识别方法、装置、计算机设备及存储介质,通过获取待识别问句,并从预设的问题库中,获取每个标准问,将待识别问句与每个标准问分别组成一组基础数据,再将每组基础数据输入到目标相似问模型中,其中,目标相似问模型包括第一输入层、第二输入层、第一编码层、第二编码层、第一transformer层、第二transformer层和目标全连接层,通过第一编码层对所述标准问进行向量矩阵提取,并将提取到的向量矩阵输入到第一transformer层进行特征提取,得到第一特征矩阵,通过第二输入层将基础数据中的待识别问句传递到第二编码层进行向量矩阵提取,并将提取到的向量矩阵输入到第二transformer层,采用第二transformer层对提取到的向量矩阵进行特征提取,得到第二特征矩阵,将第一特征矩阵和第二特征矩阵输入到目标全连接层,在目标全连接层,对第一特征矩阵和第二特征矩阵进行变换处理,得到变换结果,并根据变换结果,确定基础数据对应的识别结果,根据每组基础数据对应的识别结果,确定待识别问句对应的标准问,提高了相似问识别的准确率和效率。
附图说明
为了更清楚地说明本申请实施例的技术方案,下面将对本申请实施例的描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动性的前提下,还可以根据这些附图获得其他的附图。
图1是本申请实施例提供的相似问识别方法的应用环境示意图;
图2是本申请实施例提供的相似问识别方法的实现流程图;
图3是本申请实施例提供的相似问识别方法中相似问模型的结构示意图;
图4是本申请实施例提供的相似问模型训练方法的实现流程图;
图5是本申请实施例提供的相似问识别方法中步骤S11的一实现流程图;
图6是本申请实施例提供的相似问识别方法中步骤S11的另一实现流程图;
图7是本申请实施例提供的相似问识别方法中步骤S16的一实现流程图;
图8是本申请实施例提供的相似问识别方法中步骤S164的一实现流程图;
图9是本申请实施例提供的相似问识别装置的示意图;
图10是本申请实施例提供的计算机设备的示意图。
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
请参阅图1,图1示出本申请实施例提供的相似问识别方法的应用环境。该相似问识别方法应用在智能面试中的相似问识别场景中。该识别场景包括客户端和服务端,其中,客户端和服务端之间通过网络进行连接,服务端训练目标相似问模型,在客户端需要进行待识别问句识别时,将该待识别问句发送给服务端,服务端通过目标相似问模型识别待识别问句对应的标准问,客户端具体可以但不限于是各种个人计算机、便携式笔记本、平板电脑、手机和带有网络数据传递功能的智能设备,服务端具体可以用独立的服务器或者多个服务器组成的服务器集群实现。
请参阅图2,图2示出本申请实施例提供的一种相似问识别方法,以该方法应用在图1中的服务端为例进行说明,详述如下:
S21:获取待识别问句。
具体地,在进行智能面试时,先获取待识别问句,以使后续将该待识别问句与预设的问题库中的标准问进行比对,确定该待识别问句对应的标准问,也即,判断该待识别问句属于哪一标准问对应的相似问。
S22:从预设的问题库中,获取每个标准问,将待识别问句与每个标准问分别组成一组基础数据。
具体地,从预设的问题库中,获取每个标准问,并将待识别问句与每个标准分别进行组合,得到每个标准问对应的基础数据。
示例性地,预设的问题库中存储有100个标准问,在获取到待识别问句之后,将待识别问句与每个标准问组成一组基础数据,得到100组基础数据,在后续,将每组基础数据中的标准问和待识别问句分别输入到目标相似问模型中的第一输入层和第二输入层,以使通过目标相似问模型对每组基础数据是否属于相似问进行识别判断,得到识别结果。
S23:将每组基础数据输入到目标相似问模型中,其中,目标相似问模型包括第一输入层、第二输入层、第一编码层、第二编码层、第一transformer层、第二transformer层和目标全连接层。
具体地,将每组基础数据输入到目标相似问模型中,以使后续通过目标相似问模型判别哪一组别的基础数据为一对相似问。
其中,目标相似问模型包括第一输入层、第二输入层、第一编码层、第二编码层、第一transformer层、第二transformer层和目标全连接层,具体请参见图3,图3为目标相似问模型的结构示意图。
S24:通过第一输入层将基础数据中的标准问传递到第一编码层,通过第一编码层对标准问进行向量矩阵提取,并将提取到的向量矩阵输入到第一transformer层,采用第一transformer层对提取到的向量矩阵进行特征提取,得到第一特征矩阵。
具体地,通过第一输入层将标准问传递到第一编码层,通过第一编码层对标准问进行向量矩阵提取,并将提取到的向量矩阵输入到第一transformer层,采用第一transformer层对提取到的向量矩阵进行特征提取,得到第一特征矩阵。
其中,transformer层是通过transformer框架进行构建,transformer框架是谷歌团队提出的自然语言处理的经典之作,Transformer可以增加到非常深的深度,并利用注意力机制实现快速并行,因而,Transformer框架相对于通常的卷积神经网络或者循环神经网络具有训练速度快,且识别率高的特点。
S25:通过第二输入层将基础数据中的待识别问句传递到第二编码层,通过第二编码层对待识别问句进行向量矩阵提取,并将提取到的向量矩阵输入到第二transformer层,采用第二transformer层对提取到的向量矩阵进行特征提取,得到第二特征矩阵。
具体地,通过第二输入层将待识别问句传递到第二编码层,通过第二编码层对待识别问句进行向量矩阵提取,并将提取到的向量矩阵输入到第二transformer层,采用第二transformer层对提取到的向量矩阵进行特征提取,得到第二特征矩阵。
需要说明的是,步骤S24与步骤S25之间没有必然的先后顺序关系,其具体也可以是并列执行,此处不作具体限制。
S26:将第一特征矩阵和第二特征矩阵输入到目标全连接层。
具体地,分别将第一transformer层输出的第一特征矩阵和第二transformer层输出的第二特征矩阵均输入到目标全连接层。
S27:在目标全连接层,对第一特征矩阵和第二特征矩阵进行变换处理,得到变换结果,并根据变换结果,确定基础数据对应的识别结果。
具体地,在目标全连接层,对第一特征矩阵和第二特征矩阵进行变换处理,得到变换结果,并根据变换结果,确定识别结果,具体过程可参考步骤S1641至步骤S1643的步骤,为避免重复,此处不再赘述。
S28:根据每组基础数据对应的识别结果,确定待识别问句对应的标准问。
具体地,根据每组基础数据对应的识别结果,将识别结果为相似问的那组基础数据中的标准问,作为待识别问句对应的标准问。
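A minimal sketch of this recognition flow (steps S21 to S28) is shown below; the `predict` interface of the trained target similarity-question model is a hypothetical stand-in introduced only to keep the example self-contained, and handling several matches or none is left to the caller.

```python
# Hypothetical sketch of steps S21-S28: pair the question to be recognized with
# every standard question in the preset question bank, run each pair through the
# trained target similarity-question model, and keep the matching standard
# question(s). `model.predict` is an assumed interface, not part of this application.
from typing import List

def find_standard_questions(query: str, question_bank: List[str], model) -> List[str]:
    matches = []
    for standard_q in question_bank:           # S22: one group of basic data per standard question
        if model.predict(standard_q, query):   # S23-S27: recognition result for this group
            matches.append(standard_q)         # S28: standard question corresponding to the query
    return matches
```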
在本实施例中,通过获取待识别问句,并从预设的问题库中,获取每个标准问,将待识别问句与每个标准问分别组成一组基础数据,再将每组基础数据输入到目标相似问模型中进行识别,得到每组基础数据对应的识别结果,根据每组基础数据对应的识别结果,确定待识别问句对应的标准问,提高了相似问识别的准确率和效率。
在一实施例中,步骤S23之前,该相似问识别方法还包括对目标相似问模型的训练,下面通过一个具体的实施例来对目标相似问模型的训练的实现方法进行详细说明,请参阅图4,具体过程如下:
S11:获取预设语料库,并使用预设语料库,对初始语义识别模型进行训练,得到训练好的语义识别模型,其中,初始语义识别模型为多层长短时记忆网络,多层长短时记忆网络包括编码层、K个长短时记忆层和全连接层,K为大于1的正整数。
具体地,本实施例中使用的语义识别模型为多层长短时记忆网络模型,通过获取预设语料库,并将预设语料库输入到初始语义识别模型中进行训练,在达到预设条件时,得到训练好的语义识别模型。
其中,预设语料库具体可以是微博、影视台词等热门话题中的内容,获取预设语料库具体可以是通过网络爬虫的方式进行获取。
其中,多层长短时记忆网络为包含多个长短时记忆层的神经网络模型,长短时记忆网络(Long Short-Term Memory,简称LSTM)是一种时间循环神经网络,适合于处理和预测时间序列中间隔和延迟相对较长的重要事件。
其中,预设条件具体可以是达到预设的迭代次数,例如迭代20次,也可以是指在训练过程中,拟合达到预设的范围等。
其中,K的数值可以根据实际需求进行设定,优选地,在本实施例中,K的数值为3,也即,使用3个长短时记忆层。
S12:从预设的问题库中,获取面试语料库,其中,面试语料库包括标准问和标准问对应的相似问。
具体地,在服务端存储有预设的问题库,从该预设的问题库中,获取面试语料库。
其中,面试语料库包括标准问和标准问对应的相似问。
其中,本实施例中的相似问是指与预设的问题库中的标准问具有相同或者相近语义的问题。
S13:将标准问输入训练好的语义识别模型进行识别,得到编码层输出的第一向量矩阵和长短时记忆层输出的K个第一输出结果,并对第一向量矩阵和K个第一输出结果进行加权汇总,得到第一参数信息。
具体地,步骤S11中,得到的训练好的语义识别模型为一个通用的语义识别模型,为提高对面试语料库中语义的理解程度,在本实施例中,将面试语料库中的标准问输入到该训练好的语义识别模型中,得到编码层输出的第一向量矩阵,以及每个长短时记忆层输出的数字化向量,将第一向量矩阵和每个数字化向量均作为第一输出结果,并对第一输出结果进行加权汇总,得到第一参数信息。
进一步地,对第一输出结果进行加权汇总,得到第一参数信息包括:
获取编码层的预设权重W_0,以及第i个长短时记忆层的预设权重W_i,其中,i为正整数;
使用如下公式对第一输出结果进行加权汇总,得到第一参数信息:

$$A = W_0 A_0 + \sum_{i=1}^{K} W_i A_i$$

其中,A_0为编码层对应的第一输出结果,A_i为第i个长短时记忆层对应的第一输出结果,A为第一参数信息。
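A small numeric sketch of this weighted summation (with K = 3 and arbitrarily chosen weights and layer outputs, purely for illustration) follows.

```python
# Numeric sketch of the weighted summation above, with K = 3 LSTM layers and
# arbitrary example weights/outputs chosen only for illustration.
import numpy as np

K = 3
A0 = np.random.rand(5, 8)                               # first output of the coding layer
A_layers = [np.random.rand(5, 8) for _ in range(K)]     # first outputs of the K LSTM layers
W0, W_layers = 0.4, [0.3, 0.2, 0.1]                     # preset weights (assumed values)

A = W0 * A0 + sum(w * a for w, a in zip(W_layers, A_layers))   # first parameter information
print(A.shape)
```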
S14:将标准问对应的相似问输入训练好的语义识别模型进行识别,得到编码层输出的第二向量矩阵和长短时记忆层输出的K个第二输出结果,并对第二向量矩阵和K个第二输出结果进行加权汇总,得到第二参数信息。
具体地,步骤S11中,得到的训练好的语义识别模型为一个通用的语义识别模型,为提高对面试语料库中语义的理解程度,在本实施例中,将面试语料库中标准问对应的相似问输入到该训练好的语义识别模型中,得到编码层输出的第二向量矩阵,以及每个长短时记忆层输出的数字化向量,将第二向量矩阵和每个数字化向量均作为第二输出结果,并对第二输出结果进行加权汇总,得到第二参数信息。
本实施例中,对第二输出结果进行加权汇总,得到第二参数信息的具体过程可参照步骤S13的描述,为避免重复,此处不再赘述。
需要说明的是,步骤S13与步骤S14之间,没有必然的先后顺序关系,其具体也可以是并行执行,此处不作具体限定。
S15:将标准问、标准问对应的相似问、第一参数信息和第二参数信息输入到相似问模型中,其中,相似问模型包括第一输入层、第二输入层、第一编码层、第二编码层、第一transformer层、第二transformer层和目标全连接层。
具体地,将标准问、标准问对应的相似问、第一参数信息和第二参数信息输入到相似问模型中,其中,相似问模型包括第一输入层、第二输入层、第一编码层、第二编码层、第一transformer层、第二transformer层和目标全连接层,相似问模型的具体结构示意图可参考图3所示。
其中,第一输入层和第二输入层用于接收标准问和标准问对应的相似问。
其中,第一编码层和第二编码层用于从标准问和标准问对应的相似问中提取向量特征数据;
其中,第一transformer层和第二transformer层用于对向量特征数据进行处理,得到具有上下文语义的特征矩阵,并将该特征矩阵输入到目标全连接层。
其中,目标全连接层用于对特征矩阵进行识别,并根据识别结果对第一编码层、第二编码层、第一transformer层和第二transformer层的参数进行调整。
S16:使用第一输入层接收标准问,使用第二输入层接收标准问对应的相似问,将第一参数信息作为第一transformer层的初始参数信息,将第二参数信息作为第二transformer层的初始参数信息,并对相似问模型进行训练,得到目标相似问模型。
具体地,使用第一输入层接收标准问,使用第二输入层接收标准问对应的相似问,将第一参数信息作为第一transformer层的初始参数信息,将第二参数信息作为第二transformer层的初始参数信息,并 对相似问模型进行训练,得到目标相似问模型。
其中,对相似问模型进行训练,得到目标相似问模型的具体过程可参考步骤S161至步骤S166的描述,为避免重复,此处不再赘述。
容易理解地,采用训练好的语义识别模型对标准问进行识别得到的第一参数信息,作为第一transformer层的初始参数信息,同时,训练好的语义识别模型对相似问进行识别得到的第二参数信息,作为第二transformer层的初始参数信息,有利于后续提升相似问模型训练的速度。
在本实施例中,通过使用预设语料库,对初始语义识别模型进行训练,得到训练好的语义识别模型,使得可以通过该语义识别模型对相似问和标准问进行上下文语义的理解,提高了后续使用该语义识别模型进行识别的准确率,同时,从预设的问题库中,获取面试语料库,再将面试语料库中的相似问和标准问分别输入到训练好的语义识别模型中进行识别,得到第一参数信息和第二参数信息,使得得到的第一参数信息和第二参数信息对预设问题库中的标准问和相似问的识别度较高,进而将第一参数信息和第二参数信息分别作为相似问模型中第一transformer层、第二transformer层的初始参数信息,并将标准问和标准问对应的相似问作为训练数据,输入到相似问模型中进行训练,得到目标相似问模型,实现了采用对预设问题库中的标准问和相似问识别度较高的参数信息,作为相似问模型的初始参数,有利于提高相似问模型的训练速度,以及目标相似问模型的识别准确率。
在图4对应的实施例的基础之上,下面通过一个具体的实施例来对步骤S11中所提及的获取预设语料库的实现方法进行详细说明,请参阅图5,具体过程如下:
S1111:通过网络爬虫的方式,对预设域名进行爬取,得到预设域名对应的页面信息中的统一资源定位符,其中,页面信息中包含至少一个统一资源定位符。
具体地,通过网络爬虫的方式,对预设域名进行爬取,得到预设的域名对应的网页页面中包含的每个统一资源定位符。
由于网络爬虫的爬行范围和数量巨大,对于爬行速度和存储空间要求较高,对于爬行页面的顺序要求相对较低,同时由于待刷新的页面太多,通常采用并行工作方式,网络爬虫的结构大致可以分为页面爬行模块、页面分析模块、链接过滤模块、页面数据库、URL队列、初始URL集合几个部分。为提高工作效率,通用网络爬虫会采取一定的爬行策略。常用的爬行策略有:深度优先策略、广度优先策略。
其中,统一资源定位符(Uniform Resource Locator,URL)是对可以从互联网上得到的资源的位置和访问方法的一种简洁的表示,是互联网上标准资源的地址。互联网上的每个文件都有一个唯一的URL,它包含的信息指出文件的位置以及浏览器应该怎么处理它。
其中,深度优先策略的基本方法是按照深度由低到高的顺序,依次访问下一级网页链接,直到不能再深入为止。爬虫在完成一个爬行分支后返回到上一链接节点进一步搜索其它链接。当所有链接遍历完后,爬行任务结束。
其中,广度优先策略是按照网页内容目录层次深浅来爬行页面,处于较浅目录层次的页面首先被爬行。当同一层次中的页面爬行完毕后,爬虫再深入下一层继续爬行。这种策略能够有效控制页面的爬行深度,避免遇到一个无穷深层分支时无法结束爬行的问题,实现方便,无需存储大量中间节点。
优选地,本申请实施例采用的爬行策略为广度优先策略,先爬取预设的域名,获取各个应用渠道,再在后续对每个应用渠道进行爬取,获取每个应用渠道中包含的各个应用程序的基本信息,避免了爬取过多的无用信息而导致的额外时间开销,提高了爬取效率。
示例性地,通过对预设域名http://apprank.sfw.cn进行爬取,得到该预设域名中包含的其中5个统一资源定位符,这5个统一资源定位符对应的页面信息分别为:安卓市场、91助手、腾讯手机管家、UC应用商店和360手机助手,后续通过访问这5个统一资源定位符进行访问,即可获取每个统一资源定位符对应的页面信息。
S1112:爬取每个统一资源定位符,得到每个统一资源定位符对应的基本信息。
具体地,对获取到的每个统一资源定位符进行爬取,得到每个统一资源定位符对应的基本信息。
S1113:采用正则匹配的方式,对每个基本信息进行正则匹配,得到每个基本信息中包含的语料信息。
具体地,通过正则匹配的方式,对每个基本信息进行正则匹配,将匹配结果中,符合要求的每个基本信息作为语料信息。
S1114:对语料信息进行数据清洗,并根据数据清洗后的语料信息生成预设语料库。
具体地,对语料信息进行数据清洗,并根据数据清洗后的语料信息生成预设语料库。
其中,数据清洗(Data cleaning)是指发现并纠正数据文件中可识别的错误的最后一道程序,包括检查数据一致性,处理无效值和缺失值等。
在本实施例中,通过网络爬虫的方式,对预设域名进行爬取,得到预设域名对应的页面信息中的统一资源定位符,进而爬取每个统一资源定位符,得到每个统一资源定位符对应的基本信息,再采用正则匹配的方式,对每个基本信息进行正则匹配,得到每个基本信息中包含的语料信息,并对语料信息进行数据清洗,并根据数据清洗后的语料信息生成预设语料库,得到样本丰富的预设语料库,有利于提高后续通过该预设语料库进行语义识别模型训练的准确率。
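The sketch below illustrates, under assumed URLs, regular expressions and cleaning rules, one way this corpus-building pipeline could look in code: breadth-first crawling of the URLs found on a preset domain, regular-expression matching of the corpus text inside each page, and simple data cleaning before the preset corpus is assembled.

```python
# Illustrative corpus-building pipeline: breadth-first crawl of a preset domain,
# regex matching of corpus text, and simple data cleaning. The seed URL, regex
# patterns and cleaning rules are placeholders, not values from this application.
import re
from collections import deque
from urllib.parse import urljoin
import requests  # third-party HTTP client, assumed available

def build_corpus(seed_url: str, max_pages: int = 50) -> list:
    corpus, seen, queue, fetched = [], {seed_url}, deque([seed_url]), 0
    while queue and fetched < max_pages:                    # breadth-first crawling strategy
        page = queue.popleft()
        fetched += 1
        try:
            html = requests.get(page, timeout=5).text
        except requests.RequestException:
            continue
        for link in re.findall(r'href="([^"]+)"', html):    # URLs found in the page information
            url = urljoin(page, link)
            if url not in seen:
                seen.add(url)
                queue.append(url)
        for text in re.findall(r"<p>(.*?)</p>", html, re.S):        # regex matching of corpus text
            text = re.sub(r"\s+", " ", re.sub(r"<[^>]+>", "", text)).strip()  # data cleaning
            if text and text not in corpus:                 # drop empty and duplicate entries
                corpus.append(text)
    return corpus
```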
在图4对应的实施例的基础之上,下面通过一个具体的实施例来对步骤S11中所提及的使用预设语料库,对初始语义识别模型进行训练,得到训练好的语义识别模型的实现方法进行详细说明,请参阅图6,具体过程如下:
S1121:将预设语料库中的每个语料作为一个训练集,并将训练集输入到编码层。
具体地,在预设语料库中,包含多个语料,将每个语料作为一个训练集,并将训练集输入到编码层,以使后续编码层对训练集进行向量特征提取。
S1122:通过编码层对训练集进行向量化处理,得到训练集对应的词向量,并通过预设方式,获取每个词向量对应的位置向量。
具体地,通过编码层对训练集进行向量化处理,得到训练集对应的词向量,同时,通过预设方式,得到每个词向量对应的位置向量。
其中,向量化处理是指将训练集转化为词向量的形式,后续通过将词向量转化为向量矩阵进行特征提取。
其中,位置向量是指用于指代词向量与其他词向量之间的位置关系,在本实施例中,可以指定一个词向量作为实体词,进而计算其他词向量与该实体词的距离,得到每个词向量的位置向量。
优选的,本实施例指定第一个词向量为实体词。
例如,在一具体实施方式中,采用skip-gram模型,窗口大小设为8,迭代周期设为15,设定词向量的维度是400维,训练结束后,得到一个词向量映射表,进而根据词向量映射表,获取训练集的每一个词对应的词向量。为了加快训练速度,将该词典中的词与数据集中出现的词一一对应,对多余的词向量舍弃,进而抽取位置向量,也即,获取训练集中的每一个词的位置向量特征,位置向量特征包括句子中的每个词到实体词的相对距离组成,得到的每个词向量在句子中的位置,以实体位置为原点,得到句子中的每个词相对词向量的位置,每个词对两个实体的相对位置组成该词的位置向量特征。
S1123:对词向量和词向量对应的位置向量进行向量级联,得到向量矩阵。
具体地,对词向量和词向量对应的位置向量进行向量级联,得到向量矩阵。
其中,级联(cascade)是指建立多个对象之间的映射关系,建立数据之间的级联关系以提高执行或管理效率,本实施例中的向量级联,具体是指建立词向量和词向量对应的位置向量的级联,得到向量矩阵。
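A hedged sketch of the vectorization and concatenation steps is shown below, using gensim's skip-gram Word2Vec with the settings mentioned in the example above (vector size 400, window 8, 15 epochs); the toy sentences, the choice of the first word as the entity word, and the single-value position feature are assumptions made for illustration.

```python
# Sketch of S1122-S1123 under the settings mentioned above (skip-gram, vector
# size 400, window 8, 15 epochs), using gensim as an assumed library. The toy
# sentences and the single relative-position value are illustrative only.
import numpy as np
from gensim.models import Word2Vec  # third-party library, assumed available

sentences = [["你", "买房", "了", "吗"], ["我", "买房", "了"]]   # toy training corpus
w2v = Word2Vec(sentences, vector_size=400, window=8, sg=1, epochs=15, min_count=1)

def to_vector_matrix(tokens):
    rows = []
    for idx, tok in enumerate(tokens):
        word_vec = w2v.wv[tok]                            # word vector from the coding layer
        pos_vec = np.array([float(idx)])                  # relative distance to the entity word (index 0)
        rows.append(np.concatenate([word_vec, pos_vec]))  # vector concatenation
    return np.stack(rows)                                 # vector matrix fed to the LSTM layers

matrix = to_vector_matrix(sentences[0])
print(matrix.shape)  # (sentence length, 401)
```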
S1124:将向量矩阵输入到长短时记忆层,通过长短时记忆层,提取向量矩阵中包含的上下文语义信息。
具体地,将向量矩阵输入到多层长短时记忆层,通过多层长短时记忆层,提取向量矩阵中包含的上下文语义信息。
其中,长短时记忆层(Long Short-Term Memory,简称LSTM)是一种时间循环神经网络,适合于处理和预测时间序列中间隔和延迟相对较长的重要事件。
需要说明的是,单向LSTM可以按照人类的阅读顺序从一句话的第一个字记忆到最后一个字,这种LSTM结构只能捕捉到上文信息,无法捕捉到下文信息,而双向LSTM由两个方向不同的LSTM组成,一个LSTM按照句子中词的顺序从前往后读取数据,另一个LSTM从后往前按照句子词序的反方向读取数据,这样第一个LSTM获得上文信息,另一个LSTM获得下文信息,两个LSTM的联合输出就是整个句子的上下文信息,而上下文信息是由整个句子提供的,自然包含比较抽象的语义信息(句子的意思),这种方法的优点是充分利用了LSTM对具有时序特点的序列数据的处理优势,而且由于我们输入了位置特征,其经过双向LSTM编码后可以抽取出位置特征中包含的实体方向信息,其他的方法就没有这样的优点。
因而,作为一种优选方式,本实施例采用双向LSTM来构建长短时记忆层。
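As a minimal illustration of this preferred bidirectional structure, the sketch below stacks three bidirectional LSTM layers over a vector matrix so that every position carries both preceding and following context; the input and hidden sizes are placeholders.

```python
# Minimal bidirectional LSTM illustration: three stacked BiLSTM layers read the
# vector matrix in both directions, so every position carries preceding and
# following context. Input and hidden sizes below are placeholders.
import torch
import torch.nn as nn

bilstm = nn.LSTM(input_size=401, hidden_size=128, num_layers=3,
                 bidirectional=True, batch_first=True)
vector_matrix = torch.randn(1, 10, 401)     # (batch, sentence length, word + position dims)
context, _ = bilstm(vector_matrix)          # (1, 10, 256): forward and backward states per word
print(context.shape)
```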
S1125:将上下文语义信息输入到全连接层,通过全连接层对上下文语义信息进行识别,得到识别语义信息,并将识别语义信息与预设标注信息进行对比,得到对比结果。
具体地,将上下文语义信息输入到全连接层,通过全连接层对上下文语义信息进行识别,得到识别语义信息,并将识别语义信息与预设标注信息进行对比,得到对比结果。
其中,对比结果包括识别正确和识别错误。
S1126:若对比结果不满足预设条件,则通过反向传播的方式,对长短时记忆层的参数进行调整,并返回将向量矩阵输入到长短时记忆层,通过长短时记忆层,提取向量矩阵中包含的上下文语义信息的步骤继续执行,直到对比结果满足预设条件,将此时得到的初始语义识别模型作为训练好的语义识别模型。
其中,预设条件具体可以是预设的迭代次数,例如50次,也可以是识别准确率达到预设准确率阈值,例如,识别准确率超过90%,也可以依据实际情况进行设定,此处不做限制。
在本实施例中,将预设语料库中的每个语料作为一个训练集,并将训练集输入到编码层,通过编码层对训练集进行向量化处理,得到训练集对应的词向量,并通过预设方式,获取每个词向量对应的位置向量,进而对词向量和词向量对应的位置向量进行向量级联,得到向量矩阵,再将向量矩阵输入到长短时记忆层,通过长短时记忆层,提取向量矩阵中包含的上下文语义信息,将上下文语义信息输入到全连接层,通过全连接层对上下文语义信息进行识别,得到识别语义信息,并将识别语义信息与预设标注信息进行对比,得到对比结果,在对比结果不满足预设条件时,通过反向传播的方式,对长短时记忆层的参数进行调整,并返回S1124的步骤继续执行,直到对比结果满足预设条件,将此时得到的初始语义识别模型作为训练好的语义识别模型,通过预设语料库中的海量语料与多层长短时记忆层,使得训练好的语义识别模型的识别准确率较高。
在图4对应的实施例的基础之上,下面通过一个具体的实施例来对步骤S16中所提及的对相似问模型进行训练,得到目标相似问模型的实现方法进行详细说明,请参阅图7,具体流程如下:
S161:通过第一输入层将标准问传递到第一编码层,通过第一编码层对标准问进行向量矩阵提取,并将提取到的向量矩阵输入到第一transformer层,采用第一transformer层对提取到的向量矩阵进行特征提取,得到第一特征矩阵。
具体地,通过第一输入层将标准问传递到第一编码层,通过第一编码层对标准问进行向量矩阵提取,并将提取到的向量矩阵输入到第一transformer层,采用第一transformer层对提取到的向量矩阵进行特征提取,得到第一特征矩阵。
S162:通过第二输入层将相似问传递到第二编码层,通过第二编码层对相似问进行向量矩阵提取,并将提取到的向量矩阵输入到第二transformer层,采用第二transformer层对提取到的向量矩阵进行特征提取,得到第二特征矩阵。
具体地,通过第二输入层将相似问传递到第二编码层,通过第二编码层对相似问进行向量矩阵提取,并将提取到的向量矩阵输入到第二transformer层,采用第二transformer层对提取到的向量矩阵进行特征提取,得到第二特征矩阵。
需要说明的是,步骤S161与步骤S162之间没有必然的先后顺序关系,其具体也可以是并列执行,此处不作具体限制。
S163:将第一特征矩阵和第二特征矩阵输入到目标全连接层。
具体地,分别将第一transformer层输出的第一特征矩阵和第二transformer层输出的第二特征矩阵均输入到目标全连接层。
S164:在目标全连接层,对第一特征矩阵和第二特征矩阵进行变换处理,得到变换结果,并根据变换结果,确定识别结果。
具体地,在目标全连接层,对第一特征矩阵和第二特征矩阵进行变换处理,得到变换结果,并根据变换结果,确定识别结果,具体过程可参考步骤S1641至步骤S1643的步骤,为避免重复,此处不再赘述。
S165:根据输入的标准问、输入的相似问和相似问的标注信息,计算识别结果的准确率。
具体地,在训练过程中,每个相似问都预设有标注信息,通过输入的标准问、输入的相似问和相似问的标注信息,判断每个识别结果的准确性,进而计算得到的所有识别结果的准确率。
其中,标注信息是指用来标注该相似问对应的标准问,通过该标注信息,可以判断步骤S164中的识别结果是否准确。
S166:若识别结果的准确率小于预设阈值,则通过反向传播的方式,对相似问模型进行迭代训练,直到识别结果的准确率大于或等于预设阈值,将得到的相似问模型作为目标相似问模型。
具体地,若识别结果的准确率小于预设阈值,则通过反向传播和损失函数对相似问模型进行迭代训练,直到识别结果的准确率大于或等于预设阈值,停止训练,将此时得到的相似问模型作为目标相似问模型。
其中,反向传播(Backpropagation algorithm,BP)是一种多层神经元网络的学习算法,它建立在梯度下降法的基础上,反向传播算法主要由两个环节(激励传播、权重更新)反复循环迭代,直到网络对输入的响应达到预定的目标范围为止。
其中,损失函数包括但不限于:均方误差(Mean-Square Error,MSE)损失函数、合页(Hinge)损失函数、交叉熵(Cross Entropy)损失函数和Smooth L1损失函数等,优选地,本实施例采用的损失函数为交叉熵损失函数。
在本实施例中,通过提取标准问对应的第一特征矩阵和相似问对应的第二特征矩阵,进而对第一特征矩阵和第二特征矩阵进行变换处理,得到变换结果,并根据变换结果确定识别结果,再根据标准问、相似问和相似问的标注信息,计算识别结果的准确率,在识别结果的准确率小于预设阈值时,通过反向传播的方式,对相似问模型进行迭代训练,直到识别结果的准确率大于或等于预设阈值时,将此时得到的相似问模型作为目标相似问模型,从而实现快速进行相似问模型的训练。
在图7对应的实施例的基础之上,下面通过一个具体的实施例来对步骤S164中所提及的在目标全连接层,对第一特征矩阵和第二特征矩阵进行变换处理,得到变换结果,并根据变换结果,确定识别结果的具体实现方法进行详细说明。
请参阅图8,图8示出了本申请实施例提供的步骤S164的具体实现流程,详述如下:
S1641:将第一特征矩阵与第二特征矩阵进行拼接,得到目标特征矩阵。
具体地,将第一特征矩阵与第二特征矩阵进行拼接,得到目标特征矩阵。
示例性地,第一transformer层输出的第一特征矩阵为a(1*m)和第二transformer层输出的第二特征矩阵为b(1*m),第一特征矩阵a(1*m)和第二特征矩阵b(1*m)进行拼接处理后,得到目标特征矩阵c(1*2m)。
S1642:将目标特征矩阵与目标全连接层的预设参数矩阵进行矩阵相乘处理,得到二维比较向量。
具体地,在目标全连接层,预设有参数矩阵,将目标特征矩阵与目标全连接层的预设参数矩阵进行矩阵相乘处理,得到二维比较向量。
继续以步骤S1641中的示例为例,在一具体实施方式中,目标全连接层的参数矩阵为d(2m*2),将目标特征矩阵c(1*2m)与参数矩阵d(2m*2)进行矩阵相乘,得到的结果为二维比较向量e(1*2),其中包含第一个向量值和第二个向量值。
S1643:对二维比较向量中的第一个向量和第二个向量进行数值比较,得到比较结果,并根据比较结果,确定识别结果。
具体地,对二维比较向量中的第一个向量和第二个向量进行数值比较,得到比较结果,在比较结果为第一个向量的值大于第二个向量时,确定识别结果为第二输入层输入的相似问不属于第一输入层输入的标准问对应的相似问,在比较结果为第一个向量的值小于或等于第二个向量时,确定识别结果为第二输入层输入的相似问属于第一输入层输入的标准问对应的相似问。
在本实施例中,将第一特征矩阵与第二特征矩阵进行拼接,得到目标特征矩阵,进而将目标特征矩阵与目标全连接层的预设参数矩阵进行矩阵相乘处理,得到二维比较向量,并对二维比较向量中的第一个向量和第二个向量进行数值比较,得到比较结果,并根据比较结果,确定识别结果,实现快速对第一特征矩阵和第二特征矩阵的关联关系进行识别。
应理解,上述实施例中各步骤的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。
图9示出与上述实施例相似问识别方法一一对应的相似问模型训练装置的原理框图。如图9所示,该相似问模型训练装置包括问句获取模块21、问句分组模块22、问句输入模块23、第一特征提取模块24、第二特征提取模块25、特征输入模块26、特征识别模块27和结果确定模块28。各功能模块详细说明如下:
问句获取模块21,用于获取待识别问句;
问句分组模块22,用于从预设的问题库中,获取每个标准问,将待识别问句与每个标准问分别组成一组基础数据;
问句输入模块23,用于将每组基础数据输入到目标相似问模型中,其中,目标相似问模型包括第一输入层、第二输入层、第一编码层、第二编码层、第一transformer层、第二transformer层和目标全连接层;
第一特征提取模块24,用于通过第一输入层将基础数据中的标准问传递到第一编码层,通过第一编码层对标准问进行向量矩阵提取,并将提取到的向量矩阵输入到第一transformer层,采用第一transformer层对提取到的向量矩阵进行特征提取,得到第一特征矩阵;
第二特征提取模块25,用于通过第二输入层将基础数据中的待识别问句传递到第二编码层,通过第二编码层对待识别问句进行向量矩阵提取,并将提取到的向量矩阵输入到第二transformer层,采用第二transformer层对提取到的向量矩阵进行特征提取,得到第二特征矩阵;
特征输入模块26,用于将第一特征矩阵和第二特征矩阵输入到目标全连接层;
特征识别模块27,用于在目标全连接层,对第一特征矩阵和第二特征矩阵进行变换处理,得到变换结果,并根据变换结果,确定基础数据对应的识别结果;
结果确定模块28,用于根据每组基础数据对应的识别结果,确定待识别问句对应的标准问。
该相似问识别装置还包括:
语义识别模型训练模块,用于获取预设语料库,并使用预设语料库,对初始语义识别模型进行训练,得到训练好的语义识别模型,其中,初始语义识别模型为多层长短时记忆网络,多层长短时记忆网络包括编码层、K个长短时记忆层和全连接层,K为大于1的正整数;
面试语料库获取模块,用于从预设的问题库中,获取面试语料库,其中,面试语料库包括标准问和标准问对应的相似问;
第一参数信息获取模块,用于将标准问输入训练好的语义识别模型进行识别,得到编码层输出的第一向量矩阵和长短时记忆层输出的K个第一输出结果,并对第一向量矩阵和K个第一输出结果进行加权汇总,得到第一参数信息;
第二参数信息获取模块,用于将标准问对应的相似问输入训练好的语义识别模型进行识别,得到编码层输出的第二向量矩阵和长短时记忆层输出的K个第二输出结果,并对第二向量矩阵和K个第二输出结果进行加权汇总,得到第二参数信息;
信息输入模块,用于将标准问、标准问对应的相似问、第一参数信息和第二参数信息输入到相似问模型中,其中,相似问模型包括第一输入层、第二输入层、第一编码层、第二编码层、第一transformer层、第二transformer层和目标全连接层;
相似问模型训练模块,用于使用第一输入层接收标准问,使用第二输入层接收标准问对应的相似问,将第一参数信息作为第一transformer层的初始参数信息,将第二参数信息作为第二transformer层的初始参数信息,并对相似问模型进行训练,得到目标相似问模型。
进一步地,语义识别模型训练模块包括:
第一爬取单元,用于通过网络爬虫的方式,对预设域名进行爬取,得到预设域名对应的页面信息中的统一资源定位符,其中,页面信息中包含至少一个统一资源定位符;
第二爬取单元,用于爬取每个统一资源定位符,得到每个统一资源定位符对应的基本信息;
正则匹配单元,用于采用正则匹配的方式,对每个基本信息进行正则匹配,得到每个基本信息中包含的语料信息;
语料库生成单元,用于对语料信息进行数据清洗,并根据数据清洗后的语料信息生成预设语料库。
进一步地,语义识别模型训练模块还包括:
训练集输入单元,用于将预设语料库中的每个语料作为一个训练集,并将训练集输入到编码层;
向量化处理单元,用于通过编码层对训练集进行向量化处理,得到训练集对应的词向量,并通过预设方式,获取每个词向量对应的位置向量;
向量级联单元,用于对词向量和词向量对应的位置向量进行向量级联,得到向量矩阵;
语义理解单元,用于将向量矩阵输入到长短时记忆层,通过长短时记忆层,提取向量矩阵中包含的上下文语义信息;
对比单元,用于将上下文语义信息输入到全连接层,通过全连接层对上下文语义信息进行识别,得到识别语义信息,并将识别语义信息与预设标注信息进行对比,得到对比结果;
循环迭代单元,用于若对比结果不满足预设条件,则通过反向传播的方式,对长短时记忆层的参数进行调整,并返回将向量矩阵输入到长短时记忆层,通过长短时记忆层,提取向量矩阵中包含的上下文语义信息的步骤继续执行,直到对比结果满足预设条件,将此时得到的初始语义识别模型作为训练好的语义识别模型。
进一步地,相似问模型训练模块包括:
第一特征矩阵获取单元,用于通过第一输入层将标准问传递到第一编码层,通过第一编码层对标准问进行向量矩阵提取,并将提取到的向量矩阵输入到第一transformer层,采用第一transformer层对提取到的向量矩阵进行特征提取,得到第一特征矩阵;
第二特征矩阵获取单元,用于通过第二输入层将相似问传递到第二编码层,通过第二编码层对相似问进行向量矩阵提取,并将提取到的向量矩阵输入到第二transformer层,采用第二transformer层对提取到的向量矩阵进行特征提取,得到第二特征矩阵;
矩阵输入单元,用于将第一特征矩阵和第二特征矩阵输入到目标全连接层;
变换处理单元,用于在目标全连接层,对第一特征矩阵和第二特征矩阵进行变换处理,得到变换结果,并根据变换结果,确定识别结果;
迭代训练单元,用于根据输入的标准问、输入的相似问和相似问的标注信息,计算识别结果的准确率;若识别结果的准确率小于预设阈值,则通过反向传播的方式,对相似问模型进行迭代训练,直到识别结果的准确率大于或等于预设阈值,将得到的相似问模型作为目标相似问模型。
进一步地,变换处理单元包括:
矩阵拼接子单元,用于将第一特征矩阵与第二特征矩阵进行拼接,得到目标特征矩阵;
矩阵相乘子单元,用于将目标特征矩阵与目标全连接层的预设参数矩阵进行矩阵相乘处理,得到二维比较向量;
结果确定子单元,用于对二维比较向量中的第一个向量和第二个向量进行数值比较,得到比较结果,并根据比较结果,确定识别结果。
关于相似问模型训练装置的具体限定可以参见上文中对于相似问识别方法的限定,关于相似问识别装置的具体限定可以参见上文中对于相似问识别方法的限定,在此不再赘述。上述相似问模型训练装置和相似问识别装置中的各个模块可全部或部分通过软件、硬件及其组合来实现。上述各模块可以硬件形式内嵌于或独立于计算机设备中的处理器中,也可以以软件形式存储于计算机设备中的存储器中,以便于处理器调用执行以上各个模块对应的操作。
图10是本申请一实施例提供的计算机设备的示意图。该计算机设备可以是服务端,其内部结构图可以如图10所示。该计算机设备包括通过系统总线连接的处理器、存储器、网络接口和数据库。其中,该计算机设备的处理器用于提供计算和控制能力。该计算机设备的存储器包括非易失性存储介质、内存储器。该非易失性存储介质存储有操作系统、计算机可读指令和数据库。该内存储器为非易失性存储介质中的操作系统和计算机可读指令的运行提供环境。该计算机设备的数据库用于存储初始语义识别模型和预设的问题库。该计算机设备的网络接口用于与外部的终端通过网络连接通信。该计算机可读指令被处理器执行时以实现上述相似问识别方法。
在一个实施例中,提供了一种计算机设备,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机可读指令,处理器执行计算机可读指令时实现上述实施例中相似问识别方法的步骤,例如图2所示的步骤S21至步骤S28,或者,处理器执行计算机可读指令时实现上述实施例中相似问识别装置的各模块/单元的功能,例如图9所示的模块21至模块28的功能。为避免重复,这里不再赘述。
所属领域的技术人员可以清楚地了解到,为了描述的方便和简洁,仅以上述各功能单元、模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能单元、模块完成,即将所述装置的内部结构划分成不同的功能单元或模块,以完成以上描述的全部或者部分功能。
在一实施例中,提供一计算机非易失性可读存储介质,该计算机非易失性可读存储介质上存储有计算机可读指令,该计算机可读指令被处理器执行时实现上述实施例相似问识别方法的步骤,或者,该计算机可读指令被处理器执行时实现上述实施例相似问识别装置中各模块/单元的功能。为避免重复,这里不再赘述。
可以理解地,所述计算机非易失性可读存储介质可以包括:能够携带所述计算机可读指令代码的任何实体或装置、记录介质、U盘、移动硬盘、磁碟、光盘、计算机存储器、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、电载波信号和电信信号等。
以上所述实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的精神和范围,均应包含在本申请的保护范围之内。

Claims (20)

  1. 一种相似问识别方法,其特征在于,包括:
    获取待识别问句;
    从预设的问题库中,获取每个标准问,将所述待识别问句与每个所述标准问分别组成一组基础数据;
    将每组所述基础数据输入到目标相似问模型中,其中,所述目标相似问模型包括第一输入层、第二输入层、第一编码层、第二编码层、第一transformer层、第二transformer层和目标全连接层;
    通过所述第一输入层将所述基础数据中的标准问传递到所述第一编码层,通过所述第一编码层对所述标准问进行向量矩阵提取,并将提取到的向量矩阵输入到第一transformer层,采用所述第一transformer层对提取到的向量矩阵进行特征提取,得到第一特征矩阵;
    通过所述第二输入层将所述基础数据中的待识别问句传递到所述第二编码层,通过所述第二编码层对所述待识别问句进行向量矩阵提取,并将提取到的向量矩阵输入到第二transformer层,采用所述第二transformer层对提取到的向量矩阵进行特征提取,得到第二特征矩阵;
    将所述第一特征矩阵和所述第二特征矩阵输入到所述目标全连接层;
    在所述目标全连接层,对所述第一特征矩阵和所述第二特征矩阵进行变换处理,得到变换结果,并根据变换结果,确定所述基础数据对应的识别结果;
    根据每组基础数据对应的所述识别结果,确定所述待识别问句对应的标准问。
  2. 根据权利要求1所述的相似问识别方法,其特征在于,在所述将每组所述基础数据输入到目标相似问模型中之前,所述相似问识别方法还包括:
    获取预设语料库,并使用所述预设语料库,对初始语义识别模型进行训练,得到训练好的语义识别模型,其中,所述初始语义识别模型为多层长短时记忆网络,所述多层长短时记忆网络包括编码层、K个长短时记忆层和全连接层,K为大于1的正整数;
    从预设的问题库中,获取面试语料库,其中,所述面试语料库包括标准问和所述标准问对应的相似问;
    将所述标准问输入所述训练好的语义识别模型进行识别,得到所述编码层输出的第一向量矩阵和所述长短时记忆层输出的K个第一输出结果,并对所述第一向量矩阵和K个所述第一输出结果进行加权汇总,得到第一参数信息;
    将所述标准问对应的相似问输入所述训练好的语义识别模型进行识别,得到所述编码层输出的第二向量矩阵和长短时记忆层输出的K个第二输出结果,并对所述第二向量矩阵和K个所述第二输出结果进行加权汇总,得到第二参数信息;
    将所述标准问、所述标准问对应的相似问、所述第一参数信息和所述第二参数信息输入到相似问模型中,其中,所述相似问模型包括第一输入层、第二输入层、第一编码层、第二编码层、第一transformer层、第二transformer层和目标全连接层;
    使用所述第一输入层接收所述标准问,使用所述第二输入层接收所述标准问对应的相似问,将第一参数信息作为所述第一transformer层的初始参数信息,将所述第二参数信息作为所述第二transformer层的初始参数信息,并对所述相似问模型进行训练,得到所述目标相似问模型。
  3. 如权利要求2所述的相似问识别方法,其特征在于,所述获取预设语料库包括:
    通过网络爬虫的方式,对预设域名进行爬取,得到所述预设域名对应的页面信息中的统一资源定位符,其中,所述页面信息中包含至少一个所述统一资源定位符;
    爬取每个所述统一资源定位符,得到每个所述统一资源定位符对应的基本信息;
    采用正则匹配的方式,对每个所述基本信息进行正则匹配,得到每个所述基本信息中包含的语料信息;
    对所述语料信息进行数据清洗,并根据数据清洗后的语料信息生成所述预设语料库。
  4. 如权利要求2所述的相似问识别方法,其特征在于,使用所述预设语料库,对初始语义识别模型进行训练,得到训练好的语义识别模型包括:
    将所述预设语料库中的每个语料作为一个训练集,并将所述训练集输入到所述编码层;
    通过所述编码层对所述训练集进行向量化处理,得到所述训练集对应的词向量,并通过预设方式,获取每个所述词向量对应的位置向量;
    对所述词向量和所述词向量对应的位置向量进行向量级联,得到向量矩阵;
    将所述向量矩阵输入到所述长短时记忆层,通过所述长短时记忆层,提取所述向量矩阵中包含的上下文语义信息;
    将所述上下文语义信息输入到全连接层,通过所述全连接层对所述上下文语义信息进行识别,得到识别语义信息,并将识别语义信息与预设标注信息进行对比,得到对比结果;
  5. 如权利要求2所述的相似问识别方法,其特征在于,所述输入到第二输入层的相似问包含标注信息,所述对所述相似问模型进行训练,得到目标相似问模型包括:
    通过所述第一输入层将所述标准问传递到所述第一编码层,通过所述第一编码层对所述标准问进行向量矩阵提取,并将提取到的向量矩阵输入到第一transformer层,采用所述第一transformer层对提取到的向量矩阵进行特征提取,得到第一特征矩阵;
    通过所述第二输入层将所述相似问传递到所述第二编码层,通过所述第二编码层对所述相似问进行向量矩阵提取,并将提取到的向量矩阵输入到第二transformer层,采用所述第二transformer层对提取到的向量矩阵进行特征提取,得到第二特征矩阵;
    将所述第一特征矩阵和所述第二特征矩阵输入到所述目标全连接层;
    在所述目标全连接层,对所述第一特征矩阵和所述第二特征矩阵进行变换处理,得到变换结果,并根据变换结果,确定识别结果;
    根据输入的所述标准问、输入的所述相似问和所述相似问的标注信息,计算所述识别结果的准确率;若所述识别结果的准确率小于预设阈值,则通过反向传播的方式,对所述相似问模型进行迭代训练,直到所述识别结果的准确率大于或等于预设阈值,将得到的相似问模型作为所述目标相似问模型。
  6. 如权利要求5所述的相似问识别方法,其特征在于,所述在所述目标全连接层,对所述第一特征矩阵和所述第二特征矩阵进行变换处理,并根据变换处理的结果,确定识别结果包括:
    将所述第一特征矩阵与所述第二特征矩阵进行拼接,得到目标特征矩阵;
    将所述目标特征矩阵与所述目标全连接层的预设参数矩阵进行矩阵相乘处理,得到二维比较向量;
    对所述二维比较向量中的第一个向量和第二个向量进行数值比较,得到比较结果,并根据所述比较结果,确定识别结果。
  7. 一种相似问识别装置,其特征在于,包括:
    问句获取模块,用于获取待识别问句;
    问句分组模块,用于从预设的问题库中,获取每个标准问,将所述待识别问句与每个所述标准问分别组成一组基础数据;
    问句输入模块,用于将每组所述基础数据输入到目标相似问模型中,其中,所述目标相似问模型包括第一输入层、第二输入层、第一编码层、第二编码层、第一transformer层、第二transformer层和目标全连接层;
    第一特征提取模块,用于通过所述第一输入层将所述基础数据中的标准问传递到所述第一编码层,通过所述第一编码层对所述标准问进行向量矩阵提取,并将提取到的向量矩阵输入到第一transformer层,采用所述第一transformer层对提取到的向量矩阵进行特征提取,得到第一特征矩阵;
    第二特征提取模块,用于通过所述第二输入层将所述基础数据中的待识别问句传递到所述第二编码层,通过所述第二编码层对所述待识别问句进行向量矩阵提取,并将提取到的向量矩阵输入到第二transformer层,采用所述第二transformer层对提取到的向量矩阵进行特征提取,得到第二特征矩阵;
    特征输入模块,用于将所述第一特征矩阵和所述第二特征矩阵输入到所述目标全连接层;
    特征识别模块,用于在所述目标全连接层,对所述第一特征矩阵和所述第二特征矩阵进行变换处理,得到变换结果,并根据变换结果,确定所述基础数据对应的识别结果;
    结果确定模块,用于根据每组基础数据对应的所述识别结果,确定所述待识别问句对应的标准问。
  8. 如权利要求7所述的相似问识别装置,其特征在于,所述相似问识别装置还包括:
    语义识别模型训练模块,用于获取预设语料库,并使用所述预设语料库,对初始语义识别模型进行训练,得到训练好的语义识别模型,其中,所述初始语义识别模型为多层长短时记忆网络,所述多层长短时记忆网络包括编码层、K个长短时记忆层和全连接层,K为大于1的正整数;
    面试语料库获取模块,用于从预设的问题库中,获取面试语料库,其中,所述面试语料库包括标准问和所述标准问对应的相似问;
    第一参数信息获取模块,用于将所述标准问输入所述训练好的语义识别模型进行识别,得到所述编码层输出的第一向量矩阵和所述长短时记忆层输出的K个第一输出结果,并对所述第一向量矩阵和K个所述第一输出结果进行加权汇总,得到第一参数信息;
    第二参数信息获取模块,用于将所述标准问对应的相似问输入所述训练好的语义识别模型进行识别,得到所述编码层输出的第二向量矩阵和长短时记忆层输出的K个第二输出结果,并对所述第二向量矩阵和K个所述第二输出结果进行加权汇总,得到第二参数信息;
    信息输入模块,用于将所述标准问、所述标准问对应的相似问、所述第一参数信息和所述第二参数信息输入到相似问模型中,其中,所述相似问模型包括第一输入层、第二输入层、第一编码层、第二编码层、第一transformer层、第二transformer层和目标全连接层;
    相似问模型训练模块,用于使用所述第一输入层接收所述标准问,使用所述第二输入层接收所述标准问对应的相似问,将第一参数信息作为所述第一transformer层的初始参数信息,将所述第二参数信息作为所述第二transformer层的初始参数信息,并对所述相似问模型进行训练,得到目标相似问模型。
  9. 如权利要求8所述的相似问识别装置,其特征在于,所述语义识别模型训练模块包括:
    第一爬取单元,用于通过网络爬虫的方式,对预设域名进行爬取,得到所述预设域名对应的页面信息中的统一资源定位符,其中,所述页面信息中包含至少一个所述统一资源定位符;
    第二爬取单元,用于爬取每个所述统一资源定位符,得到每个所述统一资源定位符对应的基本信息;
    正则匹配单元,用于采用正则匹配的方式,对每个所述基本信息进行正则匹配,得到每个所述基本信息中包含的语料信息;
    语料库生成单元,用于对所述语料信息进行数据清洗,并根据数据清洗后的语料信息生成所述预设语料库。
  10. 如权利要求8所述的相似问识别装置,其特征在于,所述语义识别模型训练模块包括:
    训练集输入单元,用于将所述预设语料库中的每个语料作为一个训练集,并将所述训练集输入到所述编码层;
    向量化处理单元,用于通过所述编码层对所述训练集进行向量化处理,得到所述训练集对应的词向量,并通过预设方式,获取每个所述词向量对应的位置向量;
    向量级联单元,用于对所述词向量和所述词向量对应的位置向量进行向量级联,得到向量矩阵;
    语义理解单元,用于将所述向量矩阵输入到所述长短时记忆层,通过所述长短时记忆层,提取所述向量矩阵中包含的上下文语义信息;
    对比单元,用于将所述上下文语义信息输入到全连接层,通过所述全连接层对所述上下文语义信息进行识别,得到识别语义信息,并将识别语义信息与预设标注信息进行对比,得到对比结果;
    循环迭代单元,用于若所述对比结果不满足预设条件,则通过反向传播的方式,对所述长短时记忆层的参数进行调整,并返回所述将所述向量矩阵输入到所述长短时记忆层,通过所述长短时记忆层,提取所述向量矩阵中包含的上下文语义信息的步骤继续执行,直到所述对比结果满足预设条件,将此时得到的初始语义识别模型作为所述训练好的语义识别模型。
  11. 一种计算机设备,包括存储器、处理器以及存储在所述存储器中并可在所述处理器上运行的计算机可读指令,其特征在于,所述处理器执行所述计算机可读指令时实现如下相似问识别方法的步骤:
    获取待识别问句;
    从预设的问题库中,获取每个标准问,将所述待识别问句与每个所述标准问分别组成一组基础数据;
    将每组所述基础数据输入到目标相似问模型中,其中,所述目标相似问模型包括第一输入层、第二输入层、第一编码层、第二编码层、第一transformer层、第二transformer层和目标全连接层;
    通过所述第一输入层将所述基础数据中的标准问传递到所述第一编码层,通过所述第一编码层对所述标准问进行向量矩阵提取,并将提取到的向量矩阵输入到第一transformer层,采用所述第一transformer层对提取到的向量矩阵进行特征提取,得到第一特征矩阵;
    通过所述第二输入层将所述基础数据中的待识别问句传递到所述第二编码层,通过所述第二编码层对所述待识别问句进行向量矩阵提取,并将提取到的向量矩阵输入到第二transformer层,采用所述第二transformer层对提取到的向量矩阵进行特征提取,得到第二特征矩阵;
    将所述第一特征矩阵和所述第二特征矩阵输入到所述目标全连接层;
    在所述目标全连接层,对所述第一特征矩阵和所述第二特征矩阵进行变换处理,得到变换结果,并根据变换结果,确定所述基础数据对应的识别结果;
    根据每组基础数据对应的所述识别结果,确定所述待识别问句对应的标准问。
  12. 根据权利要求11所述的计算机设备,其特征在于,在所述将每组所述基础数据输入到目标相似问模型中之前,所述处理器执行所述计算机可读指令时还实现如下相似问识别方法的步骤:
    获取预设语料库,并使用所述预设语料库,对初始语义识别模型进行训练,得到训练好的语义识别模型,其中,所述初始语义识别模型为多层长短时记忆网络,所述多层长短时记忆网络包括编码层、K个长短时记忆层和全连接层,K为大于1的正整数;
    从预设的问题库中,获取面试语料库,其中,所述面试语料库包括标准问和所述标准问对应的相似问;
    将所述标准问对应的相似问输入所述训练好的语义识别模型进行识别,得到所述编码层输出的第二向量矩阵和长短时记忆层输出的K个第二输出结果,并对所述第二向量矩阵和K个所述第二输出结果进行加权汇总,得到第二参数信息;
    将所述标准问、所述标准问对应的相似问、所述第一参数信息和所述第二参数信息输入到相似问模型中,其中,所述相似问模型包括第一输入层、第二输入层、第一编码层、第二编码层、第一transformer层、第二transformer层和目标全连接层;
    使用所述第一输入层接收所述标准问,使用所述第二输入层接收所述标准问对应的相似问,将第一参数信息作为所述第一transformer层的初始参数信息,将所述第二参数信息作为所述第二transformer层的初始参数信息,并对所述相似问模型进行训练,得到所述目标相似问模型。
  13. 如权利要求12所述的计算机设备,其特征在于,所述获取预设语料库包括:
    通过网络爬虫的方式,对预设域名进行爬取,得到所述预设域名对应的页面信息中的统一资源定位符,其中,所述页面信息中包含至少一个所述统一资源定位符;
    爬取每个所述统一资源定位符,得到每个所述统一资源定位符对应的基本信息;
    采用正则匹配的方式,对每个所述基本信息进行正则匹配,得到每个所述基本信息中包含的语料信息;
    对所述语料信息进行数据清洗,并根据数据清洗后的语料信息生成所述预设语料库。
  14. 如权利要求12所述的计算机设备,其特征在于,使用所述预设语料库,对初始语义识别模型进行训练,得到训练好的语义识别模型包括:
    将所述预设语料库中的每个语料作为一个训练集,并将所述训练集输入到所述编码层;
    通过所述编码层对所述训练集进行向量化处理,得到所述训练集对应的词向量,并通过预设方式,获取每个所述词向量对应的位置向量;
    对所述词向量和所述词向量对应的位置向量进行向量级联,得到向量矩阵;
    将所述向量矩阵输入到所述长短时记忆层,通过所述长短时记忆层,提取所述向量矩阵中包含的上下文语义信息;
    将所述上下文语义信息输入到全连接层,通过所述全连接层对所述上下文语义信息进行识别,得到识别语义信息,并将识别语义信息与预设标注信息进行对比,得到对比结果;
    若所述对比结果不满足预设条件,则通过反向传播的方式,对所述长短时记忆层的参数进行调整,并返回所述将所述向量矩阵输入到所述长短时记忆层,通过所述长短时记忆层,提取所述向量矩阵中包含的上下文语义信息的步骤继续执行,直到所述对比结果满足预设条件,将此时得到的初始语义识别模型作为所述训练好的语义识别模型。
  15. 如权利要求12所述的计算机设备,其特征在于,所述输入到第二输入层的相似问包含标注信息,所述对所述相似问模型进行训练,得到目标相似问模型包括:
    通过所述第一输入层将所述标准问传递到所述第一编码层,通过所述第一编码层对所述标准问进行向量矩阵提取,并将提取到的向量矩阵输入到第一transformer层,采用所述第一transformer层对提取到的向量矩阵进行特征提取,得到第一特征矩阵;
    通过所述第二输入层将所述相似问传递到所述第二编码层,通过所述第二编码层对所述相似问进行向量矩阵提取,并将提取到的向量矩阵输入到第二transformer层,采用所述第二transformer层对提取到的向量矩阵进行特征提取,得到第二特征矩阵;
    将所述第一特征矩阵和所述第二特征矩阵输入到所述目标全连接层;
    在所述目标全连接层,对所述第一特征矩阵和所述第二特征矩阵进行变换处理,得到变换结果,并根据变换结果,确定识别结果;
    根据输入的所述标准问、输入的所述相似问和所述相似问的标注信息,计算所述识别结果的准确率;若所述识别结果的准确率小于预设阈值,则通过反向传播的方式,对所述相似问模型进行迭代训练,直到所述识别结果的准确率大于或等于预设阈值,将得到的相似问模型作为所述目标相似问模型。
  16. 一种计算机非易失性可读存储介质,所述计算机非易失性可读存储介质存储有计算机可读指令,其特征在于,所述计算机可读指令被处理器执行时实现如下相似问识别方法的步骤:
    获取待识别问句;
    从预设的问题库中,获取每个标准问,将所述待识别问句与每个所述标准问分别组成一组基础数据;
    将每组所述基础数据输入到目标相似问模型中,其中,所述目标相似问模型包括第一输入层、第二输入层、第一编码层、第二编码层、第一transformer层、第二transformer层和目标全连接层;
    通过所述第一输入层将所述基础数据中的标准问传递到所述第一编码层,通过所述第一编码层对所述标准问进行向量矩阵提取,并将提取到的向量矩阵输入到第一transformer层,采用所述第一transformer层对提取到的向量矩阵进行特征提取,得到第一特征矩阵;
    通过所述第二输入层将所述基础数据中的待识别问句传递到所述第二编码层,通过所述第二编码层对所述待识别问句进行向量矩阵提取,并将提取到的向量矩阵输入到第二transformer层,采用所述第二transformer层对提取到的向量矩阵进行特征提取,得到第二特征矩阵;
    将所述第一特征矩阵和所述第二特征矩阵输入到所述目标全连接层;
    在所述目标全连接层,对所述第一特征矩阵和所述第二特征矩阵进行变换处理,得到变换结果,并根据变换结果,确定所述基础数据对应的识别结果;
    根据每组基础数据对应的所述识别结果,确定所述待识别问句对应的标准问。
  17. 根据权利要求16所述的计算机非易失性可读存储介质,其特征在于,在所述将每组所述基础数据输入到目标相似问模型中之前,所述计算机可读指令被处理器执行时实现如下相似问识别方法的步骤:
    获取预设语料库,并使用所述预设语料库,对初始语义识别模型进行训练,得到训练好的语义识别模型,其中,所述初始语义识别模型为多层长短时记忆网络,所述多层长短时记忆网络包括编码层、K个长短时记忆层和全连接层,K为大于1的正整数;
    从预设的问题库中,获取面试语料库,其中,所述面试语料库包括标准问和所述标准问对应的相似问;
    将所述标准问输入所述训练好的语义识别模型进行识别,得到所述编码层输出的第一向量矩阵和所述长短时记忆层输出的K个第一输出结果,并对所述第一向量矩阵和K个所述第一输出结果进行加权汇总,得到第一参数信息;
    将所述标准问对应的相似问输入所述训练好的语义识别模型进行识别,得到所述编码层输出的第二向量矩阵和长短时记忆层输出的K个第二输出结果,并对所述第二向量矩阵和K个所述第二输出结果进行加权汇总,得到第二参数信息;
    将所述标准问、所述标准问对应的相似问、所述第一参数信息和所述第二参数信息输入到相似问模型中,其中,所述相似问模型包括第一输入层、第二输入层、第一编码层、第二编码层、第一transformer层、第二transformer层和目标全连接层;
    使用所述第一输入层接收所述标准问,使用所述第二输入层接收所述标准问对应的相似问,将第一参数信息作为所述第一transformer层的初始参数信息,将所述第二参数信息作为所述第二transformer层的初始参数信息,并对所述相似问模型进行训练,得到所述目标相似问模型。
  18. 如权利要求17所述的计算机非易失性可读存储介质,其特征在于,所述获取预设语料库包括:
    通过网络爬虫的方式,对预设域名进行爬取,得到所述预设域名对应的页面信息中的统一资源定位符,其中,所述页面信息中包含至少一个所述统一资源定位符;
    爬取每个所述统一资源定位符,得到每个所述统一资源定位符对应的基本信息;
    采用正则匹配的方式,对每个所述基本信息进行正则匹配,得到每个所述基本信息中包含的语料信息;
    对所述语料信息进行数据清洗,并根据数据清洗后的语料信息生成所述预设语料库。
  19. 如权利要求17所述的计算机非易失性可读存储介质,其特征在于,使用所述预设语料库,对初始语义识别模型进行训练,得到训练好的语义识别模型包括:
    将所述预设语料库中的每个语料作为一个训练集,并将所述训练集输入到所述编码层;
    通过所述编码层对所述训练集进行向量化处理,得到所述训练集对应的词向量,并通过预设方式,获取每个所述词向量对应的位置向量;
    对所述词向量和所述词向量对应的位置向量进行向量级联,得到向量矩阵;
    将所述向量矩阵输入到所述长短时记忆层,通过所述长短时记忆层,提取所述向量矩阵中包含的上下文语义信息;
    将所述上下文语义信息输入到全连接层,通过所述全连接层对所述上下文语义信息进行识别,得到识别语义信息,并将识别语义信息与预设标注信息进行对比,得到对比结果;
    若所述对比结果不满足预设条件,则通过反向传播的方式,对所述长短时记忆层的参数进行调整,并返回所述将所述向量矩阵输入到所述长短时记忆层,通过所述长短时记忆层,提取所述向量矩阵中包含的上下文语义信息的步骤继续执行,直到所述对比结果满足预设条件,将此时得到的初始语义识别模型作为所述训练好的语义识别模型。
    通过所述第一输入层将所述标准问传递到所述第一编码层,通过所述第一编码层对所述标准问进行向量矩阵提取,并将提取到的向量矩阵输入到第一transformer层,采用所述第一transformer层对提取到的向量矩阵进行特征提取,得到第一特征矩阵;
    通过所述第二输入层将所述相似问传递到所述第二编码层,通过所述第二编码层对所述相似问进行向量矩阵提取,并将提取到的向量矩阵输入到第二transformer层,采用所述第二transformer层对提取到的向量矩阵进行特征提取,得到第二特征矩阵;
    将所述第一特征矩阵和所述第二特征矩阵输入到所述目标全连接层;
    在所述目标全连接层,对所述第一特征矩阵和所述第二特征矩阵进行变换处理,得到变换结果,并根据变换结果,确定识别结果;
    根据输入的所述标准问、输入的所述相似问和所述相似问的标注信息,计算所述识别结果的准确率;若所述识别结果的准确率小于预设阈值,则通过反向传播的方式,对所述相似问模型进行迭代训练,直到所述识别结果的准确率大于或等于预设阈值,将得到的相似问模型作为所述目标相似问模型。
PCT/CN2019/116922 2019-09-24 2019-11-10 相似问识别方法、装置、计算机设备及存储介质 WO2021056709A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910905566.1 2019-09-24
CN201910905566.1A CN110837738B (zh) 2019-09-24 2019-09-24 相似问识别方法、装置、计算机设备及存储介质

Publications (1)

Publication Number Publication Date
WO2021056709A1 true WO2021056709A1 (zh) 2021-04-01

Family

ID=69574576

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/116922 WO2021056709A1 (zh) 2019-09-24 2019-11-10 相似问识别方法、装置、计算机设备及存储介质

Country Status (2)

Country Link
CN (1) CN110837738B (zh)
WO (1) WO2021056709A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111666755A (zh) * 2020-06-24 2020-09-15 深圳前海微众银行股份有限公司 一种复述句识别的方法及装置
CN113378902B (zh) * 2021-05-31 2024-02-23 深圳神目信息技术有限公司 一种基于优化视频特征的视频抄袭检测方法
CN113850078B (zh) * 2021-09-29 2024-06-18 平安科技(深圳)有限公司 基于机器学习的多意图识别方法、设备及可读存储介质
CN114818693A (zh) * 2022-03-28 2022-07-29 平安科技(深圳)有限公司 一种语料匹配的方法、装置、计算机设备及存储介质

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5825676B2 (ja) * 2012-02-23 2015-12-02 国立研究開発法人情報通信研究機構 ノン・ファクトイド型質問応答システム及びコンピュータプログラム
CN109933779A (zh) * 2017-12-18 2019-06-25 苏宁云商集团股份有限公司 用户意图识别方法及系统

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100191521A1 (en) * 2005-06-14 2010-07-29 Colloquis, Inc. Methods and apparatus for evaluating semantic proximity
CN109902296A (zh) * 2019-01-18 2019-06-18 华为技术有限公司 自然语言处理方法、训练方法及数据处理设备

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MUTHURAMAN CHIDAMBARAM, YINFEI YANG, DANIEL CER, STEVE YUAN, YUN-HSUAN SUNG, BRIAN STROPE, RAY KURZWEIL: "Learning Cross-Lingual Sentence Representations via a Multi-task Dual-Encoder Model", ARXIV.ORG, 30 October 2019 (2019-10-30), pages 1 - 10, XP081453215 *
SAMUEL HUMEAU , KURT SHUSTER , MARIE-ANNE LACHAUX , JASON WESTON: "Real-time Inference in Multi-sentence Tasks with Deep Pretrained Transformers", ARXIV, 22 April 2019 (2019-04-22), pages 1 - 10, XP081269391 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113704411A (zh) * 2021-08-31 2021-11-26 平安银行股份有限公司 基于词向量的相似客群挖掘方法、装置、设备及存储介质
CN113704411B (zh) * 2021-08-31 2023-09-15 平安银行股份有限公司 基于词向量的相似客群挖掘方法、装置、设备及存储介质
CN114416927A (zh) * 2022-01-24 2022-04-29 招商银行股份有限公司 智能问答方法、装置、设备及存储介质
CN114416927B (zh) * 2022-01-24 2024-04-02 招商银行股份有限公司 智能问答方法、装置、设备及存储介质
CN114595697B (zh) * 2022-03-14 2024-04-05 京东科技信息技术有限公司 用于生成预标注样本的方法、装置、服务器和介质
CN117011690A (zh) * 2023-10-07 2023-11-07 广东电网有限责任公司阳江供电局 一种海缆隐患识别方法、装置、设备和介质
CN117011690B (zh) * 2023-10-07 2024-02-09 广东电网有限责任公司阳江供电局 一种海缆隐患识别方法、装置、设备和介质

Also Published As

Publication number Publication date
CN110837738B (zh) 2023-06-30
CN110837738A (zh) 2020-02-25

Similar Documents

Publication Publication Date Title
WO2021056709A1 (zh) 相似问识别方法、装置、计算机设备及存储介质
CN109033068B (zh) 基于注意力机制的用于阅读理解的方法、装置和电子设备
CN108694225B (zh) 一种图像搜索方法、特征向量的生成方法、装置及电子设备
WO2017092380A1 (zh) 用于人机对话的方法、神经网络系统和用户设备
CN109992773B (zh) 基于多任务学习的词向量训练方法、系统、设备及介质
WO2019080864A1 (zh) 一种文本语义编码方法及装置
CN110457718B (zh) 一种文本生成方法、装置、计算机设备及存储介质
WO2022227162A1 (zh) 问答数据处理方法、装置、计算机设备及存储介质
AU2020104254A4 (en) Healthcare question answering (qa) method and system based on contextualized language model and knowledge embedding
CN111191002A (zh) 一种基于分层嵌入的神经代码搜索方法及装置
CN113704460B (zh) 一种文本分类方法、装置、电子设备和存储介质
US11423093B2 (en) Inter-document attention mechanism
CN112580328A (zh) 事件信息的抽取方法及装置、存储介质、电子设备
CN109522561B (zh) 一种问句复述识别方法、装置、设备及可读存储介质
CN112084789A (zh) 文本处理方法、装置、设备及存储介质
JP2023022845A (ja) ビデオ処理方法、ビデオサーチ方法及びモデルトレーニング方法、装置、電子機器、記憶媒体及びコンピュータプログラム
Lu et al. Sentence semantic matching based on 3D CNN for human–robot language interaction
CN113204611A (zh) 建立阅读理解模型的方法、阅读理解方法及对应装置
CN113158687B (zh) 语义的消歧方法及装置、存储介质、电子装置
CN114020906A (zh) 基于孪生神经网络的中文医疗文本信息匹配方法及系统
CN112528136A (zh) 一种观点标签的生成方法、装置、电子设备和存储介质
JP2024512628A (ja) キャプション生成器を生成するための方法および装置、並びにキャプションを出力するための方法および装置
CN114492451B (zh) 文本匹配方法、装置、电子设备及计算机可读存储介质
Wang et al. Weighted graph convolution over dependency trees for nontaxonomic relation extraction on public opinion information
CN112307738B (zh) 用于处理文本的方法和装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19946720

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19946720

Country of ref document: EP

Kind code of ref document: A1