CN114925185B - Interaction method, model training method, device, equipment and medium

Interaction method, model training method, device, equipment and medium

Info

Publication number
CN114925185B
Authority
CN
China
Prior art keywords
training
sample
model
stage
sentence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210531471.XA
Other languages
Chinese (zh)
Other versions
CN114925185A (en)
Inventor
吴高升
田鑫
程军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210531471.XA
Publication of CN114925185A
Application granted
Publication of CN114925185B


Classifications

    • G06F16/3329 Natural language query formulation or dialogue systems
    • G06F16/3344 Query execution using natural language analysis
    • G06F18/2155 Generating training patterns; bootstrap methods characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • G06F40/247 Thesauruses; Synonyms
    • G06F40/35 Discourse or dialogue representation
    • G06N3/045 Combinations of networks
    • G06N3/088 Non-supervised learning, e.g. competitive learning


Abstract

The disclosure provides an interaction method, a training method and apparatus of a deep learning model, an electronic device, a storage medium, and a program product, and relates to the technical field of artificial intelligence, in particular to the technical fields of deep learning, intelligent search, NLP, and the like. The specific implementation scheme is as follows: training a deep learning model with unlabeled sample sentences in an unsupervised contrastive learning manner to obtain a first-stage model; and training the first-stage model with sample sentence pairs to obtain a first-stage target model, where a sample sentence pair includes two sample sentences with the same semantics.

Description

Interaction method, model training method, device, equipment and medium
Technical Field
The present disclosure relates to the technical field of artificial intelligence, in particular to the technical fields of deep learning, intelligent search, NLP (Natural Language Processing), and the like, and more particularly to an interaction method, a training method of a deep learning model, an apparatus, an electronic device, a storage medium, and a program product.
Background
Human-computer interaction is a technology in which humans interact with machines using natural language. With the continuous development of artificial intelligence technology, machines can now understand the natural language produced by humans, grasp the meaning contained in it, and give corresponding feedback. In these operations, accurate semantic understanding, fast feedback, and the ability to give relevant opinions or suggestions are factors that affect the smoothness of human-computer interaction.
Disclosure of Invention
The disclosure provides an interaction method, a training method and device of a deep learning model, electronic equipment, a storage medium and a program product.
According to an aspect of the present disclosure, there is provided a training method of a deep learning model, including: training a deep learning model with unlabeled sample sentences in an unsupervised contrastive learning manner to obtain a first-stage model; and training the first-stage model with sample sentence pairs to obtain a first-stage target model, where a sample sentence pair includes two sample sentences with the same semantics.
According to another aspect of the present disclosure, there is provided an interaction method including: receiving a question from a user; inputting the question into a feature extraction model to obtain a semantic vector; determining a target index vector matching the semantic vector from a plurality of semantic index vectors; and determining an answer matching the question based on the target index vector, where the feature extraction model is trained with the above training method of the deep learning model.
According to another aspect of the present disclosure, there is provided a training apparatus for a deep learning model, including: a first training module for training the deep learning model with unlabeled sample sentences in an unsupervised contrastive learning manner to obtain a first-stage model; and a second training module for training the first-stage model with sample sentence pairs to obtain a first-stage target model, where a sample sentence pair includes two sample sentences with the same semantics.
According to another aspect of the present disclosure, there is provided an interaction apparatus including: a receiving module for receiving a question from a user; an extraction module for inputting the question into a feature extraction model to obtain a semantic vector; a matching module for determining a target index vector matching the semantic vector from a plurality of semantic index vectors; and an answer determining module for determining an answer matching the question based on the target index vector, where the feature extraction model is trained with the training apparatus of the deep learning model of the present disclosure.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform a method according to the present disclosure.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements a method as disclosed herein.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1 schematically illustrates an exemplary system architecture to which the interaction method and apparatus may be applied, according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow diagram of a method of training a deep learning model according to an embodiment of the disclosure;
FIG. 3 schematically shows a flow diagram of a training method of a deep learning model according to an embodiment of the present disclosure;
FIG. 4 schematically shows a flowchart of a training method of a deep learning model according to another embodiment of the present disclosure;
FIG. 5 schematically shows a flow chart of an interaction method according to an embodiment of the present disclosure;
FIG. 6 schematically shows a flow diagram of an interaction method according to an embodiment of the present disclosure;
FIG. 7 schematically illustrates a block diagram of a training apparatus for deep learning models, in accordance with an embodiment of the present disclosure;
FIG. 8 schematically shows a block diagram of an interaction device according to an embodiment of the present disclosure; and
FIG. 9 schematically illustrates a block diagram of an electronic device adapted to implement a training method of a deep learning model according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The disclosure provides an interaction method, a training method and device of a deep learning model, electronic equipment, a storage medium and a program product.
According to an embodiment of the present disclosure, there is provided a training method of a deep learning model, including: training a deep learning model with unlabeled sample sentences in an unsupervised contrastive learning manner to obtain a first-stage model; and training the first-stage model with sample sentence pairs to obtain a first-stage target model, where a sample sentence pair includes two sample sentences with the same semantics.
According to an embodiment of the present disclosure, there is provided an interaction method including: receiving a question from a user; inputting the question into a feature extraction model to obtain a semantic vector; determining a target index vector matching the semantic vector from a plurality of semantic index vectors; and determining an answer matching the question based on the target index vector, where the feature extraction model is trained with the above training method of the deep learning model.
According to the embodiment of the present disclosure, the interaction method provided by the embodiment of the present disclosure can be applied to a question-answering system. Question-answering systems are widely used in scenarios such as search engines, intelligent customer service, and intelligent assistants. With the interaction method provided by the embodiment of the present disclosure, questions raised by a user in natural language can be answered accurately and concisely in natural language. For example, governments often issue policies that are rich in content and wide in scope, and government workers frequently need to answer questions from the public so that users can quickly and accurately understand the policy contents. With the interaction method provided by the embodiment of the present disclosure, a question-answering system for explaining relevant policy information can be constructed, which can improve the working efficiency of government workers and can also serve the public outside government working hours. In addition, the interaction method provided by the embodiment of the present disclosure is not limited to question-answer interaction in the government field, and can also be applied to question-answer interaction in fields such as telecommunications, insurance, and school affairs.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, application, and other processing of the personal information of the users involved all comply with the provisions of relevant laws and regulations, necessary confidentiality measures are taken, and public order and good customs are not violated.
In the technical scheme of the disclosure, before the personal information of the user is obtained or collected, the authorization or the consent of the user is obtained.
Fig. 1 schematically shows an exemplary system architecture to which the interaction method and apparatus may be applied, according to an embodiment of the present disclosure.
It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios. For example, in another embodiment, an exemplary system architecture to which the interaction method and apparatus may be applied may include a terminal device, but the terminal device may implement the interaction method and apparatus provided in the embodiments of the present disclosure without interacting with a server.
As shown in fig. 1, a system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired and/or wireless communication links, and so forth.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as a knowledge reading application, a web browser application, a search application, an instant messaging tool, a mailbox client, and/or social platform software, etc. (by way of example only).
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (for example only) providing support for questions entered by the user using the terminal devices 101, 102, 103. The background management server may analyze and otherwise process the received user question, and feed back a processing result (e.g., a web page, information, or data obtained or generated according to a user request) to the terminal device.
It should be noted that the interaction method provided by the embodiment of the present disclosure may be generally executed by the terminal device 101, 102, or 103. Correspondingly, the interaction device provided by the embodiment of the present disclosure may also be disposed in the terminal equipment 101, 102, or 103.
Alternatively, the interaction method provided by the embodiments of the present disclosure may also be generally performed by the server 105. Accordingly, the interaction means provided by the embodiments of the present disclosure may be generally disposed in the server 105. The interaction method provided by the embodiments of the present disclosure may also be performed by a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Correspondingly, the interaction device provided by the embodiment of the present disclosure may also be disposed in a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
For example, when a user inputs a question online, the terminal device 101, 102, or 103 may acquire the question and send it to the server 105; the server 105 performs feature extraction on the question with the feature extraction model to obtain a semantic vector, determines a target index vector matching the semantic vector from a plurality of semantic index vectors, and determines an answer matching the question based on the target index vector. Alternatively, these operations may be performed by a server or server cluster capable of communicating with the terminal devices 101, 102, 103 and/or the server 105, which finally determines the answer matching the question.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
It should be noted that the sequence numbers of the respective operations in the following methods are merely used as representations of the operations for description, and should not be construed as representing the execution order of the respective operations. The method need not be performed in the exact order shown, unless explicitly stated.
Fig. 2 schematically shows a flow chart of a training method of a deep learning model according to an embodiment of the present disclosure.
As shown in fig. 2, the method includes operations S210 to S220.
In operation S210, a deep learning model is trained with unlabeled sample sentences in an unsupervised contrastive learning manner to obtain a first-stage model.
In operation S220, the first-stage model is trained with sample sentence pairs to obtain a first-stage target model. A sample sentence pair includes two sample sentences with the same semantics.
According to an embodiment of the present disclosure, the network structure of the deep learning model is not limited; for example, the deep learning model may include a Transformer (encoder-decoder) network, such as a bidirectional language network model, a unidirectional language network model, or an end-to-end language network model. The deep learning model may also be a model pre-trained on open-source data; for example, one of the encoding networks in the pre-trained RocketQA model may be used as the deep learning model to be trained according to the embodiment of the present disclosure. By using a pre-trained model as the deep learning model of the embodiment of the present disclosure, the gap between the training method provided by the embodiment of the present disclosure and a purely supervised training method can be reduced.
According to the embodiment of the present disclosure, the deep learning model is trained step by step with the unlabeled sample sentences and the sample sentence pairs, so that the deep learning model learns the sentence-level semantic features in the sample sentences. The first-stage target model can thus be used to extract sentence-level semantic features and obtain semantic feature vectors, and can be applied to retrieval-based question-answering interaction scenarios.
According to the embodiment of the present disclosure, a deep learning model can be trained in an unsupervised contrastive learning manner with unlabeled sample sentences to obtain a first-stage model, and the first-stage model can be trained in a supervised manner with sample sentence pairs to obtain a first-stage target model. By combining the unsupervised and supervised training manners, the unsupervised training removes the need for manual data annotation and reduces annotation cost, while the supervised training improves the training precision of the model.
According to the embodiment of the present disclosure, for operation S210, training the deep learning model with unlabeled sample sentences in an unsupervised contrastive learning manner to obtain a first-stage model may include the following operations.
For example, random node dropout is applied to the hidden layer of the deep learning model to obtain a deep learning model with dropout enabled. The unlabeled sample sentence is input twice into the deep learning model with dropout enabled to obtain a positive sample feature vector pair. The deep learning model is then trained with the positive sample feature vector pairs to obtain the first-stage model.
According to the embodiment of the present disclosure, random node dropout can be applied to the hidden layer of the deep learning model according to a predetermined node retention probability, so as to obtain the deep learning model with dropout enabled.
According to an embodiment of the present disclosure, random node dropout may refer to a method for optimizing an artificial neural network with a deep structure that randomly zeroes part of the hidden layer's weights or outputs during training, reducing the interdependence between nodes and thereby regularizing the neural network. An unsupervised SimCSE (Simple Contrastive Learning of Sentence Embeddings) method can be used: the unlabeled sample sentence is input twice into the deep learning model with dropout enabled to obtain a positive sample feature vector pair. The positive sample feature vector pair includes two feature vectors characterizing two sentences that have the same semantics but not exactly the same content. Training the deep learning model with the positive sample feature vector pairs to obtain the first-stage model may include: inputting the positive sample feature vector pair into the SimCSE unsupervised loss function to obtain a loss value, adjusting the parameters of the deep learning model based on the loss value until the loss value converges, and taking the model at convergence as the first-stage model.
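As an illustration of the unsupervised contrastive step described above, the following is a minimal PyTorch-style sketch. It assumes an encoder object that maps a batch of tokenized sentences to sentence vectors; the function name, temperature value, and encoder interface are illustrative assumptions rather than details prescribed by the embodiment.

```python
import torch
import torch.nn.functional as F

def unsup_simcse_loss(encoder, batch_inputs, temperature=0.05):
    # Keep dropout active so the two forward passes use different dropout masks.
    encoder.train()
    z1 = encoder(batch_inputs)   # first pass  -> [batch_size, dim]
    z2 = encoder(batch_inputs)   # second pass -> [batch_size, dim]
    # Pairwise cosine similarities between first-pass and second-pass vectors.
    sim = F.cosine_similarity(z1.unsqueeze(1), z2.unsqueeze(0), dim=-1) / temperature
    # Diagonal entries are the positive pairs (same sentence, two dropout masks);
    # the remaining entries in each row act as in-batch negatives.
    labels = torch.arange(sim.size(0), device=sim.device)
    return F.cross_entropy(sim, labels)
```

In this sketch, the loss value would be back-propagated and the parameters adjusted until the loss converges; the model at convergence is taken as the first-stage model.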
By training the model in the unsupervised contrastive learning manner provided by the embodiment of the present disclosure, unsupervised training can be performed directly on the deep learning model with unlabeled sample sentences, so that manual annotation is not needed.
According to the embodiment of the present disclosure, training the deep learning model with unlabeled sample sentences in an unsupervised contrastive learning manner to obtain the first-stage model may further include an adjustment operation.
For example, the sentence length of an initial unlabeled sample sentence is determined. When the sentence length is determined to satisfy a predetermined processing condition, the initial unlabeled sample sentence is adjusted to obtain the unlabeled sample sentence.
According to an embodiment of the present disclosure, the predetermined processing conditions may include a first predetermined processing condition and a second predetermined processing condition.
For example, a sentence length satisfying the first predetermined processing condition means that the sentence length is smaller than a predetermined length threshold; when the sentence length is determined to satisfy the first predetermined processing condition, the initial sample sentence is lengthened to obtain the sample sentence. The lengthening adjustment may include at least one of: random insertion, random word repetition, random entity repetition, synonymous entity replacement, and synonym replacement.
For example, a sentence length satisfying the second predetermined processing condition means that the sentence length is greater than the predetermined length threshold; when the sentence length is determined to satisfy the second predetermined processing condition, the initial sample sentence is shortened to obtain the sample sentence. The shortening adjustment may include at least one of: random deletion, synonymous entity replacement, and synonym replacement.
With the adjustment operation provided by the embodiment of the present disclosure, the lengths of the unlabeled sample sentences can be kept consistent, and, combined with the unsupervised contrastive learning training manner, the model training capability and efficiency are improved.
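A minimal sketch of this adjustment operation is given below, assuming tokenized sentences, a single predetermined length threshold, and a simple synonym table; the helper name and the 0.1 replacement probability are hypothetical choices for illustration.

```python
import random

def adjust_unlabeled_sentence(tokens, length_threshold, synonym_table):
    # Lengthen or shorten an initial unlabeled sample sentence toward the threshold.
    tokens = list(tokens)
    while tokens and len(tokens) < length_threshold:   # first predetermined condition: too short
        i = random.randrange(len(tokens))
        tokens.insert(i + 1, tokens[i])                # random word repetition (one listed option)
    while len(tokens) > length_threshold:              # second predetermined condition: too long
        tokens.pop(random.randrange(len(tokens)))      # random deletion (one listed option)
    # Synonym replacement appears in both adjustment lists; applied here with a small probability.
    for i, tok in enumerate(tokens):
        if tok in synonym_table and random.random() < 0.1:
            tokens[i] = random.choice(synonym_table[tok])
    return tokens
```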
According to an embodiment of the present disclosure, for operation S220, training the first-stage model using the sample sentence pairs to obtain the first-stage target model may include the following operations.
For example, the first-stage model is trained with first-level sample sentence pairs to obtain a second-stage model, and the second-stage model is trained with second-level sample sentence pairs to obtain the first-stage target model.
According to an embodiment of the present disclosure, the sample sentence pairs may include first-level sample sentence pairs and second-level sample sentence pairs. Determining the semantic similarity between the sentences of a second-level sample sentence pair is more difficult than doing so for a first-level sample sentence pair. For example, a second-level sample sentence pair may include two second sample sentences, and a first-level sample sentence pair may include two first sample sentences; determining the semantic similarity between the two second sample sentences is more difficult than determining the semantic similarity between the two first sample sentences.
According to the embodiment of the present disclosure, the first-stage model is first trained with the first-level sample sentence pairs, whose semantic similarity is easier to determine, and the resulting second-stage model is then trained with the second-level sample sentence pairs, whose semantic similarity is harder to determine. As the difficulty of determining semantic similarity increases, so does the training precision of the model, forming a curriculum-style training. Combining the unsupervised training manner with the supervised curriculum-style training manner improves the training precision of the model and also improves training efficiency.
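The supervised curriculum step can be sketched as follows. The in-batch-negative similarity loss is one plausible choice, since the embodiment does not fix a particular supervised loss; all names and hyper-parameters are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def pair_loss(encoder, sents_a, sents_b, temperature=0.05):
    # Similarity loss over same-semantics pairs with in-batch negatives
    # (one plausible supervised objective, not prescribed by the embodiment).
    za, zb = encoder(sents_a), encoder(sents_b)              # [batch_size, dim] each
    sim = F.cosine_similarity(za.unsqueeze(1), zb.unsqueeze(0), dim=-1) / temperature
    labels = torch.arange(sim.size(0), device=sim.device)   # pair i is the positive for row i
    return F.cross_entropy(sim, labels)

def curriculum_train(model, first_level_batches, second_level_batches, optimizer):
    # Easier (first-level) pairs first, yielding the second-stage model,
    # then harder (second-level) pairs, yielding the first-stage target model.
    for batches in (first_level_batches, second_level_batches):
        for sents_a, sents_b in batches:
            loss = pair_loss(model, sents_a, sents_b)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```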
According to an embodiment of the present disclosure, the sample sentence pairs may include synonym replacement sample sentence pairs and synonymous entity replacement sample sentence pairs. For example, a first-level sample sentence pair may be a synonym replacement sample sentence pair, and a second-level sample sentence pair may be a synonymous entity replacement sample sentence pair. A synonym replacement sample sentence pair may mean that at least one word differs between the two sample sentences and the differing words are synonyms. A synonymous entity replacement sample sentence pair may mean that at least one entity differs between the two sample sentences and the differing entities are synonymous entities.
For example, a synonym table may be queried to replace some of the words to be replaced in a first sample sentence with synonyms, constructing another first sample sentence by augmentation and thereby obtaining a synonym replacement sample sentence pair. In addition, the number of words replaced can be limited, for example to no more than a replacement threshold such as 2, to reduce semantic drift after synonym replacement.
For example, some of the entities to be replaced in a second sample sentence may be replaced with synonymous entities, constructing another second sample sentence by augmentation and thereby obtaining a synonymous entity replacement sample sentence pair. Synonymous entities may include, for example, company names, organization names, place names, and item names. A named entity recognition model can be used to recognize the entities in the second sample sentence, and a named entity table can then be queried for aliases, English names, abbreviations, and the like, so that a named entity is replaced with a synonymous entity to obtain the other second sample sentence. For example, a potato may also be referred to as "sweet potato", "yam egg", or "yam bean", and these entities may be supplemented into the named entity table in advance.
According to the embodiment of the present disclosure, using synonym replacement sample sentence pairs as the first-level sample sentence pairs and synonymous entity replacement sample sentence pairs as the second-level sample sentence pairs removes the need for manual annotation, improves the capability of constructing training data, and improves the quality of the training data. For example, synonym replacement and synonymous entity replacement provide additional information for the model, enrich the training data, and strengthen the trained model's ability to recognize sample sentences transformed by synonymous entities or changed by synonyms.
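The following sketch illustrates how the two kinds of sample sentence pairs could be constructed, assuming a synonym table, a named entity recognition result, and an entity alias table; all names are hypothetical and for illustration only.

```python
import random

def synonym_pair(tokens, synonym_table, max_replacements=2):
    # First-level pair: replace at most `max_replacements` words with synonyms
    # to limit semantic drift, as described above.
    augmented, replaced = list(tokens), 0
    for i, tok in enumerate(augmented):
        if replaced >= max_replacements:
            break
        if tok in synonym_table:
            augmented[i] = random.choice(synonym_table[tok])
            replaced += 1
    return tokens, augmented

def entity_pair(sentence, recognized_entities, entity_alias_table):
    # Second-level pair: swap recognized entities for aliases, English names, or
    # abbreviations looked up in a named entity table.
    augmented = sentence
    for entity in recognized_entities:
        aliases = entity_alias_table.get(entity)
        if aliases:
            augmented = augmented.replace(entity, random.choice(aliases), 1)
    return sentence, augmented
```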
Fig. 3 schematically shows a flowchart of a training method of a deep learning model according to an embodiment of the present disclosure.
As shown in fig. 3, the deep learning model 310 is trained with unlabeled sample sentences in an unsupervised contrastive learning manner to obtain a first-stage model 320. The first-stage model 320 is trained with synonym replacement sample sentence pairs to obtain a second-stage model 330. The second-stage model 330 is trained with synonymous entity replacement sample sentence pairs to obtain a first-stage target model 340. The first-stage target model 340 is trained in batches with a training sample set to obtain a second-stage target model 350.
According to the embodiment of the present disclosure, the training sample set includes an unlabeled sample sentence set and a sample sentence pair subset, which are used together as the training data of a training batch (sample batch) to perform joint unsupervised and supervised training.
According to the embodiment of the present disclosure, the proportion of the unlabeled sample sentence set to the subset of sample sentence pairs with the same semantics can be preset to obtain a data ratio. For example, the unlabeled sample sentence set and the sample sentence pair subset are used together as the training sample set of one training batch to train the first-stage target model, with their proportions determined by the data ratio.
According to the embodiment of the present disclosure, a two-step training manner can be adopted. In the first step, curriculum-style training is used to obtain the first-stage target model. In the second step, joint batch training is used, so that only a small amount of training data is needed on the basis of the first-stage target model to bring the model to its optimum. Training the deep learning model sequentially with these two different training manners combines curriculum training and joint training, improving the training precision of the model and improving the training effect when the amount of training data is small.
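A minimal sketch of how one joint training batch might be composed according to the preset data ratio is shown below; the function name and the default ratio value are assumptions.

```python
import random

def compose_joint_batch(unlabeled_sentences, sentence_pairs, batch_size, data_ratio=0.5):
    # Mix unlabeled sentences and same-semantics pairs in one batch according to a
    # preset data ratio (the 0.5 default is an assumption, not a value from the patent).
    n_unlabeled = int(batch_size * data_ratio)
    batch_unlabeled = random.sample(unlabeled_sentences, n_unlabeled)
    batch_pairs = random.sample(sentence_pairs, batch_size - n_unlabeled)
    return batch_unlabeled, batch_pairs
```

The unsupervised contrastive loss would then be computed on the unlabeled part and the supervised pair loss on the pair part, and the two combined for a single joint update of the first-stage target model.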
According to the embodiment of the present disclosure, the sample sentence pair subset may include a synonym replacement sample sentence pair subset, a synonymous entity replacement sample sentence pair subset, or both.
According to embodiments of the present disclosure, an initial sample sentence pair subset may be obtained. The initial sample sentence pair subset includes at least one of: an initial synonym replacement sample sentence pair subset and an initial synonymous entity replacement sample sentence pair subset. When the data amount of the initial sample sentence pair subset is determined to be greater than or equal to a predetermined data amount threshold, the initial sample sentence pair subset is used as the sample sentence pair subset.
For example, when the data amount of the initial sample sentence pair subset is determined to be greater than or equal to the predetermined data amount threshold, the initial sample sentence pair subset may be used as the sample sentence pair subset. When the data amount of the initial sample sentence pair subset is determined to be less than the predetermined data amount threshold, the initial sample sentence pair subset may be re-acquired until its data amount is greater than or equal to the predetermined data amount threshold.
According to an embodiment of the present disclosure, the initial sample sentence pair subset includes an initial synonym replacement sample sentence pair subset and an initial synonymous entity replacement sample sentence pair subset. When the data amount of the initial synonym replacement sample sentence pair subset is greater than or equal to the predetermined data amount threshold and the data amount of the initial synonymous entity replacement sample sentence pair subset is also greater than or equal to the predetermined data amount threshold, both subsets may be used as the sample sentence pair subset. The data amounts of the two subsets may also be compared, and the subset with the larger data amount used.
According to the embodiment of the present disclosure, during the joint training, the data amount of the sample sentence pair subset can be controlled through the predetermined data amount threshold, increasing the amount of supervised training data and further improving training precision.
Fig. 4 schematically shows a flowchart of a training method of a deep learning model according to another embodiment of the present disclosure.
As shown in fig. 4, operations S410 to S460 are included.
In operation S410, a deep learning model is trained with unlabeled sample sentences in an unsupervised contrastive learning manner to obtain a first-stage model.
In operation S420, the first-stage model is trained with sample sentence pairs to obtain a first-stage target model.
In operation S430, the first-stage target model is evaluated with a first evaluation sentence and a second evaluation sentence to obtain an evaluation result.
In operation S440, it is determined whether the evaluation result satisfies a predetermined training condition. In case it is determined that the evaluation result satisfies the predetermined training condition, operation S450 is performed. In case it is determined that the evaluation result does not satisfy the predetermined training condition, operation S460 is performed.
In operation S450, the first-stage target model is trained in batches with the training sample set to obtain a second-stage target model.
In operation S460, training is stopped.
According to an embodiment of the present disclosure, the evaluation result satisfying the predetermined training condition may mean that the evaluation result is less than a predetermined evaluation threshold. For example, the evaluation result may characterize the accuracy or precision of the model, and the predetermined evaluation threshold may characterize a predetermined accuracy threshold or a predetermined precision threshold. The evaluation result satisfying the predetermined training condition may mean that the accuracy of the model is less than the predetermined accuracy threshold, or that the precision of the model is less than the predetermined precision threshold. The evaluation result not satisfying the predetermined training condition may mean that the evaluation result is greater than or equal to the predetermined evaluation threshold, for example that the accuracy of the model is greater than or equal to the predetermined accuracy threshold, or that the precision of the model is greater than or equal to the predetermined precision threshold.
According to the embodiment of the present disclosure, when the evaluation result of the model is less than the predetermined evaluation threshold, the first-stage target model can be trained in batches with the training sample set to obtain the second-stage target model. The model obtained from the staged training is assessed with the evaluation result, so that the convergence of the model can be determined in time and the efficiency of model training improved.
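A sketch of this evaluation gate, assuming cosine-style similarity between the vectors of the two evaluation sentences and accuracy as the evaluation result, is shown below; the 0.5 decision boundary and the helper names are illustrative assumptions.

```python
def needs_joint_training(model, eval_pairs, similarity_fn, evaluation_threshold):
    # Evaluate the first-stage target model on (first, second) evaluation sentence pairs.
    hits = sum(
        1 for first, second in eval_pairs
        if similarity_fn(model(first), model(second)) > 0.5   # illustrative decision rule
    )
    accuracy = hits / max(len(eval_pairs), 1)
    # Batch joint training proceeds only while the evaluation result stays below the threshold.
    return accuracy < evaluation_threshold
```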
According to an embodiment of the present disclosure, the first evaluation sentence may be a sample sentence in the training data, and the second evaluation sentence may be obtained by performing synonymous sentence augmentation processing on the first evaluation sentence.
According to an embodiment of the present disclosure, the synonymous sentence augmentation includes at least one of: Chinese-English round-trip translation and generation with a similar-sentence generation model.
According to an embodiment of the present disclosure, Chinese-English round-trip translation may refer to: translating the Chinese first evaluation sentence into an English sentence, and translating the English sentence back into a Chinese sentence as the second evaluation sentence. Constructing the second evaluation sentence by Chinese-English round-trip translation keeps the semantic similarity between the first and second evaluation sentences high and preserves the semantic information.
According to an embodiment of the present disclosure, generation with a similar-sentence generation model may refer to: inputting the first evaluation sentence into a similar-sentence generation model to obtain the second evaluation sentence. The similar-sentence generation model may include, but is not limited to, the SimBERT model, and may also be a derivative of SimBERT such as SimBERTv2. Any deep learning model may be used as long as it can generate a second evaluation sentence semantically similar to the first evaluation sentence.
According to the embodiment of the present disclosure, constructing the evaluation data by Chinese-English round-trip translation and generation with a similar-sentence generation model avoids manual annotation.
According to an embodiment of the present disclosure, a plurality of initial second evaluation sentences may be generated by inputting the first evaluation sentence into the similar-sentence generation model, and one of them may be selected as the second evaluation sentence. For example, the numbers relating to dates in an initial second evaluation sentence may be compared with the numbers in the first evaluation sentence: initial second evaluation sentences whose numbers differ from those in the first evaluation sentence are deleted, and an initial second evaluation sentence whose numbers are the same is retained as the second evaluation sentence. This avoids treating a first evaluation sentence containing "May 12" and an initial second evaluation sentence containing "May 13" as two sentences with similar semantics. Using the second evaluation sentence generated by the similar-sentence generation model together with the first evaluation sentence as evaluation data for assessing the training status of the model ensures the diversity of the evaluation data and improves the precision of model evaluation.
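The number-consistency filtering described above can be sketched as follows; generate_similar is a hypothetical stand-in for a SimBERT-style similar-sentence generation model.

```python
import re

def select_second_evaluation_sentence(first_sentence, candidates):
    # Keep only generated candidates whose numbers (dates, amounts, etc.) match the
    # first evaluation sentence, e.g. discard a "May 13" candidate generated from "May 12".
    reference_numbers = re.findall(r"\d+", first_sentence)
    kept = [c for c in candidates if re.findall(r"\d+", c) == reference_numbers]
    return kept[0] if kept else None

# Usage sketch:
# candidates = generate_similar(first_sentence, num_candidates=10)
# second_sentence = select_second_evaluation_sentence(first_sentence, candidates)
```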
Fig. 5 schematically shows a flow chart of an interaction method according to an embodiment of the present disclosure.
As shown in fig. 5, the method includes operations S510 to S540.
In operation S510, a question from a user is received.
In operation S520, the question is input into the feature extraction model, and a semantic vector is obtained.
In operation S530, a target index vector matching the semantic vector is determined from the plurality of semantic index vectors.
In operation S540, an answer matching the question is determined based on the target index vector.
According to the embodiment of the present disclosure, the feature extraction model is trained with the training method of the deep learning model described above.
According to the embodiment of the present disclosure, the interaction method may be retrieval-based question-answer interaction or reading-comprehension question-answer interaction. Retrieval-based question-answer interaction can be applied to answering questions common to a specific field: frequently asked high-frequency questions can be collected and question-answer pairs established for them; by understanding the user's question, a question similar or equivalent to it is found among the question-answer pairs, and the corresponding answer is returned to the user. Reading-comprehension question-answer interaction is applied to unstructured text and is suitable for finding answers within unstructured text. Compared with reading-comprehension question-answer interaction, retrieval-based question-answer interaction is better suited to application scenarios with fixed answers, such as question answering for policy interpretation.
According to the embodiment of the present disclosure, for retrieval-based question-answer interaction, the retrieval method may include the following three approaches: the first is keyword-based matching, the second is matching based on word vector representations, and the third is matching based on semantic vector representations. The third approach can extract semantic features from the question with the feature extraction model to obtain a sentence-level semantic vector. Compared with the first two approaches, the third approach uses sentence-level semantic vectors, the process is simple, and an end-to-end interaction manner can be realized.
Fig. 6 schematically shows a flow diagram of an interaction method according to an embodiment of the present disclosure.
As shown in FIG. 6, an index engine 610, such as a Milvus engine, can be constructed. Pre-accumulated high-frequency questions 620 are input into the feature extraction model 630 to obtain a plurality of semantic index vectors, which may be, for example, 256-dimensional. The semantic index vectors may be saved into a question index vector library loaded by the index engine.
It should be noted that the dimensionality of the semantic index vectors and the semantic vector is not limited to 256; it may also be 64, 128, or 768. The dimensionality can be determined according to the precision requirement: the higher the precision requirement, the higher the dimensionality.
As shown in fig. 6, a user 640 interacts with a question-answering robot online by voice. The robot may forward the user's question to the server, and the server may extract the semantic features of the question with the feature extraction model 630 to obtain, for example, a 256-dimensional semantic vector. Based on the semantic vector, the Top-K target index vectors (e.g., K may be 10) may be obtained from the plurality of semantic index vectors by querying the index engine 610. The identifier of the question corresponding to a target index vector is obtained through a mapping relationship, the corresponding answer 650 is found according to the question identifier, the server returns the answer 650 to the question-answering robot, and the robot feeds the answer 650 back to the user 640 by voice.
According to an embodiment of the present disclosure, determining the target index vectors matching the semantic vector from the plurality of semantic index vectors may include: matching the semantic vector with the semantic index vectors one by one and determining the semantic similarity between each semantic index vector and the semantic vector to obtain a plurality of similarity results; and sorting the similarity results from high to low and taking the top K as the target index vectors.
According to embodiments of the present disclosure, a plurality of semantic index vectors may correspond one-to-one to a plurality of high frequency questions. Multiple high-frequency questions can be input into the feature extraction model to obtain multiple semantic index vectors.
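A minimal NumPy sketch of the indexing and retrieval steps, assuming an encode() function standing in for the trained feature extraction model, is shown below; the Milvus engine itself is not modeled here and the helper names are hypothetical.

```python
import numpy as np

def build_question_index(encode, high_frequency_questions):
    # encode() is assumed to return one semantic vector (e.g. 256-dimensional) per question.
    vectors = np.stack([encode(q) for q in high_frequency_questions])
    return vectors / np.linalg.norm(vectors, axis=1, keepdims=True)   # unit-normalize

def retrieve_top_k(encode, index_vectors, user_question, k=10):
    # Rank every semantic index vector by cosine similarity to the question vector
    # and return the Top-K positions; these map back to question ids and answers.
    q = encode(user_question)
    q = q / np.linalg.norm(q)
    scores = index_vectors @ q
    return np.argsort(-scores)[:k]
```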
Fig. 7 schematically shows a block diagram of a training apparatus for a deep learning model according to an embodiment of the present disclosure.
As shown in fig. 7, the training apparatus 700 for deep learning model includes: a first training module 710, a second training module 720.
The first training module 710 is configured to train the deep learning model with unlabeled sample sentences in an unsupervised contrastive learning manner to obtain a first-stage model.
And a second training module 720, configured to train the first-stage model with the sample sentence pairs to obtain a first-stage target model, where the sample sentence pairs include two sample sentences with the same semantics.
According to an embodiment of the present disclosure, the sample sentence pairs include a first-level sample sentence pair and a second-level sample sentence pair, and the determination difficulty level of the semantic similarity between the second-level sample sentence pairs is higher than the determination difficulty level of the semantic similarity between the first-level sample sentence pairs.
According to an embodiment of the disclosure, the second training module comprises: the training device comprises a first training unit and a second training unit.
The first training unit is used for training the first-stage model with the first-level sample sentence pairs to obtain a second-stage model.
The second training unit is used for training the second-stage model with the second-level sample sentence pairs to obtain the first-stage target model.
According to an embodiment of the present disclosure, the sample sentence pair includes a synonym replacement sample sentence pair and a synonymous entity replacement sample sentence pair.
According to an embodiment of the disclosure, the second training module comprises: a third training unit and a fourth training unit.
The third training unit is used for training the first-stage model with the synonym replacement sample sentence pairs to obtain a second-stage model.
The fourth training unit is used for training the second-stage model with the synonymous entity replacement sample sentence pairs to obtain the first-stage target model.
According to the embodiment of the present disclosure, the training apparatus for deep learning model further includes: and a third training module.
The third training module is used for training the first-stage target model in batches with a training sample set to obtain a second-stage target model, where the training sample set includes an unlabeled sample sentence set and a sample sentence pair subset, and the sample sentence pairs in the sample sentence pair subset include two sample sentences with the same semantics.
According to an embodiment of the disclosure, the third training module comprises: the device comprises an acquisition unit and a data quantity determination unit.
An obtaining unit configured to obtain an initial sample sentence pair subset, where the initial sample sentence pair subset includes at least one of: an initial synonym replacement sample sentence pair subset and an initial synonymous entity replacement sample sentence pair subset.
A data amount determination unit configured to use the initial sample sentence pair subset as the sample sentence pair subset when the data amount of the initial sample sentence pair subset is determined to be greater than or equal to a predetermined data amount threshold.
According to the embodiment of the present disclosure, the training apparatus for deep learning model further includes: the device comprises an augmentation module and a determination module.
The augmentation module is used for performing synonymous sentence augmentation on the first evaluation sentence to obtain a second evaluation sentence, where the synonymous sentence augmentation includes at least one of: Chinese-English round-trip translation and generation with a similar-sentence generation model; and
the determining module is used for evaluating the first-stage target model with the first evaluation sentence and the second evaluation sentence to obtain an evaluation result.
According to an embodiment of the disclosure, the third training module comprises: and a fifth training unit.
The fifth training unit is used for training the first-stage target model in batches with the training sample set to obtain a second-stage target model when the evaluation result satisfies the predetermined training condition.
According to an embodiment of the present disclosure, the first training module includes: a dropout unit, an input unit, and a sixth training unit.
The dropout unit is used for performing random node dropout on the hidden layer of the deep learning model to obtain a deep learning model with dropout enabled.
The input unit is used for inputting the unlabeled sample sentence twice into the deep learning model with dropout enabled to obtain a positive sample feature vector pair.
And the sixth training unit is used for training the deep learning model by using the positive sample feature vector pair to obtain a first-stage model.
According to an embodiment of the present disclosure, the first training module further comprises: length determining unit and adjusting unit.
The length determining unit is used for determining the sentence length of an initial unlabeled sample sentence.
The adjusting unit is used for adjusting the initial unlabeled sample sentence to obtain the unlabeled sample sentence when the sentence length is determined to satisfy a predetermined processing condition.
Fig. 8 schematically shows a block diagram of an interaction device according to an embodiment of the present disclosure.
As shown in fig. 8, the interaction apparatus 800 includes: a receiving module 810, an extracting module 820, a matching module 830, and an answer determining module 840.
A receiving module 810 for receiving a question from a user.
And an extraction module 820, configured to input the question into the feature extraction model to obtain a semantic vector.
And a matching module 830, configured to determine a target index vector matched with the semantic vector from the plurality of semantic index vectors.
An answer determining module 840, configured to determine an answer matching the question based on the target index vector.
According to the embodiment of the present disclosure, the feature extraction model is trained with the training apparatus of the deep learning model provided by the embodiment of the present disclosure.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
According to an embodiment of the present disclosure, an electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method according to an embodiment of the disclosure.
According to an embodiment of the present disclosure, a non-transitory computer-readable storage medium has stored thereon computer instructions for causing a computer to perform a method according to an embodiment of the present disclosure.
According to an embodiment of the disclosure, a computer program product comprises a computer program which, when executed by a processor, implements a method according to an embodiment of the disclosure.
FIG. 9 illustrates a schematic block diagram of an example electronic device 900 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in FIG. 9, the device 900 includes a computing unit 901, which can perform various appropriate actions and processes in accordance with a computer program stored in a Read-Only Memory (ROM) 902 or a computer program loaded from a storage unit 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data required for the operation of the device 900 can also be stored. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
A number of components in the device 900 are connected to the I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, and the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, optical disk, or the like; and a communication unit 909 such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 901 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 901 performs the respective methods and processes described above, such as the training method of the deep learning model or the interaction method. For example, in some embodiments, the training method of the deep learning model or the interaction method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the training method of the deep learning model or the interaction method described above may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured by any other suitable means (e.g., by means of firmware) to perform the training method of the deep learning model or the interaction method.
Various implementations of the systems and techniques described here may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved; the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (20)

1. A training method of a deep learning model comprises the following steps:
training a deep learning model by using an unlabeled sample sentence in an unsupervised contrastive learning training mode to obtain a first-stage model; and
training the first-stage model by using sample sentence pairs to obtain a first-stage target model, wherein the sample sentence pairs comprise first-stage sample sentence pairs and second-stage sample sentence pairs, and a determination difficulty level of the semantic similarity between the second-stage sample sentence pairs is higher than that of the first-stage sample sentence pairs;
wherein the training of the first-stage model by using the sample sentence pairs to obtain the first-stage target model comprises:
training the first-stage model by using the first-stage sample sentence pair to obtain a second-stage model; and
training the second-stage model by using the second-stage sample sentence pair to obtain the first-stage target model.
2. The method of claim 1, wherein the sample sentence pairs comprise a synonym replacement sample sentence pair and a synonymous entity replacement sample sentence pair;
the training of the first-stage model by using the sample sentence pairs to obtain the first-stage target model comprises the following steps:
training the first-stage model by using the synonym replacement sample sentence pair to obtain a second-stage model; and
training the second-stage model by using the synonymous entity replacement sample sentence pair to obtain the first-stage target model.
3. The method of claim 1, further comprising:
training the primary target model in batch by using a training sample set to obtain a secondary target model, wherein the training sample set comprises an unlabeled sample sentence set and a sample sentence pair subset, and the sample sentence pairs in the sample sentence pair subset comprise two sample sentences with the same semantics.
4. The method of claim 3, wherein the batch training of the primary target model by using the training sample set to obtain the secondary target model comprises:
obtaining an initial sample sentence pair subset, wherein the initial sample sentence pair subset comprises at least one of: an initial synonym replacement sample sentence pair subset and an initial synonymous entity replacement sample sentence pair subset; and
in a case where it is determined that a data amount of the initial sample sentence pair subset is greater than or equal to a predetermined data amount threshold, taking the initial sample sentence pair subset as the sample sentence pair subset.
5. The method of claim 4, further comprising:
performing synonym augmentation processing on a first evaluation sentence to obtain a second evaluation sentence, wherein the synonym augmentation processing comprises at least one of: Chinese-English translation and generation by a similar-sentence generation model; and
evaluating the primary target model by using the first evaluation sentence and the second evaluation sentence to obtain an evaluation result.
6. The method of claim 5, wherein batch training the primary target model using a training sample set to obtain a secondary target model comprises:
under the condition that the evaluation result meets the preset training condition, training the primary target model in batch by using the training sample set to obtain a secondary target model.
7. The method according to any one of claims 1 to 6, wherein the training of the deep learning model by using the unlabeled sample sentence in the unsupervised contrastive learning training mode to obtain the first-stage model comprises:
performing node random inactivation processing on a hidden layer in the deep learning model to obtain a deep learning model with a random inactivation function;
inputting the unlabeled sample sentence into the deep learning model with the random inactivation function twice to obtain a positive sample feature vector pair; and
training the deep learning model by using the positive sample feature vector pair to obtain the first-stage model.
8. The method of claim 1, wherein the training of the deep learning model by using the unlabeled sample sentence in the unsupervised contrastive learning training mode to obtain the first-stage model further comprises:
determining a sentence length of an initial unlabeled sample sentence; and
under the condition that it is determined that the sentence length meets a preset processing condition, adjusting the initial unlabeled sample sentence to obtain the unlabeled sample sentence.
9. An interaction method, comprising:
receiving a question from a user;
inputting the question into a feature extraction model to obtain a semantic vector;
determining a target index vector matched with the semantic vector from a plurality of semantic index vectors;
determining an answer matching the question based on the target index vector,
wherein the feature extraction model is trained using the training method according to any one of claims 1 to 8.
10. A training apparatus for deep learning models, comprising:
the first training module is used for training the deep learning model by using an unlabeled sample sentence in an unsupervised contrastive learning training mode to obtain a first-stage model; and
the second training module is used for training the first-stage model by using sample sentence pairs to obtain a first-stage target model, wherein the sample sentence pairs comprise first-stage sample sentence pairs and second-stage sample sentence pairs, and the determination difficulty level of the semantic similarity between the second-stage sample sentence pairs is higher than the determination difficulty level of the semantic similarity between the first-stage sample sentence pairs;
the second training module comprises:
the first training unit is used for training the first-stage model by using the first-stage sample sentence pair to obtain a second-stage model; and
the second training unit is used for training the second-stage model by using the second-stage sample sentence pair to obtain the first-stage target model.
11. The apparatus of claim 10, wherein the sample sentence pairs comprise a synonym replacement sample sentence pair and a synonymous entity replacement sample sentence pair;
the second training module comprises:
the third training unit is used for training the first-stage model by using the synonym replacement sample sentence pair to obtain a second-stage model; and
the fourth training unit is used for training the second-stage model by using the synonymous entity replacement sample sentence pair to obtain the first-stage target model.
12. The apparatus of claim 10, further comprising:
the third training module is used for training the primary target model in batch by using a training sample set to obtain a secondary target model, wherein the training sample set comprises an unlabeled sample sentence set and a sample sentence pair subset, and the sample sentence pairs in the sample sentence pair subset comprise two sample sentences with the same semantics.
13. The apparatus of claim 12, wherein the third training module comprises:
an obtaining unit, configured to obtain an initial sample sentence pair subset, wherein the initial sample sentence pair subset comprises at least one of: an initial synonym replacement sample sentence pair subset and an initial synonymous entity replacement sample sentence pair subset; and
a data amount determining unit, configured to, in a case where it is determined that a data amount of the initial sample sentence pair subset is greater than or equal to a predetermined data amount threshold, take the initial sample sentence pair subset as the sample sentence pair subset.
14. The apparatus of claim 13, further comprising:
the augmentation module is used for performing synonym augmentation processing on a first evaluation sentence to obtain a second evaluation sentence, wherein the synonym augmentation processing comprises at least one of: Chinese-English translation and generation by a similar-sentence generation model; and
the determining module is used for evaluating the primary target model by using the first evaluation sentence and the second evaluation sentence to obtain an evaluation result.
15. The apparatus of claim 14, wherein the third training module comprises:
a fifth training unit, used for training the primary target model in batch by using the training sample set to obtain a secondary target model under the condition that the evaluation result meets the preset training condition.
16. The apparatus of any of claims 10-15, wherein the first training module comprises:
the inactivation unit is used for performing node random inactivation processing on the hidden layer in the deep learning model to obtain the deep learning model with the random inactivation function;
the input unit is used for inputting the unlabeled sample sentence into the deep learning model with the random inactivation function twice to obtain a positive sample feature vector pair; and
the sixth training unit is used for training the deep learning model by using the positive sample feature vector pair to obtain the first-stage model.
17. The apparatus of claim 10, wherein the first training module further comprises:
the length determining unit is used for determining a sentence length of an initial unlabeled sample sentence; and
the adjusting unit is used for adjusting the initial unlabeled sample sentence to obtain the unlabeled sample sentence under the condition that it is determined that the sentence length meets a preset processing condition.
18. An interaction device, comprising:
a receiving module for receiving a question from a user;
the extraction module is used for inputting the question into a feature extraction model to obtain a semantic vector;
the matching module is used for determining a target index vector matched with the semantic vector from a plurality of semantic index vectors;
an answer determination module for determining an answer matching the question based on the target index vector,
wherein the feature extraction model is trained using the training apparatus according to any one of claims 10 to 17.
19. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 9.
20. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1 to 9.
CN202210531471.XA 2022-05-13 2022-05-13 Interaction method, model training method, device, equipment and medium Active CN114925185B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210531471.XA CN114925185B (en) 2022-05-13 2022-05-13 Interaction method, model training method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210531471.XA CN114925185B (en) 2022-05-13 2022-05-13 Interaction method, model training method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN114925185A CN114925185A (en) 2022-08-19
CN114925185B true CN114925185B (en) 2023-02-07

Family

ID=82808455

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210531471.XA Active CN114925185B (en) 2022-05-13 2022-05-13 Interaction method, model training method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN114925185B (en)


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107133202A (en) * 2017-06-01 2017-09-05 北京百度网讯科技有限公司 Text method of calibration and device based on artificial intelligence
CN114281968B (en) * 2021-12-20 2023-02-28 北京百度网讯科技有限公司 Model training and corpus generation method, device, equipment and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106897263A (en) * 2016-12-29 2017-06-27 北京光年无限科技有限公司 Robot dialogue exchange method and device based on deep learning
CN109740077A (en) * 2018-12-29 2019-05-10 北京百度网讯科技有限公司 Answer searching method, device and its relevant device based on semantic indexing
CN110717339A (en) * 2019-12-12 2020-01-21 北京百度网讯科技有限公司 Semantic representation model processing method and device, electronic equipment and storage medium
CN111651998A (en) * 2020-05-07 2020-09-11 中国科学技术大学 Weakly supervised deep learning semantic analysis method under virtual reality and augmented reality scenes

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Improving the adversarial robustness of deep neural networks based on self-supervised contrastive learning; Sun Hao et al.; 《工程科技Ⅱ辑》; 2021-12-31 (No. 06); full text *

Also Published As

Publication number Publication date
CN114925185A (en) 2022-08-19

Similar Documents

Publication Publication Date Title
CN108363790B (en) Method, device, equipment and storage medium for evaluating comments
EP3958145A1 (en) Method and apparatus for semantic retrieval, device and storage medium
WO2019118007A1 (en) Domain-specific natural language understanding of customer intent in self-help
US20240211692A1 (en) Method of training ranking model, and electronic device
CN114861889B (en) Deep learning model training method, target object detection method and device
US20220058349A1 (en) Data processing method, device, and storage medium
CN112926308B (en) Method, device, equipment, storage medium and program product for matching text
CN114840671A (en) Dialogue generation method, model training method, device, equipment and medium
CN114116997A (en) Knowledge question answering method, knowledge question answering device, electronic equipment and storage medium
CN112528677A (en) Training method and device of semantic vector extraction model and electronic equipment
US20230004819A1 (en) Method and apparatus for training semantic retrieval network, electronic device and storage medium
CN113326420A (en) Question retrieval method, device, electronic equipment and medium
CN116541493A (en) Interactive response method, device, equipment and storage medium based on intention recognition
CN111813993A (en) Video content expanding method and device, terminal equipment and storage medium
CN115062718A (en) Language model training method and device, electronic equipment and storage medium
EP3992814A2 (en) Method and apparatus for generating user interest profile, electronic device and storage medium
CN115470313A (en) Information retrieval and model training method, device, equipment and storage medium
CN114372454B (en) Text information extraction method, model training method, device and storage medium
CN112926297A (en) Method, apparatus, device and storage medium for processing information
CN112307738A (en) Method and device for processing text
CN114676227B (en) Sample generation method, model training method and retrieval method
CN116049370A (en) Information query method and training method and device of information generation model
CN114925185B (en) Interaction method, model training method, device, equipment and medium
CN115936018A (en) Method and device for translating terms, electronic equipment and storage medium
CN114969371A (en) Heat sorting method and device of combined knowledge graph

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant