CN111241851A - Semantic similarity determination method and device and processing equipment - Google Patents

Semantic similarity determination method and device and processing equipment

Info

Publication number
CN111241851A
CN111241851A (application CN202010329730.1A)
Authority
CN
China
Prior art keywords
training
semantic
model
semantic determination
sentence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010329730.1A
Other languages
Chinese (zh)
Inventor
成幸毅
徐威迪
陈昆龙
黄伟鹏
蒋亮
温祖杰
王太峰
褚崴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202010329730.1A
Publication of CN111241851A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The specification provides a semantic similarity determination method, apparatus, and processing device, in which a pre-established semantic similarity model processes sentence pairs from two different angles: it considers the dimension of whole text sentences and combines it with the angle of a word-level cross-interaction matrix between the texts. When semantic similarity needs to be calculated for texts to be processed, the sentence semantic determination submodel in the established semantic similarity model can directly encode each text semantically, convert each text into a vector representation, and calculate the similarity of the texts based on the converted vectors. On the basis of preserving the computational efficiency of semantic similarity calculation, the accuracy of the calculation is improved.

Description

Semantic similarity determination method and device and processing equipment
Technical Field
The present specification relates to the field of computer technology, and in particular to a semantic similarity determination method, apparatus, and processing device.
Background
With the development of computer technology, many businesses need to use computers for natural language processing, such as search engines and intelligent customer service. Natural language processing usually requires calculating semantic similarity, generally with a machine learning model. How to calculate semantic similarity both accurately and quickly is a technical problem that urgently needs to be solved in this field.
Disclosure of Invention
The embodiments of the specification aim to provide a semantic similarity determination method, apparatus, and processing device, so as to improve the accuracy and efficiency of semantic similarity determination.
In one aspect, an embodiment of the present specification provides a semantic similarity determining method, where the method includes:
acquiring a text to be processed;
respectively performing vector conversion on the texts to be processed by using a sentence semantic determination submodel in an established semantic similarity model, and performing similarity calculation from the converted vectors by using the sentence semantic determination submodel to obtain an output result of the semantic similarity model; the semantic similarity model comprises the sentence semantic determination submodel and a cross semantic determination submodel, and the sentence semantic determination submodel is constructed by taking the result output by the pre-trained cross semantic determination submodel as a training target;
and determining the semantic similarity between the texts to be processed according to the output result of the semantic similarity model.
In another aspect, the present specification provides a model training construction method for semantic similarity calculation, including:
obtaining model parameters of a sentence semantic determination submodel and a cross semantic determination submodel in a semantic similarity model to be trained;
acquiring a first sample data set, wherein the first sample data set comprises a plurality of sample data with labels, and the sample data is text data;
pre-training the cross semantic determination submodel by using the sample data in the first sample data set and the corresponding label, and adjusting the model parameters of the cross semantic determination submodel;
and inputting the sample data in a second sample data set into the pre-trained cross semantic determination submodel and the sentence semantic determination submodel, taking the output result of the pre-trained cross semantic determination submodel as a training target for training the sentence semantic determination submodel, and adjusting the model parameters of the sentence semantic determination submodel until the training requirements are met, thereby constructing the semantic similarity model.
In yet another aspect, the present specification provides a semantic similarity determination apparatus, including:
the text acquisition module is used for acquiring a text to be processed;
the model prediction module is used for respectively performing vector conversion on the texts to be processed by using a sentence semantic determination submodel in an established semantic similarity model, and performing similarity calculation from the converted vectors by using the sentence semantic determination submodel to obtain an output result of the semantic similarity model; the semantic similarity model comprises the sentence semantic determination submodel and a cross semantic determination submodel, and the sentence semantic determination submodel is constructed by taking the result output by the pre-trained cross semantic determination submodel as a training target;
and the similarity determining module is used for determining the semantic similarity between the texts to be processed according to the output result of the semantic similarity model.
In yet another aspect, an embodiment of the present specification provides a semantic similarity determination processing apparatus, including at least one processor and a memory for storing processor-executable instructions, where the processor executes the instructions to implement the semantic similarity determination method.
According to the semantic similarity determination method, apparatus, and processing device provided by the specification, the pre-established semantic similarity model processes sentence pairs from two different angles: it considers the dimension of whole text sentences and combines it with the angle of a word-level cross-interaction matrix between the texts. When semantic similarity needs to be calculated for texts to be processed, the sentence semantic determination submodel in the established semantic similarity model can directly encode each text semantically, convert each text into a vector representation, and calculate the similarity of the texts based on the converted vectors. On the basis of preserving the computational efficiency of semantic similarity calculation, the accuracy of the calculation is improved. The semantic similarity model in the embodiments of the description can thus be used to determine the semantic similarity between two texts quickly and accurately.
Drawings
In order to illustrate the embodiments of the present specification or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some of the embodiments described in the present specification; for those skilled in the art, other drawings can be obtained from these drawings without any creative effort.
FIG. 1 is a schematic diagram of a data interaction flow for intelligent customer service in one example of the present specification;
FIG. 2 is a flowchart illustrating an embodiment of a semantic similarity determination method provided in an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of a semantic similarity model in some embodiments of the present description;
FIG. 4 is a schematic diagram of a semantic similarity model construction process in some embodiments of the present description;
FIG. 5 is a block diagram illustrating an exemplary semantic similarity determination apparatus according to an embodiment of the present disclosure;
fig. 6 is a block diagram of a hardware configuration of a semantic similarity determination server in one embodiment of the present specification.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the present specification, the technical solutions in the embodiments of the present specification will be described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only a part of the embodiments of the present specification, not all of them. All other embodiments obtained by a person skilled in the art based on the embodiments in the present specification without any inventive step shall fall within the scope of protection of the present specification.
With the development of computer and internet technology, daily life and work have become inseparable from computers; however, natural language differs from computer language, so natural language needs to be recognized and analyzed by the computer. When computer technology is used to analyze and process natural language, matching calculation of semantic similarity is usually required, and inter-text similarity can generally be calculated with a machine learning model. For example, BERT (Bidirectional Encoder Representations from Transformers) determines the similarity between two sentences by cross-encoding the two sentences. However, when candidate sentence pairs are not given in advance, the number of combinations to score is on the order of n^2 for n sentences; the computational load is very large, which may make it difficult to deploy such a model on information retrieval tasks.
Fig. 1 is a schematic diagram illustrating a data interaction flow of intelligent customer service in an example of the present specification. As shown in fig. 1, in some embodiments of the present specification, a user may send a request to a server through a user terminal, for example, initiating a question on an intelligent communication platform; the server can forward the received question to the intelligent customer service end, which can be understood as a question-answering robot. The question-answering robot matches the question input by the user against the questions in a knowledge base; during matching, the semantic similarity between the question input by the user and the questions in the domain knowledge base needs to be calculated, the question asked by the user and the corresponding answer are determined according to the similarity, and the corresponding answer is returned to the user. The embodiments of the specification provide a semantic similarity determination method in which a semantic similarity model can be established in advance; the semantic similarity model is trained and constructed based on both the similarity of text sentences and the similarity at the text word level. The semantic similarity of the texts to be processed is then calculated based on the constructed semantic similarity model.
Fig. 2 is a schematic flowchart of an embodiment of the semantic similarity determination method provided in an embodiment of the present specification. Although the present specification provides the method steps or apparatus structures shown in the following examples or figures, the method or apparatus may include more or fewer steps or modules based on conventional or non-inventive effort. For steps or structures that have no logically necessary causal relationship, the execution order of the steps or the module structure of the apparatus is not limited to the execution order or module structure shown in the embodiments or drawings of the present specification. When the described method or module structure is applied to an actual device, server, or end product, it may be executed sequentially or in parallel according to the embodiments or the figures (for example, in a parallel-processor or multi-threaded environment, or even in an implementation environment including distributed processing and server clustering).
In a specific embodiment of the semantic similarity determining method provided in this specification, as shown in fig. 2, the method may be used in a client (e.g., a smart phone, a tablet computer, a vehicle-mounted device, an intelligent wearable device, etc.), a server, and other terminals, and the method may include the following steps:
step 202, obtaining a text to be processed.
In a specific implementation process, the text to be processed may refer to natural language text for which semantic similarity determination is required, and its form may differ according to actual needs, for example: text, images, video, speech, and the like; the embodiments of the specification are not specifically limited in this regard. There may be two or more texts to be processed, depending on the actual situation.
For example: if the semantic similarity between two sentences needs to be calculated, the texts to be processed can be the two sentences; if a text matching a certain specific text needs to be found in a data set, the texts to be processed can be the specific text and the texts in the data set.
Step 204, respectively performing vector conversion on the texts to be processed by using a sentence semantic determination submodel in the established semantic similarity model, and performing similarity calculation from the converted vectors by using the sentence semantic determination submodel to obtain an output result of the semantic similarity model; the semantic similarity model comprises the sentence semantic determination submodel and a cross semantic determination submodel, and the sentence semantic determination submodel is constructed by taking the result output by the pre-trained cross semantic determination submodel as a training target.
In a specific implementation process, a semantic similarity model can be trained and constructed in advance, and it can be used to calculate the semantic similarity between two texts. The semantic similarity model in the embodiments of the present specification may include two parts: a sentence semantic determination submodel and a cross semantic determination submodel. The sentence semantic determination submodel can determine the similarity between two texts from the combination of their sentence encodings, and the cross semantic determination submodel can determine the similarity between two texts from their word-level interaction matrix. For example: the sentence semantic determination submodel may adopt a Siamese BERT model, i.e. a dual-tower model, which encodes the two sentences into semantic representations separately and then performs a cosine similarity measurement; for instance, each sentence can be converted into a sequence of token vectors, the average of each sentence's token vectors can be computed, and the similarity between the two sentences calculated with a suitable merging strategy. The cross semantic determination submodel can be understood as Cross Sentence Interaction: it cross-matches the words of the two sentences and determines the similarity between them from their cross-word interaction matrix. The specific model structures and algorithms of the sentence semantic determination submodel and the cross semantic determination submodel may be selected according to actual needs; the embodiments of the present specification are not specifically limited.
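By way of illustration, the following is a minimal sketch of the dual-tower scoring path described above, written against the Hugging Face transformers API. The checkpoint name, the mean-pooling merging strategy, and the example sentences are illustrative assumptions, not the patent's exact configuration.

```python
# Hedged sketch: Siamese (dual-tower) encoding with mean pooling, then cosine
# similarity. The checkpoint and pooling choice are assumptions for illustration.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
encoder = AutoModel.from_pretrained("bert-base-chinese")

def encode(sentence: str) -> torch.Tensor:
    """Encode one sentence into a single vector by mean-pooling token states."""
    inputs = tokenizer(sentence, return_tensors="pt")
    hidden = encoder(**inputs).last_hidden_state         # (1, seq_len, hidden)
    mask = inputs["attention_mask"].unsqueeze(-1)        # (1, seq_len, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # masked mean over tokens

u = encode("How do I reset my payment password?")
v = encode("I forgot my payment password, what should I do?")
score = torch.nn.functional.cosine_similarity(u, v)      # similarity in [-1, 1]
```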
In addition, the sentence semantic determination submodel in the semantic similarity model of the embodiments of the present specification is constructed by performing model training with the result output by the pre-trained cross semantic determination submodel as the training target. That is, when building the semantic similarity model, the cross semantic determination submodel may be pre-trained first, and then the output of the pre-trained cross semantic determination submodel (for example, the probability value of semantic similarity between two sentences that it outputs) is used as the training target for model training of the sentence semantic determination submodel, thereby constructing the semantic similarity model. The semantic similarity model in the embodiments of the present specification can thus be understood as comprising two layers: the upper layer is the cross semantic determination submodel, whose output may serve as the model training target of the lower-layer sentence semantic determination submodel. The sentence semantic determination submodel calculates semantic similarity based on the sentence encodings of the two texts, so its calculation efficiency is high; the cross semantic determination submodel processes the two texts at the word level to calculate semantic similarity, so its accuracy and effect are good.
After the semantic similarity model is built, when the semantic similarity of new texts needs to be calculated, the sentence semantic determination submodel in the semantic similarity model, namely the Siamese BERT model, can encode each text to be processed into a semantic representation, i.e. convert each text into a corresponding vector representation, and then perform a cosine similarity measurement to obtain the output result of the semantic similarity model. The texts to be processed can be input into the semantic similarity model separately, without concatenating them; the sentence semantic determination submodel can directly encode each text semantically, convert it into a vector representation, and compute the similarity between the two texts via a similarity distance. In the embodiments of the description, when the semantic similarity model is trained and constructed, the sentence semantic determination submodel is trained on the basis of the cross semantic determination submodel; the cross semantic determination submodel can be understood as the teacher of the sentence semantic determination submodel, and the sentence semantic determination submodel has learned the capability of the cross semantic determination submodel. In actual use, the semantic similarity between the texts to be processed can be determined directly from the calculation result of the sentence semantic determination submodel, so the result retains both the calculation efficiency of the sentence semantic determination submodel and the accuracy of the cross semantic determination submodel.
Step 206, determining the semantic similarity between the texts to be processed according to the output result of the semantic similarity model.
In a specific implementation process, after the output result of the semantic similarity model is obtained, the semantic similarity between the texts to be processed can be determined from it. For example: if the texts to be processed are two texts, then according to the numerical value of the model's output, a larger value can be taken to mean the semantics of the two texts are more similar. If there are multiple texts to be processed and a matching text for a specific text is to be selected, the text whose output value is largest can be selected as the matching text for that specific text.
In the semantic similarity determination method provided by the embodiments of the description, the pre-established semantic similarity model processes sentence pairs from two different angles: it considers the dimension of similarity calculation over concatenated text sentences and combines it with the angle of similarity calculation over the word-level cross matrix of the texts. When the semantic similarity of texts to be processed needs to be calculated, the sentence semantic determination submodel in the established semantic similarity model can encode each text semantically, convert each text into a vector representation, and calculate the similarity based on the converted vectors. On the basis of preserving the computational efficiency of semantic similarity calculation, the accuracy of the calculation is improved. The semantic similarity model in the embodiments of the description can thus be used to determine the semantic similarity between two texts quickly and accurately.
On the basis of the above embodiments, the semantic similarity model in the embodiments of the present specification is configured to be constructed according to the following method:
obtaining a first sample dataset, wherein the first sample dataset comprises a plurality of sample data with labels;
pre-training a cross semantic determination sub-model in the semantic similarity model by using the sample data in the first sample data set and the corresponding labels;
and inputting the sample data in the second sample data set into the pre-trained cross semantic determination submodel and the sentence semantic determination submodel, taking an output result of the pre-trained cross semantic determination submodel as a training target for training the sentence semantic determination submodel, training the sentence semantic determination submodel until the training requirement is met, and constructing the semantic similarity model.
In a specific implementation process, when training and constructing the semantic similarity model, some embodiments of the present specification may first obtain a first sample data set, which may include a plurality of labeled sample data; each sample datum may be a sentence pair (i.e., a text pair). For example: sentence pairs with known semantic similarity can be collected as sample data from historical data of semantic similarity determination, such as: text 1 and text 2 being a pair of texts with higher semantic similarity that were determined to be similar, and text 3 and text 4 being a pair of texts with low semantic similarity that were determined not to be similar; whether the pair is similar can serve as the label of the sample data. The labels may be obtained from historical data of semantic similarity determination or marked manually, which may be decided according to actual needs; the embodiments of the specification are not specifically limited. Model training of the cross semantic determination submodel in the semantic similarity model may be performed using the labeled sample data in the first sample data set, for example: the sample data can be used as the input of the cross semantic determination submodel and the corresponding label as its training target, and the model parameters of the cross semantic determination submodel are adjusted by learning until the training conditions are met, such as: when the number of training rounds reaches a certain number, or the accuracy of the model output reaches a certain level, training of the cross semantic determination submodel is finished. When sample data is used as the input of the cross semantic determination submodel, the sample data can first be segmented into words and processed into the word-level interaction matrix of the two sentences before being input, or the cross semantic determination submodel itself can segment the input sample data and obtain the word-level interaction matrix of the two sentences; the similarity between the two sample sentences is calculated based on the word-level interaction matrix, and the model is pre-trained in combination with the labels corresponding to the sample data.
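The pre-training step might look like the hedged sketch below: each labeled pair is cross-encoded as a single sequence and fitted to its binary label. The dataset contents, checkpoint, and hyperparameters are assumptions for illustration.

```python
# Hedged sketch of teacher pre-training: each pair (S, T) is cross-encoded as
# "[CLS] S [SEP] T [SEP]" and the classifier is fitted to the hard label.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
teacher = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-chinese", num_labels=2)
optimizer = torch.optim.AdamW(teacher.parameters(), lr=2e-5)

labeled_pairs = [("text 1", "text 2", 1), ("text 3", "text 4", 0)]  # (S, T, y)
for s, t, y in labeled_pairs:
    batch = tokenizer(s, t, return_tensors="pt")  # builds [CLS] S [SEP] T [SEP]
    loss = teacher(**batch, labels=torch.tensor([y])).loss  # cross entropy
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```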
In some embodiments of the present description, the cross semantic determination submodel may be understood as a pre-trained model that acts as the teacher of the sentence semantic determination submodel. A pre-trained model can be understood as a model created to solve a similar problem: instead of training a new model from scratch, the model trained on the similar problem can be used as a starting point.
After the cross semantic determination submodel is trained, the sample data in the second sample data set can be input into the sentence semantic determination submodel of the semantic similarity model and the trained cross semantic determination submodel; the output result of the trained cross semantic determination submodel is used as the training target for training the sentence semantic determination submodel, and the sentence semantic determination submodel is trained until the training requirements are met, for example: when the number of training rounds reaches a certain number, or the precision of the model output reaches a certain level, training of the sentence semantic determination submodel is completed and the semantic similarity model is constructed. The second sample data set may be the same sample data set as the first sample data set, or a different one. The same data to be predicted (which may be sample data in the first sample data set or in the second sample data set) may be input into both the cross semantic determination submodel and the sentence semantic determination submodel, and the probability value of semantic similarity calculated by the cross semantic determination submodel may be used as the training target for model training of the sentence semantic determination submodel. Having the sentence semantic determination submodel take the prediction result of the cross semantic determination submodel as its training target can be understood as a process of model distillation, through which it learns the capability of the cross semantic determination submodel. Model distillation can be understood as knowledge distillation, which aims to migrate the knowledge learned by a large model, or an ensemble of several models, into another lightweight single model that is convenient to deploy.
In some embodiments of the present specification, the pre-trained cross semantic determination submodel may be used to re-label the sample data in the first sample data set, for example: the sample data in the first sample data set can be input into the pre-trained cross semantic determination submodel, which can calculate the probability value of semantic similarity corresponding to each sample datum; this probability value can be used as a computed label for the sample data. The computed labels determined by the cross semantic determination submodel are added to the first sample data set, and the first sample data set with the computed labels added can serve as the second sample data set. The sample data in the second sample data set is then input into the sentence semantic determination submodel, and model training of the sentence semantic determination submodel is performed with the computed labels determined by the cross semantic determination submodel as the training target. Re-labeling the sample data with the pre-trained cross semantic determination submodel and using the result as the training target allows the sentence semantic determination submodel to learn the capability of the cross semantic determination submodel. In the embodiments of the description, model training is first performed on the cross semantic determination submodel using the sample data set, and model training is then performed on the sentence semantic determination submodel based on the trained cross semantic determination submodel's predictions on the sample data. The two angles of text sentence concatenation and word-level crossing are considered simultaneously; the sentence semantic determination submodel takes the cross semantic determination submodel as its teacher and learns its capability, which preserves the efficiency of semantic similarity determination while improving the accuracy of the calculation.
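A sketch of the re-labeling step follows, reusing the `teacher` and `tokenizer` names from the pre-training sketch above; storing the teacher's P(similar) alongside the original hard label is one plausible reading of the "computed label", not a confirmed detail of the patent.

```python
# Hedged sketch: the pre-trained teacher scores every pair, and its similarity
# probability is stored as a computed (soft) label next to the hard label,
# forming the second sample data set.
import torch

@torch.no_grad()
def build_second_dataset(labeled_pairs, teacher, tokenizer):
    """Return (S, T, hard_label, computed_label) tuples."""
    second_set = []
    for s, t, y in labeled_pairs:
        batch = tokenizer(s, t, return_tensors="pt")
        probs = teacher(**batch).logits.softmax(dim=-1)  # (1, 2)
        computed_label = probs[0, 1].item()              # teacher's P(similar)
        second_set.append((s, t, y, computed_label))
    return second_set
```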
On the basis of the above embodiments, in some embodiments of the present specification, the method further includes:
taking the output result of the pre-trained cross semantic determination submodel as a training soft target for training the sentence semantic determination submodel;
taking the label of the sample data in the second sample data set as a training hard target for training the sentence semantic determination submodel;
and determining a training target of the sentence semantic determination submodel according to the training soft target and the training hard target.
In a specific implementation process, when the sentence semantic determination submodel in the semantic similarity model is trained, the output result of the trained cross semantic determination submodel (the probability value of similarity it outputs) can be used as the training soft target, the label of the sample data in the second sample data set as the training hard target, and the training soft target and training hard target can be combined to determine the training target of the sentence semantic determination submodel. For example: a weighted average of the training soft target and training hard target may be calculated to determine the training target, or a calculation formula for the training target may be determined through mathematical statistics or the like, and the training target of the sentence semantic determination submodel determined by combining the soft and hard targets according to that formula; the embodiments of the present specification are not limited in this regard.
For example: the pre-trained cross semantic determination submodel re-labels the sample data in the first sample data set, the computed labels corresponding to the sample data are calculated and added to the first sample data set, and the second sample data set is obtained. The sample data in the second sample data set is input into the sentence semantic determination submodel; the label corresponding to the sample data in the second sample data set is taken as the training hard target, the computed label corresponding to the sample data as the training soft target, and the sentence semantic determination submodel is trained after its training target is determined by combining the training hard target and the training soft target.
In the embodiments of the specification, the label of the sample data and the output result of the trained cross semantic determination submodel are combined to determine the training target of the sentence semantic determination submodel, so the model learns both the properties carried by the sample data and the capability of the cross semantic determination submodel, improving the efficiency and accuracy of semantic similarity determination.
On the basis of the foregoing embodiments, in some embodiments of the present specification, the determining a training target of the sentence semantic determination submodel according to the training soft target and the training hard target includes:
determining a training target of the sentence semantic determination submodel according to the soft target weight corresponding to the training soft target, the hard target weight corresponding to the training hard target, the training soft target and the training hard target; when the sentence semantic determination submodel training is started, the hard target weight is smaller than the soft target weight, the numerical value of the soft target weight is decreased progressively along with the model training of the sentence semantic determination submodel, and the numerical value of the hard target weight is increased progressively.
In a specific implementation process, weights, namely a soft target weight and a hard target weight, can be set for the training soft target and the training hard target respectively. In the general case, the sum of the soft target weight and the hard target weight is 1; both weights are variables that change continuously during the training of the sentence semantic determination submodel. When training of the sentence semantic determination submodel starts, the hard target weight is smaller than the soft target weight; as training proceeds, the soft target weight continuously decreases and the hard target weight continuously increases, while their sum remains 1. For example: the hard target weight may be set to λ and the soft target weight to 1-λ, with λ gradually increasing as training of the sentence semantic determination submodel proceeds. In some embodiments of the present specification, the hard target weight λ may be set to 0 at the beginning of training, which ensures richer training samples in the initial stage of training and improves training speed; λ may then increase gradually from 0 to 1 during training to improve the accuracy of the sentence semantic determination submodel. The training target of the sentence semantic determination submodel may be obtained from the training hard target and its hard target weight together with the training soft target and its soft target weight, for example: the sum of the product of the training hard target and the hard target weight and the product of the training soft target and the soft target weight may be used as the training target. In the embodiments of the description, weight values are set respectively for the training hard target, i.e. the original label of the sample data, and the training soft target, i.e. the cross semantic determination submodel's prediction on the sample data, and these weights change gradually as the sentence semantic determination submodel is trained. Moreover, since the weight of the training soft target is greater than that of the hard target when training starts, rich training samples are ensured in the initial stage of training, which improves training speed; as the sentence semantic determination submodel continues to converge, the hard target weight gradually increases, which improves the accuracy of the sentence semantic determination submodel.
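The weighted combination might be implemented as in the sketch below; the linear λ schedule is an assumption, since the specification only states that λ increases gradually from 0 to 1.

```python
# Hedged sketch of the combined per-sample objective: hard-label cross entropy
# weighted by lam (the hard target weight) plus cross entropy against the
# teacher's probability weighted by 1 - lam. The linear ramp is an assumption.
import torch
import torch.nn.functional as F

def combined_loss(student_logits,       # (2,) from the sentence submodel
                  hard_label: int,      # original label y
                  teacher_prob: float,  # teacher's P(similar), the soft target
                  step: int, total_steps: int) -> torch.Tensor:
    lam = min(step / total_steps, 1.0)  # hard target weight: 0 -> 1
    log_p = F.log_softmax(student_logits, dim=-1)
    hard = F.nll_loss(log_p.unsqueeze(0), torch.tensor([hard_label]))
    soft_target = torch.tensor([1.0 - teacher_prob, teacher_prob])
    soft = -(soft_target * log_p).sum()  # cross entropy vs. the teacher
    return lam * hard + (1.0 - lam) * soft
```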
On the basis of the above embodiments, in some embodiments of the present specification, there are a plurality of cross semantic determination submodels, each of the cross semantic determination submodels has a different structure, and the sentence semantic determination submodel performs model training by using a result output by each of the trained cross semantic determination submodels as a training target.
In a specific implementation process, fig. 3 is a schematic structural diagram of the semantic similarity model in some embodiments of this specification. As shown in fig. 3, there may be several cross semantic determination submodels in the semantic similarity model, and their structures may differ; that is, several teachers may be provided for the sentence semantic determination submodel. The Cross Sentence Interaction module in fig. 3 may contain 3 models: BERT, RoBERTa, and ALBERT. The RoBERTa model is an improved version of BERT (as its name suggests, a Robustly Optimized BERT approach): RoBERTa builds on BERT's language masking strategy and modifies key hyper-parameters of BERT, including deleting BERT's next-sentence pre-training objective, and trains with a larger batch size and learning rate. RoBERTa is also trained on an order of magnitude more data than BERT, for a longer time, which allows RoBERTa's representations to generalize to downstream tasks even better than BERT's. The ALBERT (A Lite BERT) model provides two methods to reduce memory while improving training speed, and additionally improves on BERT's NSP (next sentence prediction) pre-training task. Of course, the number and types of cross semantic determination submodels may be defined and set according to actual needs; the embodiments of the present specification are not specifically limited. As shown in fig. 3, the Siamese sentence semantic determination submodel may learn in turn from the three cross semantic determination submodels BERT, RoBERTa, and ALBERT in the Cross Sentence Interaction module; that is, the result output by each trained cross semantic determination submodel (for example, its similarity prediction value) may be used as a training target for model training of the Siamese sentence semantic determination submodel. This can be understood as model distillation: by learning from several cross semantic determination submodels, the capability of each is integrated into the sentence semantic determination submodel to improve the accuracy and efficiency of semantic determination.
For example: in some embodiments of the present specification, the sentence semantic determination submodel may be model-trained using a loss function of the following form:

$$\mathcal{L}(\theta, w) = \sum_{k=1}^{K} \sum_{(S_i, T_i, y_i) \in D_l} \Big[ \lambda \, \mathrm{CE}\big(y_i, p_i\big) + (1 - \lambda) \, \mathrm{CE}\big(\hat{y}_i^{(k)}, p_i\big) \Big]$$

where $\mathcal{L}$ represents the loss function; $\theta$ represents the learnable parameters in the sentence semantic determination submodel; $w$ represents the parameters of the fully connected layer in the sentence semantic determination submodel; $K$ represents the number of models in the cross semantic determination submodel and $k$ the $k$-th cross semantic determination submodel; $i$ represents the $i$-th sample datum; $D_l$ represents the sample data set, e.g. $D_l = \{(S_1, T_1, y_1), (S_2, T_2, y_2), \ldots, (S_N, T_N, y_N)\}$, where $S$, $T$ are sample sentences and $y_i$ represents the label corresponding to the sample data; CE is the abbreviation for cross entropy; $\lambda$ represents the hard target weight; $\hat{y}_i^{(k)} = \mathrm{softmax}(W_k h_i^{(k)})$ represents the probability value output by the $k$-th cross semantic determination submodel, where $h_i^{(k)}$ represents the vector of the sample data after encoding by the $k$-th cross semantic determination submodel and $W_k$ represents the model parameters of that cross semantic determination submodel; and $p_i = \mathrm{softmax}(w^\top [u_i; v_i; |u_i - v_i|])$ represents the probability value output when the sentence semantic determination submodel performs similarity calculation on the sample data, where $u_i$, $v_i$ are the sentence vectors into which the two sentences of the sample are respectively converted by the sentence semantic determination submodel.
The λ in the formula can initially be set to 0 and gradually increased from 0 to 1 as training of the sentence semantic determination submodel proceeds; that is, at the initial stage of training, the sentence semantic determination submodel learns only from the cross semantic determination submodel rather than from the hard training target, which provides abundant training samples and improves the model training speed. As λ gradually increases from 0 to 1, the weight of the training hard target is emphasized more and more, which improves the accuracy of the sentence semantic determination submodel.
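Read this way, the loss might be implemented as in the following sketch; tensor shapes and the teacher-probability format are assumptions for illustration, not the patent's confirmed implementation.

```python
# Hedged sketch of the multi-teacher loss above: for each of the K teachers,
# sum the hard-label term (weight lam) and the term against that teacher's
# probabilities (weight 1 - lam), as in the reconstructed formula.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits,  # (N, 2) from the Siamese head
                      hard_labels,     # (N,) original labels y_i
                      teacher_probs,   # (K, N) each teacher's P(similar)
                      lam: float) -> torch.Tensor:
    log_p = F.log_softmax(student_logits, dim=-1)  # (N, 2)
    hard = F.nll_loss(log_p, hard_labels)
    total = student_logits.new_zeros(())
    for k in range(teacher_probs.size(0)):
        q = torch.stack([1 - teacher_probs[k], teacher_probs[k]], dim=-1)
        soft = -(q * log_p).sum(dim=-1).mean()     # CE vs. teacher k
        total = total + lam * hard + (1 - lam) * soft
    return total
```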
The following specifically describes a semantic similarity determination scheme in an embodiment of this specification with reference to fig. 3:
as shown in fig. 3, the embodiment of the present specification actually provides an MV-SBERT (multi-view semantic similarity determination model BERT), which mainly includes two components, respectively corresponding to a Sentence semantic determination sub-model, namely, a parameter BERT module and a Cross semantic determination sub-model, namely, a Cross sequence Interaction module, wherein: (1) the Simese BERT module respectively encodes the two sentences into semantic representation, and then cosine similarity measurement is carried out; (2) the Cross sentence interaction module models word-level alignment between two sentences. Generally, the latter method has better performance than the former method.
As shown in fig. 3, in some examples of the present description, for two given sentences S = [S_1, ..., S_m] and T = [T_1, ..., T_n], the two sentences can be converted into two sequence vectors using Siamese BERT, namely Z_S = BERT(S) and Z_T = BERT(T). The averages u, v of all output sequence vectors can be computed, and u, v, and |u - v| can be concatenated into a fully connected layer that transforms the projection of hidden size into a probability distribution. In fig. 3, MEAN can be understood as a simple mean aggregation strategy, [CLS] can represent the classification feature, and [SEP] can represent the separator symbol used to split the two sentences in the input corpus. The Cross Sentence Interaction module, which can be understood as a number of additional, different pre-trained models added alongside the Siamese BERT, introduces a cross-word interaction matrix to enrich word-level interaction. Each cross semantic determination submodel first performs pre-training on the labeled data, then acts as a "teacher" to re-label the data and add it to the new training set. Specifically, as shown in fig. 3 (top), the sentence pair S and T is formed into the text sequence [[CLS], S, [SEP], T, [SEP]]. [CLS] can be considered an aggregate semantic representation of the input sentence pair, which is used during pre-training to predict whether the sentence pair is coherent; finally a softmax layer serves as the classifier.
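The Siamese head just described, with the concatenation of u, v, and |u - v| projected by a fully connected layer, might look like this sketch; the hidden size of 768 is an assumption matching BERT-base.

```python
# Hedged sketch of the Siamese head: concatenate u, v, |u - v| and project
# through one fully connected layer to a 2-class probability distribution.
import torch
import torch.nn as nn

hidden_size = 768                        # assumed BERT-base hidden size
head = nn.Linear(3 * hidden_size, 2)

u = torch.randn(1, hidden_size)          # mean-pooled vector of sentence S
v = torch.randn(1, hidden_size)          # mean-pooled vector of sentence T
features = torch.cat([u, v, torch.abs(u - v)], dim=-1)  # (1, 3 * hidden)
probs = head(features).softmax(dim=-1)   # [P(dissimilar), P(similar)]
```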
As shown in fig. 3, the solid arrows between the Cross Sentence Interaction module and the Siamese BERT module may indicate learning toward the training hard target (i.e., the label carried by the sample data itself), while the dashed arrows may indicate the model distillation process, i.e., the Siamese BERT learning from the cross semantic determination submodels in the Cross Sentence Interaction module. The Siamese BERT needs to learn not only toward the training hard target but also from the cross semantic determination submodels.
The embodiments of the specification provide the multi-view MV-SBERT, which processes sentence pairs from two different angles, namely the Siamese View (dual-tower view) and the Interaction View (interactive view). The Siamese View is used to generate the main view, while the Interaction View integrates cross-encoding BERT models as an auxiliary view, improving the expressive power of the sentence embeddings. The Siamese BERT ensures efficiency in retrieval speed, and training in conjunction with the Cross Sentence Interaction module improves the effect of the Siamese BERT without changing its speed. Experiments show that the method outperforms the latest sentence embedding methods in both supervised and unsupervised settings, and can determine the semantic similarity between texts accurately and quickly.
On the basis of the foregoing embodiments, some embodiments of the present specification may also provide a training and construction method for the semantic similarity model, which may be used to calculate or predict the semantic similarity between two texts. Fig. 4 is a schematic diagram of the construction process of the semantic similarity model in some embodiments of the present specification; as shown in fig. 4, the semantic similarity model in the embodiments of the present specification may be constructed as follows:
step 402, obtaining model parameters of a sentence semantic determination submodel and a cross semantic determination submodel in the semantic similarity model to be trained.
The model architecture and model parameters of the semantic similarity model can be designed in advance. The semantic similarity model in the embodiments of the description may comprise a sentence semantic determination submodel and a cross semantic determination submodel: the sentence semantic determination submodel can determine the similarity between two texts from the combination of their sentence encodings, and may adopt a Siamese BERT model; the cross semantic determination submodel may determine the similarity between two texts from their word-level interaction matrix, and there may be several cross semantic determination submodels, such as: BERT, RoBERTa, and ALBERT may be selected. The model parameters can be set or defined according to actual needs, such as: the number of fully connected layers, and the like; the examples in this specification are not specifically limited.
Step 404, obtaining a first sample data set, where the first sample data set includes a plurality of sample data with tags, and the sample data is text data.
After the model architecture and model parameters of the semantic similarity model are constructed, a plurality of labeled sample data, that is, a first sample data set, may be acquired. The sample data in the first sample data set and in the second sample data set in the embodiments of the present specification may be text data, such as sentence pairs; for specific forms, refer to the descriptions of the above embodiments.
And 406, pre-training the cross semantic determination submodel by using the sample data in the first sample data set and the corresponding label, and adjusting the model parameters of the cross semantic determination submodel.
The first sample data set may be used to pre-train the cross semantic determination submodel, and the model parameters of the cross semantic determination submodel are adjusted until the cross semantic determination submodel meets the requirements, and the specific pre-training process may refer to the records of the above embodiments, which are not described herein again.
Step 408, inputting the sample data in the second sample data set into the pre-trained cross semantic determination submodel and the sentence semantic determination submodel, taking the output result of the pre-trained cross semantic determination submodel as a training target for training the sentence semantic determination submodel, adjusting the model parameters of the sentence semantic determination submodel until the training requirements are met, and constructing the semantic similarity model.
After the cross semantic determination submodel is pre-trained, it can be used to train the sentence semantic determination submodel; that is, the output result of the cross semantic determination submodel can be used as the training target for training the sentence semantic determination submodel. For example: the pre-trained cross semantic determination submodel can be used to re-label the sample data in the first sample data set, and the computed labels from re-labeling are added to the first sample data set to obtain the second sample data set. The sample data in the second sample data set is input into the sentence semantic determination submodel, and model training of the sentence semantic determination submodel is performed with the computed labels determined by the cross semantic determination submodel as the training target. The sentence semantic determination submodel considers the sentence dimension when calculating semantic similarity, while the cross semantic determination submodel considers the word-interaction dimension; the two angles of text sentence concatenation and word-level crossing are considered simultaneously, and the sentence semantic determination submodel takes the cross semantic determination submodel as its teacher and learns its capability, so the trained semantic similarity model preserves the efficiency of semantic similarity determination while improving the accuracy of the calculation.
On the basis of the above embodiment, the method further includes:
taking the output result of the pre-trained cross semantic determination submodel as a training soft target for training the sentence semantic determination submodel;
taking the label of the sample data in the second sample data set as a training hard target for training the sentence semantic determination submodel;
and determining a training target of the sentence semantic determination submodel according to the training soft target and the training hard target.
When the sentence semantic determination submodel in the semantic similarity model is trained, the output result of the trained cross semantic determination submodel (the probability value of similarity it outputs) can be used as the training soft target, the label of the sample data in the second sample data set as the training hard target, and the two can be combined to determine the training target of the sentence semantic determination submodel. Combining the label of the sample data with the output of the trained cross semantic determination submodel to determine the training target lets the model learn both the properties carried by the sample data and the capability of the cross semantic determination submodel, improving the efficiency and accuracy of semantic similarity determination.
On the basis of the foregoing embodiments, in some embodiments of the present specification, the determining a training target of the sentence semantic determination submodel according to the training soft target and the training hard target includes:
determining a training target of the sentence semantic determination submodel according to the soft target weight corresponding to the training soft target, the hard target weight corresponding to the training hard target, the training soft target and the training hard target; when the sentence semantic determination submodel training is started, the hard target weight is smaller than the soft target weight, the numerical value of the soft target weight is decreased progressively along with the model training of the sentence semantic determination submodel, and the numerical value of the hard target weight is increased progressively.
In a specific implementation process, weights, namely a soft target weight and a hard target weight, can be set for the training soft target and the training hard target respectively. In the general case, the sum of the two weights is 1; both are variables that change continuously during the training of the sentence semantic determination submodel. When training starts, the hard target weight is smaller than the soft target weight; as training proceeds, the soft target weight continuously decreases and the hard target weight continuously increases, while their sum remains 1. For example: the hard target weight may be set to λ and the soft target weight to 1-λ, with λ gradually increasing as training proceeds. In some embodiments of the present specification, the hard target weight λ may be set to 0 at the beginning of training, which ensures richer training samples in the initial stage of training and improves training speed; λ may then increase gradually from 0 to 1 to improve the accuracy of the sentence semantic determination submodel. The training target of the sentence semantic determination submodel may be determined from the training hard target and its hard target weight together with the training soft target and its soft target weight; for details, refer to the descriptions of the above embodiments, which will not be repeated here.
It should be noted that, for the construction method of the semantic similarity model, reference may also be made to the above embodiments of the semantic similarity determination method, including other implementations, which are not described herein again.
In the present specification, each embodiment of the method is described in a progressive manner; the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on its differences from the other embodiments. For relevant points, reference may be made to the corresponding parts of the method embodiments.
Based on the semantic similarity determination method, one or more embodiments of the present specification further provide a system for semantic similarity determination. The system may include systems (including distributed systems), software (applications), modules, components, servers, clients, etc. that use the methods described in embodiments of the present specification in conjunction with any necessary hardware-implemented devices. Based on the same innovative conception, embodiments of the present specification provide an apparatus as described in the following embodiments. Since the implementation scheme of the apparatus for solving the problem is similar to that of the method, the specific apparatus implementation in the embodiment of the present specification may refer to the implementation of the foregoing method, and repeated details are not repeated. As used hereinafter, the term "unit" or "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.
Specifically, fig. 5 is a schematic block structure diagram of an embodiment of the semantic similarity determining apparatus provided in this specification, and as shown in fig. 5, the semantic similarity determining apparatus provided in this specification may include: a text obtaining module 51, a model predicting module 52, and a similarity determining module 53, wherein:
a text obtaining module 51, configured to obtain a text to be processed;
the model prediction module 52 may be configured to perform vector conversion on the texts to be processed respectively by using a sentence semantic determination submodel in the established semantic similarity model, and perform similarity calculation according to the converted vector by using the sentence semantic determination submodel to obtain an output result of the semantic similarity model; the semantic similarity model comprises a statement semantic determination submodel and a cross semantic determination submodel, wherein the statement semantic determination submodel is constructed by taking a result output by the pre-trained cross semantic determination submodel as a training target;
the similarity determining module 53 may be configured to determine semantic similarity between the texts to be processed according to an output result of the semantic similarity model.
In the semantic similarity determining apparatus provided in the embodiments of the present specification, the pre-established semantic similarity model processes the sentence pair from two different angles, considering not only the dimension of the text sentences but also the angle of the text word-level cross matrix. When semantic similarity calculation needs to be performed on texts to be processed, the sentence semantic determination submodel in the established semantic similarity model can be used directly to semantically encode the texts to be processed, convert them respectively into vector representations, and perform the similarity calculation on the converted vectors. On the basis of ensuring the calculation efficiency of the semantic similarity, the accuracy of the semantic similarity calculation is improved. When the semantic similarity model is trained and constructed, the sentence semantic determination submodel is trained and constructed on the basis of the cross semantic determination submodel; the cross semantic determination submodel can be understood as a teacher of the sentence semantic determination submodel, and the sentence semantic determination submodel has learned the capability of the cross semantic determination submodel. In practical use, the semantic similarity between the texts to be processed can be determined directly from the calculation result of the sentence semantic determination submodel in the semantic similarity model, so that the semantic similarity calculation not only retains the calculation efficiency of the sentence semantic determination submodel but also achieves the accuracy of the cross semantic determination submodel.
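As a sketch of the inference path just described (the encoder interface and the use of cosine similarity are assumptions of this sketch; the specification leaves the concrete similarity calculation open):

```python
import torch
import torch.nn.functional as F

def predict_similarity(sentence_encoder, text_a, text_b):
    """Vector-convert the two texts to be processed with the sentence
    semantic determination submodel, then compute similarity from the
    converted vectors. sentence_encoder is assumed to map one text to a
    fixed-size embedding of shape (dim,)."""
    with torch.no_grad():
        vec_a = sentence_encoder(text_a)
        vec_b = sentence_encoder(text_b)
    # Cosine similarity over the two converted vectors.
    return F.cosine_similarity(vec_a.unsqueeze(0), vec_b.unsqueeze(0)).item()
```

Because each text is encoded independently, embeddings of a candidate set can be precomputed and reused, which is where the efficiency advantage over the cross semantic determination submodel comes from.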
On the basis of the above embodiment, the apparatus further includes a model construction module configured to construct the semantic similarity model according to the following method:
obtaining a first sample dataset, wherein the first sample dataset comprises a plurality of sample data with labels;
pre-training a cross semantic determination sub-model in the semantic similarity model by using the sample data in the first sample data set and the corresponding labels;
and inputting the sample data in the second sample data set into the pre-trained cross semantic determination submodel and the sentence semantic determination submodel, taking an output result of the pre-trained cross semantic determination submodel as a training target for training the sentence semantic determination submodel, training the sentence semantic determination submodel until the training requirement is met, and constructing the semantic similarity model.
In the embodiments of the description, model training is first performed on the cross semantic determination submodel using the sample data set, and model training is then performed on the sentence semantic determination submodel based on the trained cross semantic determination submodel's prediction results on the sample data. Both the sentence-level semantic angle and the word-level cross angle of the texts are thereby considered: the sentence semantic determination submodel takes the cross semantic determination submodel as a teacher and learns its capability, which ensures the efficiency of semantic similarity determination while improving the accuracy of the semantic similarity calculation.
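The two-stage construction could be sketched as follows; train_one_epoch, teacher_predict, and train_step are hypothetical helper names standing in for ordinary supervised-training loops, and nothing in this sketch is mandated by the specification:

```python
def build_semantic_similarity_model(teacher, student,
                                    first_dataset, second_dataset,
                                    teacher_epochs, student_epochs):
    """Stage 1: pre-train the cross semantic determination submodel
    (teacher) on the labelled first sample data set. Stage 2: train the
    sentence semantic determination submodel (student) toward the frozen
    teacher's outputs on the second sample data set."""
    for _ in range(teacher_epochs):
        train_one_epoch(teacher, first_dataset)   # supervised by labels

    teacher.eval()                                # freeze the teacher
    for _ in range(student_epochs):
        for batch in second_dataset:
            soft_targets = teacher_predict(teacher, batch)
            train_step(student, batch, soft_targets)
    return student
```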
On the basis of the foregoing embodiments, in some embodiments of the present specification, the model building module is further configured to:
taking the output result of the pre-trained cross semantic determination submodel as a training soft target for training the sentence semantic determination submodel;
taking the label of the sample data in the second sample data set as a training hard target for training the sentence semantic determination submodel;
and determining a training target of the sentence semantic determination submodel according to the training soft target and the training hard target.
According to the embodiments of the specification, the labels of the sample data and the output results of the trained cross semantic determination submodel are integrated to determine the training target of the sentence semantic determination submodel, so that the submodel learns both the attributes carried by the sample data and the capability of the cross semantic determination submodel, improving the efficiency and the accuracy of semantic similarity determination.
On the basis of the foregoing embodiments, in some embodiments of the present specification, the model building module is specifically configured to:
determining a training target of the sentence semantic determination submodel according to the soft target weight corresponding to the training soft target, the hard target weight corresponding to the training hard target, the training soft target and the training hard target; when the sentence semantic determination submodel training is started, the hard target weight is smaller than the soft target weight, the numerical value of the soft target weight is decreased progressively along with the model training of the sentence semantic determination submodel, and the numerical value of the hard target weight is increased progressively.
In the embodiments of the description, weight values are set respectively for the training hard target, namely the original label of the sample data, and the training soft target, namely the prediction result of the cross semantic determination submodel on the sample data; the weights corresponding to the training soft target and the training hard target change gradually along with the training of the sentence semantic determination submodel. Moreover, the weight of the training soft target is greater than the weight of the training hard target when training of the sentence semantic determination submodel starts, so that rich training samples are available at the initial stage of training and the speed of model training is improved; the value of the hard target weight is then gradually increased as the sentence semantic determination submodel converges, which improves the accuracy of the sentence semantic determination submodel.
On the basis of the above embodiments, in some embodiments of the present specification, there are a plurality of cross semantic determination submodels, each of the cross semantic determination submodels has a different structure, and the sentence semantic determination submodel performs model training by using a result output by each of the trained cross semantic determination submodels as a training target.
In the embodiments of the description, by learning from a plurality of cross semantic determination submodels, the capabilities of each cross semantic determination submodel are integrated into the sentence semantic determination submodel, improving the accuracy and efficiency of semantic similarity determination.
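One plausible way to fuse several structurally different teachers is to average their output probabilities into a single training soft target; the averaging itself is an assumption of this sketch, since the specification only states that each trained cross semantic determination submodel's output serves as a training target:

```python
import torch

def ensemble_soft_target(teachers, batch):
    """Average the output probability distributions of several cross
    semantic determination submodels (each assumed to return logits)
    to form one training soft target for the student."""
    with torch.no_grad():
        probs = [torch.softmax(t(batch), dim=-1) for t in teachers]
    return torch.stack(probs).mean(dim=0)
```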
It should be noted that, corresponding to the training and construction method of the semantic similarity model, the embodiments of the present description may further include a training and construction apparatus for the semantic similarity model. The apparatus may also include other implementations according to the description of the corresponding method embodiments; for the specific implementation manner, reference may be made to the description of the corresponding method embodiments, which is not repeated herein.
An embodiment of the present specification further provides a semantic similarity determination processing device, including: at least one processor and a memory for storing processor-executable instructions, where the processor, when executing the instructions, implements the semantic similarity determination method of the above embodiments, such as:
acquiring a text to be processed;
respectively carrying out vector conversion on the text to be processed by utilizing a sentence semantic determination sub-model in the established semantic similarity model, and carrying out similarity calculation according to the converted vector by utilizing the sentence semantic determination sub-model to obtain an output result of the semantic similarity model; the semantic similarity model comprises a statement semantic determination submodel and a cross semantic determination submodel, wherein the statement semantic determination submodel is constructed by taking a result output by the pre-trained cross semantic determination submodel as a training target;
and determining the semantic similarity between the texts to be processed according to the output result of the semantic similarity model.
It should be noted that the above description of the processing device according to the method embodiment may also include other implementations. The specific implementation manner may refer to the description of the related method embodiment, and is not described in detail herein.
The semantic similarity determining device provided by the specification can also be applied to various data analysis and processing systems. The system or server or terminal or processing device may be a single server, or may include a server cluster, a system (including a distributed system), software (applications), an actual operating device, a logic gate device, a quantum computer, etc. using one or more of the methods or embodiments described herein, in combination with the necessary terminal devices implementing hardware. The system may comprise at least one processor and a memory storing computer-executable instructions that, when executed by the processor, implement the steps of the method of any one or more of the embodiments described above.
The method embodiments provided by the embodiments of the present specification can be executed in a mobile terminal, a computer terminal, a server, or a similar computing device. Taking operation on a server as an example, fig. 6 is a hardware structure block diagram of a semantic similarity determination server in an embodiment of the present specification; the computer terminal may be the semantic similarity determination server or the semantic similarity determination apparatus in the above embodiments. As shown in fig. 6, the server 10 may include one or more processors 100 (only one is shown; the processors 100 may include, but are not limited to, processing devices such as a microprocessor MCU or a programmable logic device FPGA), a non-volatile memory 200 for storing data, and a transmission module 300 for communication functions. It will be understood by those skilled in the art that the structure shown in fig. 6 is only an illustration and is not intended to limit the structure of the electronic device. For example, the server 10 may also include more or fewer components than shown in fig. 6, may include other processing hardware such as a database, a multi-level cache, or a GPU, or may have a configuration different from that shown in fig. 6.
The non-volatile memory 200 may be used to store software programs and modules of application software, such as the program instructions/modules corresponding to the semantic similarity determination method in the embodiments of the present specification; the processor 100 executes various functional applications and data processing by running the software programs and modules stored in the non-volatile memory 200. The non-volatile memory 200 may include high-speed random access memory, and may also include non-volatile memory such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the non-volatile memory 200 may further include memory located remotely from the processor 100, which may be connected to the computer terminal through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission module 300 is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal. In one example, the transmission module 300 includes a Network adapter (NIC) that can be connected to other Network devices through a base station so as to communicate with the internet. In one example, the transmission module 300 may be a Radio Frequency (RF) module, which is used for communicating with the internet in a wireless manner.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The method or apparatus described in the foregoing embodiments of this specification may implement service logic through a computer program recorded on a storage medium, which, when read and executed by a computer, achieves the effects of the solutions described in the embodiments of this specification, such as:
acquiring a text to be processed;
inputting the text to be processed into the established semantic similarity model to obtain an output result of the semantic similarity model; the semantic similarity model comprises a statement semantic determination submodel and a cross semantic determination submodel, wherein the statement semantic determination submodel is constructed by taking a result output by the pre-trained cross semantic determination submodel as a training target;
and determining the semantic similarity between the texts to be processed according to the output result of the semantic similarity model.
The storage medium may include a physical device for storing information; typically, the information is digitized and then stored in media that use electrical, magnetic, or optical means. The storage medium may include: devices that store information using electrical energy, such as various types of memory, e.g., RAM and ROM; devices that store information using magnetic energy, such as hard disks, floppy disks, magnetic tapes, magnetic core memories, magnetic bubble memories, and USB disks; and devices that store information optically, such as CDs or DVDs. Of course, there are other forms of readable storage media, such as quantum memory and graphene memory.
The semantic similarity determining method or apparatus provided in the embodiments of the present specification may be implemented by a processor executing corresponding program instructions in a computer, for example, implemented on a PC using the C++ language under a Windows operating system, implemented under a Linux system, implemented on an intelligent terminal using the Android or iOS system programming languages, implemented in processing logic based on a quantum computer, or the like.
It should be noted that descriptions of the apparatus, the computer storage medium, and the system described above according to the related method embodiments may also include other embodiments, and specific implementations may refer to descriptions of corresponding method embodiments, which are not described in detail herein.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the hardware + program class embodiment, since it is substantially similar to the method embodiment, the description is simple, and the relevant points can be referred to only the partial description of the method embodiment.
The embodiments of this specification are not limited to situations that must comply with industry communication standards, standard computer data processing and data storage rules, or the situations described in one or more embodiments of this specification. Certain industry standards, or implementations slightly modified on the basis of custom manners or the described embodiments, can also achieve the same, equivalent, similar, or otherwise foreseeable implementation effects as the above embodiments. Embodiments using such modified or transformed manners of data acquisition, storage, judgment, processing, and the like may still fall within the scope of the alternative implementations of this specification.
In the 1990s, an improvement in a technology could clearly be distinguished as an improvement in hardware (for example, an improvement in a circuit structure such as a diode, a transistor, or a switch) or an improvement in software (an improvement in a method flow). However, as technology advances, many of today's method-flow improvements can be regarded as direct improvements in hardware circuit structures. Designers almost always obtain a corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Thus, it cannot be said that an improvement in a method flow cannot be realized by a hardware physical module. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic functions are determined by the user's programming of the device. A designer "integrates" a digital system onto a single PLD by programming it, without asking a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually fabricating an integrated circuit chip, this programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development, and the original code to be compiled must be written in a specific programming language called a Hardware Description Language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language), of which VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used. It will also be apparent to those skilled in the art that a hardware circuit implementing a logical method flow can easily be obtained merely by slightly logically programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, or an embedded microcontroller; examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art will also appreciate that, in addition to implementing the controller as pure computer-readable program code, the same functionality can be implemented entirely by logically programming the method steps so that the controller takes the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may thus be regarded as a hardware component, and the means included therein for performing various functions may also be regarded as structures within the hardware component. Or even the means for performing the functions may be regarded both as software modules for performing the method and as structures within the hardware component.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a vehicle-mounted human-computer interaction device, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
Although one or more embodiments of the present description provide method operational steps as described in the embodiments or flowcharts, more or fewer operational steps may be included based on conventional or non-inventive means. The order of steps recited in the embodiments is merely one of many possible execution orders and does not represent the only order of execution. When an actual device or end product executes, the steps can be executed sequentially or in parallel according to the methods shown in the embodiments or figures (for example, in an environment of parallel processors or multi-threaded processing, or even in a distributed data processing environment). The terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, the presence of additional identical or equivalent elements in a process, method, article, or apparatus that comprises the recited elements is not excluded. Terms such as first and second are used to denote names and do not imply any particular order.
For convenience of description, the above devices are described as being divided into various modules by functions, and are described separately. Of course, when implementing one or more of the present description, the functions of each module may be implemented in one or more software and/or hardware, or a module implementing the same function may be implemented by a combination of multiple sub-modules or sub-units, etc. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, in the form of random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape/magnetic disk storage, graphene storage, or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
As will be appreciated by one skilled in the art, one or more embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, one or more embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present description may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
One or more embodiments of the present description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. One or more embodiments of the present specification can also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for system embodiments, since they are substantially similar to method embodiments, the description is relatively simple, and the relevant points can be referred to only part of the description of the method embodiments. In the description of the specification, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the specification. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
The above description is merely exemplary of one or more embodiments of the present disclosure and is not intended to limit the scope of one or more embodiments of the present disclosure. Various modifications and alterations to one or more embodiments described herein will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement or the like made within the spirit and principle of the present specification should be included in the scope of the claims.

Claims (17)

1. A semantic similarity determination method, the method comprising:
acquiring a text to be processed;
respectively carrying out vector conversion on the text to be processed by utilizing a sentence semantic determination sub-model in the established semantic similarity model, and carrying out similarity calculation according to the converted vector by utilizing the sentence semantic determination sub-model to obtain an output result of the semantic similarity model; the semantic similarity model comprises a statement semantic determination submodel and a cross semantic determination submodel, wherein the statement semantic determination submodel is constructed by taking a result output by the pre-trained cross semantic determination submodel as a training target;
and determining the semantic similarity between the texts to be processed according to the output result of the semantic similarity model.
2. The method of claim 1, wherein the semantic similarity model is configured to be constructed as follows:
obtaining a first sample dataset, wherein the first sample dataset comprises a plurality of sample data with labels;
pre-training a cross semantic determination sub-model in the semantic similarity model by using the sample data in the first sample data set and the corresponding labels;
and inputting the sample data in the second sample data set into the pre-trained cross semantic determination submodel and the sentence semantic determination submodel, taking an output result of the pre-trained cross semantic determination submodel as a training target for training the sentence semantic determination submodel, training the sentence semantic determination submodel until the training requirement is met, and constructing the semantic similarity model.
3. The method of claim 2, further comprising:
taking the output result of the pre-trained cross semantic determination submodel as a training soft target for training the sentence semantic determination submodel;
taking the label of the sample data in the second sample data set as a training hard target for training the sentence semantic determination submodel;
and determining a training target of the sentence semantic determination submodel according to the training soft target and the training hard target.
4. The method of claim 3, the determining a training target for the sentence semantic determination submodel from the training soft target and the training hard target, comprising:
determining a training target of the sentence semantic determination submodel according to the soft target weight corresponding to the training soft target, the hard target weight corresponding to the training hard target, the training soft target and the training hard target; when the sentence semantic determination submodel training is started, the hard target weight is smaller than the soft target weight, the numerical value of the soft target weight is decreased progressively along with the model training of the sentence semantic determination submodel, and the numerical value of the hard target weight is increased progressively.
5. The method of claim 4, further comprising:
and when the sentence semantic determination sub-model training is started, the hard target weight is 0.
6. The method of any of claims 3-5, further comprising:
determining a calculation label of sample data in the first sample data set by using the pre-trained cross semantic determination sub-model;
and adding the computation tag into the first sample data set, and taking the first sample data set added with the computation tag as the second sample data set.
7. The method of claim 1, wherein there are a plurality of cross semantic determination submodels, each cross semantic determination submodel has a different structure, and the sentence semantic determination submodel performs model training with the result output by each trained cross semantic determination submodel as a training target.
8. The method of claim 2, wherein the cross semantic determination submodel determines the similarity between two texts according to a word-level interaction matrix of the two texts.
9. A model training construction method for semantic similarity calculation, the method comprising:
obtaining model parameters of a sentence semantic determination submodel and a cross semantic determination submodel in a semantic similarity model to be trained;
acquiring a first sample data set, wherein the first sample data set comprises a plurality of sample data with labels, and the sample data is text data;
pre-training the cross semantic determination submodel by using the sample data in the first sample data set and the corresponding label, and adjusting the model parameters of the cross semantic determination submodel;
and inputting the sample data in the second sample data set into the pre-trained cross semantic determination submodel and the sentence semantic determination submodel, taking an output result of the pre-trained cross semantic determination submodel as a training target for training the sentence semantic determination submodel, adjusting the model parameters of the sentence semantic determination submodel until the training requirements are met, and constructing the semantic similarity model.
10. The method of claim 9, the method further comprising:
taking the output result of the pre-trained cross semantic determination submodel as a training soft target for training the sentence semantic determination submodel;
taking the label of the sample data in the second sample data set as a training hard target for training the sentence semantic determination submodel;
and determining a training target of the sentence semantic determination submodel according to the training soft target and the training hard target.
11. The method of claim 10, the determining a training target for the sentence semantic determination submodel from the training soft target and the training hard target, comprising:
determining a training target of the sentence semantic determination submodel according to the soft target weight corresponding to the training soft target, the hard target weight corresponding to the training hard target, the training soft target and the training hard target; when the sentence semantic determination submodel training is started, the hard target weight is smaller than the soft target weight, the numerical value of the soft target weight is decreased progressively along with the model training of the sentence semantic determination submodel, and the numerical value of the hard target weight is increased progressively.
12. A semantic similarity determination apparatus comprising:
the text acquisition module is used for acquiring a text to be processed;
the model prediction module is used for respectively carrying out vector conversion on the texts to be processed by utilizing a sentence semantic determination sub-model in the established semantic similarity model, and carrying out similarity calculation according to the converted vectors by utilizing the sentence semantic determination sub-model to obtain an output result of the semantic similarity model; the semantic similarity model comprises a statement semantic determination submodel and a cross semantic determination submodel, wherein the statement semantic determination submodel is constructed by taking a result output by the pre-trained cross semantic determination submodel as a training target;
and the similarity determining module is used for determining the semantic similarity between the texts to be processed according to the output result of the semantic similarity model.
13. The apparatus of claim 12, further comprising a model construction module to construct the semantic similarity model according to the following method:
obtaining a first sample dataset, wherein the first sample dataset comprises a plurality of sample data with labels;
pre-training a cross semantic determination sub-model in the semantic similarity model by using the sample data in the first sample data set and the corresponding labels;
and inputting the sample data in the second sample data set into the pre-trained cross semantic determination submodel and the sentence semantic determination submodel, taking an output result of the pre-trained cross semantic determination submodel as a training target for training the sentence semantic determination submodel, training the sentence semantic determination submodel until the training requirement is met, and constructing the semantic similarity model.
14. The apparatus of claim 13, the model building module further to:
taking the output result of the pre-trained cross semantic determination submodel as a training soft target for training the sentence semantic determination submodel;
taking the label of the sample data in the second sample data set as a training hard target for training the sentence semantic determination submodel;
and determining a training target of the sentence semantic determination submodel according to the training soft target and the training hard target.
15. The apparatus of claim 14, the model building module to be specifically configured to:
determining a training target of the sentence semantic determination submodel according to the soft target weight corresponding to the training soft target, the hard target weight corresponding to the training hard target, the training soft target and the training hard target; when the sentence semantic determination submodel training is started, the hard target weight is smaller than the soft target weight, the numerical value of the soft target weight is decreased progressively along with the model training of the sentence semantic determination submodel, and the numerical value of the hard target weight is increased progressively.
16. The apparatus according to claim 12, wherein there are a plurality of cross semantic determination submodels, each cross semantic determination submodel has a different structure, and the sentence semantic determination submodel performs model training with the result output by each trained cross semantic determination submodel as a training target.
17. A semantic similarity determination processing device comprising: at least one processor and a memory for storing processor-executable instructions, the processor implementing the method of any one of claims 1-8 when executing the instructions.
CN202010329730.1A 2020-04-24 2020-04-24 Semantic similarity determination method and device and processing equipment Pending CN111241851A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010329730.1A CN111241851A (en) 2020-04-24 2020-04-24 Semantic similarity determination method and device and processing equipment

Publications (1)

Publication Number Publication Date
CN111241851A true CN111241851A (en) 2020-06-05

Family

ID=70873807

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010329730.1A Pending CN111241851A (en) 2020-04-24 2020-04-24 Semantic similarity determination method and device and processing equipment

Country Status (1)

Country Link
CN (1) CN111241851A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109815484A (en) * 2018-12-21 2019-05-28 平安科技(深圳)有限公司 Based on the semantic similarity matching process and its coalignment for intersecting attention mechanism
CN110287494A (en) * 2019-07-01 2019-09-27 济南浪潮高新科技投资发展有限公司 A method of the short text Similarity matching based on deep learning BERT algorithm

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
AI科技大本营: "An overview of knowledge distillation with detailed explanations of the classic papers" (in Chinese), 《HTTPS://T.CJ.SINA.COM.CN/ARTICLES/VIEW/6080368657/16A6B101101900NZMP?FROM=TECH》 *
HINTON G et al.: "Distilling the Knowledge in a Neural Network", 《ARXIV PREPRINT》 *
LAW-YAO: "Knowledge distillation" (in Chinese), 《HTTPS://BLOG.CSDN.NET/NATURE553863/ARTICLE/DETAILS/80568658》 *

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111737428B (en) * 2020-06-11 2024-03-19 广联达科技股份有限公司 Target material matching method, device, equipment and readable storage medium
CN111737428A (en) * 2020-06-11 2020-10-02 广联达科技股份有限公司 Target material matching method, device, equipment and readable storage medium
CN111859960A (en) * 2020-07-27 2020-10-30 中国平安人寿保险股份有限公司 Semantic matching method and device based on knowledge distillation, computer equipment and medium
CN111859960B (en) * 2020-07-27 2023-08-01 中国平安人寿保险股份有限公司 Semantic matching method, device, computer equipment and medium based on knowledge distillation
CN112035341A (en) * 2020-08-11 2020-12-04 北京三快在线科技有限公司 Automatic testing method and device
CN112149179B (en) * 2020-09-18 2022-09-02 支付宝(杭州)信息技术有限公司 Risk identification method and device based on privacy protection
CN112149179A (en) * 2020-09-18 2020-12-29 支付宝(杭州)信息技术有限公司 Risk identification method and device based on privacy protection
CN112417884A (en) * 2020-11-05 2021-02-26 广州平云信息科技有限公司 Sentence semantic relevance judging method based on knowledge enhancement and knowledge migration
CN112541076A (en) * 2020-11-09 2021-03-23 北京百度网讯科技有限公司 Method and device for generating extended corpus of target field and electronic equipment
CN112541076B (en) * 2020-11-09 2024-03-29 北京百度网讯科技有限公司 Method and device for generating expanded corpus in target field and electronic equipment
CN112487823A (en) * 2020-11-18 2021-03-12 广东电力信息科技有限公司 Text semantic similarity calculation method based on BERT model
CN112417877A (en) * 2020-11-24 2021-02-26 广州平云信息科技有限公司 Text inclusion relation recognition method based on improved BERT
CN113761145A (en) * 2020-12-11 2021-12-07 北京沃东天骏信息技术有限公司 Language model training method, language processing method and electronic equipment
CN112614024A (en) * 2020-12-30 2021-04-06 成都数之联科技有限公司 Case fact based intelligent law strip recommendation method, system, device and medium
CN112614024B (en) * 2020-12-30 2024-03-08 成都数之联科技股份有限公司 Legal intelligent recommendation method, system, device and medium based on case facts
CN113204633A (en) * 2021-06-01 2021-08-03 吉林大学 Semantic matching distillation method and device
CN113204633B (en) * 2021-06-01 2022-12-30 吉林大学 Semantic matching distillation method and device
CN113361260A (en) * 2021-06-10 2021-09-07 北京字节跳动网络技术有限公司 Text processing method, device, equipment and storage medium
CN115689503A (en) * 2022-08-15 2023-02-03 江苏北辰知识产权事务所有限公司 Multi-end project cooperation system and information co-construction method thereof
CN115329883A (en) * 2022-08-22 2022-11-11 桂林电子科技大学 Semantic similarity processing method, device and system and storage medium
CN116401335A (en) * 2023-03-15 2023-07-07 北京擎盾信息科技有限公司 Quantitative retrieval method and device for legal documents, storage medium and electronic device

Similar Documents

Publication Publication Date Title
CN111241851A (en) Semantic similarity determination method and device and processing equipment
CN110309283B (en) Answer determination method and device for intelligent question answering
US11113479B2 (en) Utilizing a gated self-attention memory network model for predicting a candidate answer match to a query
CN113762322B (en) Video classification method, device and equipment based on multi-modal representation and storage medium
CN111738016B (en) Multi-intention recognition method and related equipment
CN113837370B (en) Method and apparatus for training a model based on contrast learning
CN111400601B (en) Video recommendation method and related equipment
CN113392651B (en) Method, device, equipment and medium for training word weight model and extracting core words
CN110678882A (en) Selecting answer spans from electronic documents using machine learning
CN112989212B (en) Media content recommendation method, device and equipment and computer storage medium
CN114329029B (en) Object retrieval method, device, equipment and computer storage medium
CN112307048B (en) Semantic matching model training method, matching method, device, equipment and storage medium
CN111767394A (en) Abstract extraction method and device based on artificial intelligence expert system
US20200302331A1 (en) Intelligent problem solving using visual input
CN117609444A (en) Searching question-answering method based on large model
CN116227467A (en) Model training method, text processing method and device
CN117634459B (en) Target content generation and model training method, device, system, equipment and medium
CN111563378A (en) Multi-document reading understanding realization method for combined learning
CN118228694A (en) Method and system for realizing industrial industry number intelligence based on artificial intelligence
CN117236340A (en) Question answering method, device, equipment and medium
CN116644180A (en) Training method and training system for text matching model and text label determining method
CN113704466B (en) Text multi-label classification method and device based on iterative network and electronic equipment
Lamons et al. Python Deep Learning Projects: 9 projects demystifying neural network and deep learning models for building intelligent systems
CN116127316A (en) Model training method, text abstract generating method and related equipment
CN114254080A (en) Text matching method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200605)