CN114662668A - Neural network training method, semantic similarity calculation method and semantic retrieval system - Google Patents

Neural network training method, semantic similarity calculation method and semantic retrieval system

Info

Publication number
CN114662668A
Authority
CN
China
Prior art keywords
neural network
sentences
semantic
word
training method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210311749.2A
Other languages
Chinese (zh)
Inventor
曾祥云
朱姬渊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Tianchen Health Technology Co ltd
Original Assignee
Shanghai Yikangyuan Medical Health Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Yikangyuan Medical Health Technology Co ltd
Priority to CN202210311749.2A
Publication of CN114662668A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/211Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The invention discloses a neural network training method, a semantic similarity calculation method and a semantic retrieval system. The neural network training method comprises the following steps: S1, labeling sentence similarity; S2, inputting the two labeled sentences into a neural network; S3, processing the features of the two sentences and calculating a loss function; and S4, training the neural network according to the loss value of the loss function. The neural network trained according to the technical scheme of the invention has few parameters, retrieves quickly and calculates semantic similarity accurately, thereby capturing the semantic information of the text, improving retrieval accuracy, and suiting high-concurrency, low-latency scenarios.

Description

Neural network training method, semantic similarity calculation method and semantic retrieval system
Technical Field
The invention relates to the technical field of information retrieval, in particular to a neural network training method, a semantic similarity calculation method and a semantic retrieval system.
Background
Natural Language Processing (NLP) is an important research direction in computer science and artificial intelligence. It mainly studies theories and methods for realizing effective communication between people and computers in natural language, and is a discipline that integrates linguistics, computer science and mathematics.
In natural language processing, many scenarios require semantic similarity matching between different texts. Semantic similarity calculation is therefore one of the technical directions for meeting such requirements, and is a basic technology for applications such as text duplication checking and intelligent question answering. Semantic similarity means that, for two given texts, their similarity is measured from a semantic point of view, usually as a semantic similarity score between 0 and 1, with higher scores representing greater similarity.
In the prior art, static word vectors such as those produced by Word2Vec are used to calculate semantic similarity. The main technical defect of static word vectors is that they cannot account for polysemy, word segmentation errors and similar conditions, which greatly affects the accuracy of the calculation result.
In addition, since the release of the BERT model, pretrained language models represented by BERT have achieved remarkable results in many NLP tasks, so text semantic similarity based on the BERT model has also been explored to good effect.
Disclosure of Invention
To solve the technical problems in the prior art, the invention provides a neural network training method, which comprises the following steps:
S1, labeling the similarity of two sentences;
S2, inputting the two labeled sentences into a neural network to obtain the features of each word in the sentences;
S3, averaging the features of the words to obtain the features S1 and S2 of the two sentences, and calculating a loss function;
and S4, training the neural network according to the loss value of the loss function.
Further, in step S1, the sentence pair is labeled in the format: sentenceA sentenceB Score.
Further, the calculating of the loss function in step S3 includes:
calculating the cosine similarity of S1 and S2, namely sim = cos(S1, S2);
dividing the 1-5 point labels by 5 to normalize them to between 0 and 1, obtaining the normalized label;
the loss function is calculated from sim and label.
Further, the loss function is formulated as:
Loss=|sim-label|
where sim = cos(S1, S2) is the cosine similarity of the two sentence features.
Further, the neural network structure comprises a linear network unit, an embedding unit, a feature extraction unit and a compression unit, wherein:
the linear network unit is used for copying an input variable into three parts, which serve as input for obtaining the query, key and value of a sentence respectively;
the embedding unit is used for encoding the input word, its absolute position and the number of the paragraph it belongs to, and then processing these encodings to obtain a word vector;
the feature extraction unit is used for raising the dimension of the word vectors, extracting features through the transformer module and outputting the features of the word vectors;
the compression unit is used for compressing the features of the word vectors.
Further, the transformer module is composed of a plurality of transformer pairs connected in series, each pair consisting of two transformer layers, with the parameters of every pair completely shared.
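For illustration only, the following Python sketch shows what such cross-pair parameter sharing can look like; PyTorch, nn.TransformerEncoderLayer as a stand-in for the patent's transformer layers, and the dimension, head and pair counts are all assumptions, since the patent names no framework or sizes for this module.

```python
# A sketch of the shared transformer pairs described above, under stated
# assumptions: nn.TransformerEncoderLayer stands in for the transformer
# layers, and dim/heads/pairs are illustrative values.
import torch.nn as nn

class SharedTransformerStack(nn.Module):
    def __init__(self, dim: int = 128, heads: int = 4, pairs: int = 6):
        super().__init__()
        self.pairs = pairs
        # One two-layer group; its parameters are reused by every pair in series.
        self.layer_a = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.layer_b = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)

    def forward(self, x):
        for _ in range(self.pairs):  # pairs connected in series, weights fully shared
            x = self.layer_b(self.layer_a(x))
        return x
```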
The invention also provides a semantic similarity calculation method, which comprises the following steps:
receiving an input sentence;
the neural network searches for key sentences and related content matching the input sentence, extracts features from each, and calculates semantic similarity;
and returning the first N sentences with the highest semantic similarity scores.
The invention also provides a semantic retrieval system, which comprises an acquisition module, a processing module and an output module, wherein:
the acquisition module is used for receiving input sentences;
the processing module is used for processing the input sentences;
the output module is used for returning the processing result of the processing module.
The present invention also provides a computer-readable storage medium, in which instructions or a program are stored, and the instructions or the program are loaded and executed by a processor to implement the semantic similarity calculation method.
The present invention also provides an electronic device comprising a processor, a storage medium and a bus, wherein the storage medium stores machine-readable instructions executable by the processor; when the electronic device runs, the processor and the storage medium communicate through the bus, and the processor executes the machine-readable instructions to perform the semantic similarity calculation method.
In practical applications, the modules in the method and system disclosed by the present invention may be deployed on one target server, or each module may be deployed independently on a different target server; in particular, where stronger computing capability is needed, the modules may also be deployed on a cluster of target servers.
Therefore, the neural network obtained with the training method of the technical scheme has few parameters, retrieves quickly and calculates semantic similarity accurately, so that the semantic information of the text can be captured, retrieval accuracy is improved, and the method suits high-concurrency, low-latency scenarios.
In order that the invention may be more clearly and fully understood, specific embodiments thereof are described in detail below with reference to the accompanying drawings.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
Fig. 1 is a schematic structural diagram of a neural network according to an embodiment of the present application.
Detailed Description
The application discloses a neural network training method aiming at the technical defects in the prior art, the method comprising the following steps:
S1, labeling the similarity of two sentences;
S2, inputting the two labeled sentences into a neural network to obtain the features of each word in the sentences;
S3, averaging the features of the words to obtain the features S1 and S2 of the two sentences, and calculating a loss function;
and S4, training the neural network according to the loss value of the loss function.
The technical solution of the present application will be further described below with reference to various preferred embodiments.
S1, labeling the similarity of the two sentences;
the purpose of this step is to transform the input sentence into a sentence format that can be processed for training the neural network. The format of two sentences is marked as follows: sentenceA sentenceB Score, namely a sentence and a numerical value of similarity, two sentences and similarity labels are input, the labels are represented by 1, 2,3,4 and 5 points according to the similarity, and the higher the Score is, the more similar the sentences are, such as the following groups of sentences, and the similarity label value is marked according to the meaning similarity:
the weather is very good today, and much better today is No. 5
Today's weather is very good and today's weather is clear 4
Very good weather today and bad weather today 1
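A minimal parsing sketch of such labeled data follows; the tab-separated file layout and the helper name are assumptions, since the patent only fixes the "sentenceA sentenceB Score" order and the 1-5 label range.

```python
# Minimal sketch (assumption): parse a tab-separated label file of the form
# sentenceA<TAB>sentenceB<TAB>score, where score is an integer from 1 to 5.
from typing import List, Tuple

def load_similarity_pairs(path: str) -> List[Tuple[str, str, float]]:
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("\t")
            if len(parts) != 3:
                continue  # skip malformed rows
            sentence_a, sentence_b, score = parts
            # Normalize the 1-5 score to (0, 1], as step S3 describes.
            pairs.append((sentence_a, sentence_b, int(score) / 5.0))
    return pairs
```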
S2, inputting the two labeled sentences into a neural network to obtain the features of each word in the sentences;
After the sentences are labeled, the labeled data set can be input into the neural network for training.
Referring to fig. 1, as a preferred embodiment of neural network training, the neural network structure constructed in the present application is an improvement based on ALBERT, and includes a linear network unit, an embedding unit, a feature extraction unit, and a compression unit, where:
the linear network unit is used for copying a group of input variables into three parts as input, and obtaining query, key and value of the obtained sentence through a linear network; the linear network may be a linear layer or a plurality of linear layers.
In the embodiment of the present application, after extracting query, key, and value, a more preferred implementation is proposed, that is, an attention adding mechanism, including the following steps:
firstly, multiplying the query matrix by the key matrix;
dividing by the number of groups for normalization;
removing the padded boundary positions;
obtaining the attention coefficient of each word in the sentence through a softmax activation function;
applying dropout to the attention coefficients of each word, i.e., dropping part of the features, which reduces the computation required for training;
then multiplying by the value matrix;
and finally restoring the multiple groups into one group, so that the relationship between the words is better captured.
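The following PyTorch sketch puts these steps together. It is a non-authoritative reading of the description: standard multi-head attention is assumed, with scaling by the square root of the per-group dimension where the text says "dividing by the number of groups", and the hidden size and head count are illustrative.

```python
# A sketch of the attention steps above, under stated assumptions.
import math
import torch
import torch.nn as nn

class SketchAttention(nn.Module):
    def __init__(self, hidden: int = 128, heads: int = 4, p_drop: float = 0.1):
        super().__init__()
        self.heads, self.d_k = heads, hidden // heads
        # Three copies of the input go through linear layers to give
        # query, key and value (the "linear network unit").
        self.q_proj = nn.Linear(hidden, hidden)
        self.k_proj = nn.Linear(hidden, hidden)
        self.v_proj = nn.Linear(hidden, hidden)
        self.drop = nn.Dropout(p_drop)  # dropout on the attention coefficients

    def forward(self, x: torch.Tensor, pad_mask: torch.Tensor) -> torch.Tensor:
        b, t, h = x.shape
        def split(t_):  # (b, t, h) -> (b, heads, t, d_k)
            return t_.view(b, t, self.heads, self.d_k).transpose(1, 2)
        q, k, v = split(self.q_proj(x)), split(self.k_proj(x)), split(self.v_proj(x))
        # Multiply query by key, then normalize (sqrt(d_k) is the assumed reading).
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_k)
        # Remove the padded boundary positions.
        scores = scores.masked_fill(pad_mask[:, None, None, :] == 0, float("-inf"))
        attn = self.drop(torch.softmax(scores, dim=-1))  # per-word attention coefficients
        out = attn @ v                                    # multiply by the value matrix
        # Restore the multiple groups into one group.
        return out.transpose(1, 2).reshape(b, t, h)
```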
The embedding unit is used for encoding the input word, its absolute position and the number of the paragraph it belongs to, and then processing these encodings to obtain a word vector. This embodiment uses absolute position encoding: the input word (word_id), its absolute position code and the sentence or paragraph it belongs to are vectorized in sequence, with 128 dimensions adopted in this embodiment. This information is superposed, normalized and subjected to dropout processing, after which an embedding matrix and a mask vector with boundaries removed are obtained.
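A minimal sketch of such an embedding unit follows; the vocabulary size, maximum length and segment count are illustrative assumptions, while the 128 dimensions, superposition, normalization and dropout follow the description above.

```python
# A sketch of the embedding unit: word, absolute position and segment
# encodings are summed, normalized and passed through dropout.
import torch
import torch.nn as nn

class SketchEmbedding(nn.Module):
    def __init__(self, vocab: int = 21128, max_len: int = 512,
                 segments: int = 2, dim: int = 128, p_drop: float = 0.1):
        super().__init__()
        self.word = nn.Embedding(vocab, dim)    # word_id
        self.pos = nn.Embedding(max_len, dim)   # absolute position
        self.seg = nn.Embedding(segments, dim)  # sentence/paragraph number
        self.norm = nn.LayerNorm(dim)
        self.drop = nn.Dropout(p_drop)

    def forward(self, word_ids: torch.Tensor, seg_ids: torch.Tensor) -> torch.Tensor:
        positions = torch.arange(word_ids.size(1), device=word_ids.device)
        # Superpose the three encodings, then normalize and apply dropout.
        x = self.word(word_ids) + self.pos(positions)[None, :, :] + self.seg(seg_ids)
        return self.drop(self.norm(x))
```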
The feature extraction unit is configured to raise the dimension of the word vectors, extract features through the transformer module, and output the features of the word vectors.
The compression unit is used for compressing the features of the word vectors; this embodiment uses an average pooling scheme, taking the mean of the word feature vectors as the feature of the whole sentence.
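A sketch of this average-pooling compression follows; the padding mask is an assumption, used so that boundary padding does not dilute the mean.

```python
# A sketch of average pooling: the mean of the word feature vectors,
# ignoring padded positions, is taken as the whole-sentence feature.
import torch

def mean_pool(features: torch.Tensor, pad_mask: torch.Tensor) -> torch.Tensor:
    # features: (batch, seq_len, dim); pad_mask: (batch, seq_len), 1 for real words
    mask = pad_mask.unsqueeze(-1).float()
    summed = (features * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1.0)  # avoid division by zero
    return summed / counts
```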
S3, averaging the features of the words to obtain the features S1 and S2 of the two sentences, and calculating a loss function;
The compression unit obtains the features S1 and S2 of the two sentences by compressing the word features, taking the mean of the word feature vectors as the feature of the whole sentence; these features can then be used to calculate the loss function. As a preferred embodiment, the loss function is calculated as follows:
The features S1 and S2 of the two sentences are compared by cosine similarity: sim = cos(S1, S2).
The similarity labels of 1-5 points are then divided by 5 to normalize them to between 0 and 1, giving the normalized label;
The loss function is calculated from sim and label; in this embodiment the loss function is:
Loss=|sim-label|
where label is the manually labeled (and normalized) similarity of the two sentences sentenceA and sentenceB.
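Put together, a minimal sketch of this loss computation, assuming the sentence features S1 and S2 come from the compression unit above:

```python
# A sketch of the loss: cosine similarity of the two sentence features
# against the normalized 1-5 label, penalized with an absolute difference.
import torch
import torch.nn.functional as F

def similarity_loss(s1: torch.Tensor, s2: torch.Tensor,
                    raw_label: torch.Tensor) -> torch.Tensor:
    sim = F.cosine_similarity(s1, s2, dim=-1)  # sim = cos(S1, S2)
    label = raw_label / 5.0                    # normalize 1-5 scores to (0, 1]
    return (sim - label).abs().mean()          # Loss = |sim - label|
```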
And S4, training the neural network according to the loss value of the loss function; the specific training or optimization may be carried out with any of various existing algorithms.
Based on the training method of the neural network, the obtained neural network can be used for prediction (retrieval). As an embodiment, a semantic similarity calculation method provided by the present application includes:
receiving an input sentence;
the neural network searches for key sentences and related content matching the input sentence, extracts features from each, and calculates semantic similarity;
and returning the first N sentences with the highest semantic similarity scores.
The neural network used here is the one obtained by the training method above.
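For illustration, a sketch of the retrieval step follows; `encode` (the trained network's sentence-feature extraction) and the pre-encoded candidate matrix are assumptions, as the patent does not specify how candidates are stored.

```python
# A sketch of retrieval: encode the query, score it against pre-encoded
# candidate features by cosine similarity, and return the top-N sentences.
from typing import Callable, List, Tuple
import torch
import torch.nn.functional as F

def top_n(query: str, candidates: List[str], candidate_feats: torch.Tensor,
          encode: Callable[[str], torch.Tensor], n: int = 5) -> List[Tuple[str, float]]:
    q = encode(query)                                 # (dim,) sentence feature
    sims = F.cosine_similarity(q.unsqueeze(0), candidate_feats, dim=-1)
    scores, idx = sims.topk(min(n, len(candidates)))  # highest scores first
    return [(candidates[i], float(s)) for i, s in zip(idx.tolist(), scores.tolist())]
```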
Based on the semantic similarity calculation method, the application also provides a semantic retrieval system, which comprises an acquisition module, a processing module and an output module, wherein:
the acquisition module is used for receiving input sentences;
the processing module is used for processing the input sentences;
the output module is used for returning the processing result of the processing module.
The embodiment of the present application further provides a computer-readable storage medium, where instructions or a program are stored in the storage medium, and the instructions or the program are loaded by a processor and execute any one of the semantic similarity calculation methods described above.
An embodiment of the present application further provides an electronic device, including: a processor, a storage medium and a bus, wherein the storage medium stores machine-readable instructions executable by the processor, when the electronic device runs, the processor and the storage medium communicate with each other through the bus, and the processor executes the machine-readable instructions to execute the semantic similarity calculation method according to any one of the above methods.
It should be noted that, all or part of the steps in the methods of the above embodiments may be implemented by hardware related to instructions of a computer program, which may be stored in a computer-readable storage medium, which may include, but is not limited to: a Read Only Memory (ROM), a Random Access Memory (RAM), a magnetic or optical disk, and the like.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A neural network training method is characterized by comprising the following steps:
S1, labeling the similarity of two sentences;
S2, inputting the two labeled sentences into a neural network to obtain the features of each word in the sentences;
S3, averaging the features of the words to obtain the features S1 and S2 of the two sentences, and calculating a loss function;
and S4, training the neural network according to the loss value of the loss function.
2. The neural network training method of claim 1, wherein in step S1, the sentence pair is labeled in the format: sentenceA sentenceB Score.
3. The neural network training method of claim 1, wherein the calculating of the loss function in step S3 includes:
calculating the cosine similarity of S1 and S2, namely sim = cos(S1, S2);
dividing the 1-5 point labels by 5 to normalize them to between 0 and 1, obtaining the normalized label;
the loss function is calculated from sim and label.
4. The neural network training method of claim 3, wherein the loss function is formulated as:
Loss=|sim-label|
where sim = cos(S1, S2) is the cosine similarity.
5. The neural network training method of claim 1, wherein the neural network structure comprises a linear network unit, an embedding unit, a feature extraction unit, and a compression unit, wherein:
the linear network unit is used for copying an input variable into three parts, which serve as input for obtaining the query, key and value of a sentence respectively;
the embedding unit is used for encoding the input word, its absolute position and the number of the paragraph it belongs to, and then processing these encodings to obtain a word vector;
the feature extraction unit is used for raising the dimension of the word vectors, extracting features and outputting the features of the word vectors;
the compression unit is used for compressing the features of the word vectors.
6. The neural network training method of claim 5, wherein the transformer module is composed of a plurality of transformer pairs connected in series, each pair consisting of two transformer layers, with the parameters of every pair completely shared.
7. A semantic similarity calculation method is characterized by comprising the following steps:
receiving an input sentence;
the neural network searches for key sentences and related content matching the input sentence, extracts features from each, and calculates semantic similarity;
and returning the first N sentences with the highest semantic similarity scores.
8. A semantic retrieval system is characterized by comprising an acquisition module, a processing module and an output module, wherein:
the acquisition module is used for receiving input sentences;
the processing module is used for processing the input sentences;
the output module is used for returning the processing result of the processing module.
9. A computer-readable storage medium, in which instructions or a program are stored, which are loaded and executed by a processor to implement the semantic similarity calculation method according to claim 7.
10. An electronic device, comprising: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating via the bus when the electronic device is operating, the processor executing the machine-readable instructions to perform the semantic similarity calculation method according to claim 7.
CN202210311749.2A 2022-03-28 2022-03-28 Neural network training method, semantic similarity calculation method and semantic retrieval system Pending CN114662668A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210311749.2A CN114662668A (en) 2022-03-28 2022-03-28 Neural network training method, semantic similarity calculation method and semantic retrieval system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210311749.2A CN114662668A (en) 2022-03-28 2022-03-28 Neural network training method, semantic similarity calculation method and semantic retrieval system

Publications (1)

Publication Number Publication Date
CN114662668A 2022-06-24

Family

ID=82032797

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210311749.2A Pending CN114662668A (en) 2022-03-28 2022-03-28 Neural network training method, semantic similarity calculation method and semantic retrieval system

Country Status (1)

Country Link
CN (1) CN114662668A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114840645A (en) * 2022-07-04 2022-08-02 北京邮电大学 Text semantic retrieval method and device for scientific and technological resource information of expert and scholars


Similar Documents

Publication Publication Date Title
CN110083831B (en) Chinese named entity identification method based on BERT-BiGRU-CRF
CN110298037B (en) Convolutional neural network matching text recognition method based on enhanced attention mechanism
CN110377903B (en) Sentence-level entity and relation combined extraction method
CN110427461B (en) Intelligent question and answer information processing method, electronic equipment and computer readable storage medium
CN111460807A (en) Sequence labeling method and device, computer equipment and storage medium
CN110825857B (en) Multi-round question and answer identification method and device, computer equipment and storage medium
CN111966812B (en) Automatic question answering method based on dynamic word vector and storage medium
CN111985239A (en) Entity identification method and device, electronic equipment and storage medium
CN112052684A (en) Named entity identification method, device, equipment and storage medium for power metering
CN113722512A (en) Text retrieval method, device and equipment based on language model and storage medium
CN113849661A (en) Entity embedded data extraction method and device, electronic equipment and storage medium
CN114662668A (en) Neural network training method, semantic similarity calculation method and semantic retrieval system
CN112182151B (en) Reading understanding task identification method and device based on multiple languages
CN116226357B (en) Document retrieval method under input containing error information
CN112613293A (en) Abstract generation method and device, electronic equipment and storage medium
CN115730590A (en) Intention recognition method and related equipment
CN115600597A (en) Named entity identification method, device and system based on attention mechanism and intra-word semantic fusion and storage medium
CN112507388B (en) Word2vec model training method, device and system based on privacy protection
CN114742045A (en) Semantic text similarity calculation method and device and storage medium
CN114722774A (en) Data compression method and device, electronic equipment and storage medium
CN115270900A (en) User intention identification method and device, electronic equipment and storage medium
CN114398903A (en) Intention recognition method and device, electronic equipment and storage medium
CN111581332A (en) Similar judicial case matching method and system based on triple deep hash learning
CN110909547A (en) Judicial entity identification method based on improved deep learning
CN116975298B (en) NLP-based modernized society governance scheduling system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20221216

Address after: Room 2703, No. 277, Xingang East Road, Haizhu District, Guangzhou, Guangdong 510220

Applicant after: Guangzhou Tianchen Health Technology Co.,Ltd.

Address before: Building 10, No. 860, Xinyang Road, Lingang New District, China (Shanghai) pilot Free Trade Zone, Pudong New Area, Shanghai, 200120

Applicant before: Shanghai Yikangyuan Medical Health Technology Co.,Ltd.