CN114662668A - Neural network training method, semantic similarity calculation method and semantic retrieval system - Google Patents

Neural network training method, semantic similarity calculation method and semantic retrieval system

Info

Publication number
CN114662668A
Authority
CN
China
Prior art keywords
neural network
sentences
semantic
word
training method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210311749.2A
Other languages
Chinese (zh)
Inventor
曾祥云
朱姬渊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Tianchen Health Technology Co ltd
Original Assignee
Shanghai Yikangyuan Medical Health Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Yikangyuan Medical Health Technology Co ltd
Priority to CN202210311749.2A
Publication of CN114662668A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/211Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The invention discloses a neural network training method, a semantic similarity calculation method and a semantic retrieval system. The neural network training method comprises the following steps: S1, labeling sentence similarity; S2, inputting the two labeled sentences into a neural network; S3, processing the features of the two sentences and calculating a loss function; and S4, training the neural network according to the loss value of the loss function. The neural network trained according to the technical scheme of the invention has few parameters, retrieves quickly and calculates semantic similarity accurately, thereby capturing the semantic information of the text, improving retrieval accuracy, and suiting high-concurrency, low-latency scenarios.

Description

Neural network training method, semantic similarity calculation method and semantic retrieval system
Technical Field
The invention relates to the technical field of information retrieval, in particular to a neural network training method, a semantic similarity calculation method and a semantic retrieval system.
Background
Natural Language Processing (NLP) is an important research direction in computer science and artificial intelligence. It mainly studies theories and methods for realizing effective communication between people and computers in natural language, and is a discipline that integrates linguistics, computer science and mathematics.
In natural language processing, many scenarios require semantic similarity matching between different texts. Semantic similarity calculation is therefore one of the technical directions for meeting such requirements, and is a basic technology for applications such as text duplication checking and intelligent question answering. Semantic similarity means that, for two given texts, their similarity is measured from a semantic point of view, usually as a semantic similarity score between 0 and 1, with higher scores representing greater similarity.
In the prior art, static word vectors such as those produced by Word2Vec are used to calculate semantic similarity. The main technical defect of static word vectors is that they cannot account for polysemy, word segmentation errors and similar conditions, which greatly affects the accuracy of the calculation result.
In addition, since the release of the BERT model, pretrained language models represented by BERT have achieved remarkable results in many NLP tasks, so text semantic similarity based on the BERT model has also been explored to good effect.
Disclosure of Invention
To solve the technical problems in the prior art, the invention provides a neural network training method, which comprises the following steps:
S1, labeling the similarity of two sentences;
S2, inputting the two labeled sentences into a neural network to obtain the features of each word in the sentences;
S3, averaging the features of the words to obtain the features S1 and S2 of the two sentences, and calculating a loss function;
and S4, training the neural network according to the loss value of the loss function.
Further, in step S1, the sentence pair is labeled in the format: sentenceA sentenceB Score.
Further, the calculating of the loss function in step S3 includes:
calculating the cosine similarity of S1 and S2, namely sim = cos(S1, S2);
dividing the 1-5 point labels by 5 to normalize them to between 0 and 1, obtaining the normalized label;
the loss function is calculated from sim and label.
Further, the loss function is formulated as:
Loss=|sim-label|
where sim = cos(S1, S2) is the cosine similarity of the two sentence features.
Further, the neural network structure comprises a linear network unit, an embedding unit, a feature extraction unit and a compression unit, wherein:
the linear network unit is used for copying an input variable into three parts, which serve as input for obtaining the query, key and value of a sentence respectively;
the embedding unit is used for encoding the input word, its absolute position and the number of the paragraph it belongs to, and then processing these encodings to obtain a word vector;
the feature extraction unit is used for raising the dimension of the word vectors, extracting features through the transformer module and outputting the features of the word vectors;
the compression unit is used for compressing the features of the word vectors.
Further, the transformer module is composed of a plurality of transformer pairs connected in series, each pair consisting of two transformer layers, with the parameters of every pair completely shared.
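For illustration only, the following Python sketch shows what such cross-pair parameter sharing can look like; PyTorch, nn.TransformerEncoderLayer as a stand-in for the patent's transformer layers, and the dimension, head and pair counts are all assumptions, since the patent names no framework or sizes for this module.

```python
# A sketch of the shared transformer pairs described above, under stated
# assumptions: nn.TransformerEncoderLayer stands in for the transformer
# layers, and dim/heads/pairs are illustrative values.
import torch.nn as nn

class SharedTransformerStack(nn.Module):
    def __init__(self, dim: int = 128, heads: int = 4, pairs: int = 6):
        super().__init__()
        self.pairs = pairs
        # One two-layer group; its parameters are reused by every pair in series.
        self.layer_a = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.layer_b = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)

    def forward(self, x):
        for _ in range(self.pairs):  # pairs connected in series, weights fully shared
            x = self.layer_b(self.layer_a(x))
        return x
```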
The invention also provides a semantic similarity calculation method, which comprises the following steps:
receiving an input sentence;
the neural network searches for key sentences and related content matching the input sentence, extracts features from each, and calculates semantic similarity;
and returning the first N sentences with the highest semantic similarity scores.
The invention also provides a semantic retrieval system, which comprises an acquisition module, a processing module and an output module, wherein:
the acquisition module is used for receiving input sentences;
the processing module is used for processing the input sentences;
the output module is used for returning the processing result of the processing module.
The present invention also provides a computer-readable storage medium, in which instructions or a program are stored, and the instructions or the program are loaded and executed by a processor to implement the semantic similarity calculation method.
The present invention also provides an electronic device comprising a processor, a storage medium and a bus, wherein the storage medium stores machine-readable instructions executable by the processor; when the electronic device runs, the processor and the storage medium communicate through the bus, and the processor executes the machine-readable instructions to perform the semantic similarity calculation method.
In practical applications, the modules in the method and system disclosed by the present invention may be deployed on one target server, or each module may be deployed independently on a different target server; in particular, where stronger computing capability is needed, the modules may also be deployed on a cluster of target servers.
Therefore, the neural network obtained with the training method of the technical scheme has few parameters, retrieves quickly and calculates semantic similarity accurately, so that the semantic information of the text can be captured, retrieval accuracy is improved, and the method suits high-concurrency, low-latency scenarios.
In order that the invention may be more clearly and fully understood, specific embodiments thereof are described in detail below with reference to the accompanying drawings.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
Fig. 1 is a schematic structural diagram of a neural network according to an embodiment of the present application.
Detailed Description
The application discloses a neural network training method aiming at the technical defects in the prior art, the method comprising the following steps:
S1, labeling the similarity of two sentences;
S2, inputting the two labeled sentences into a neural network to obtain the features of each word in the sentences;
S3, averaging the features of the words to obtain the features S1 and S2 of the two sentences, and calculating a loss function;
and S4, training the neural network according to the loss value of the loss function.
The technical solution of the present application will be further described below with reference to various preferred embodiments.
S1, labeling the similarity of the two sentences;
the purpose of this step is to transform the input sentence into a sentence format that can be processed for training the neural network. The format of two sentences is marked as follows: sentenceA sentenceB Score, namely a sentence and a numerical value of similarity, two sentences and similarity labels are input, the labels are represented by 1, 2,3,4 and 5 points according to the similarity, and the higher the Score is, the more similar the sentences are, such as the following groups of sentences, and the similarity label value is marked according to the meaning similarity:
the weather is very good today, and much better today is No. 5
Today's weather is very good and today's weather is clear 4
Very good weather today and bad weather today 1
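A minimal parsing sketch of such labeled data follows; the tab-separated file layout and the helper name are assumptions, since the patent only fixes the "sentenceA sentenceB Score" order and the 1-5 label range.

```python
# Minimal sketch (assumption): parse a tab-separated label file of the form
# sentenceA<TAB>sentenceB<TAB>score, where score is an integer from 1 to 5.
from typing import List, Tuple

def load_similarity_pairs(path: str) -> List[Tuple[str, str, float]]:
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("\t")
            if len(parts) != 3:
                continue  # skip malformed rows
            sentence_a, sentence_b, score = parts
            # Normalize the 1-5 score to (0, 1], as step S3 describes.
            pairs.append((sentence_a, sentence_b, int(score) / 5.0))
    return pairs
```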
S2, inputting the two labeled sentences into a neural network to obtain the features of each word in the sentences;
After the sentences are labeled, the labeled data set can be input into the neural network for training.
Referring to fig. 1, as a preferred embodiment of neural network training, the neural network structure constructed in the present application is an improvement based on ALBERT, and includes a linear network unit, an embedding unit, a feature extraction unit, and a compression unit, where:
the linear network unit is used for copying a group of input variables into three parts as input, and obtaining query, key and value of the obtained sentence through a linear network; the linear network may be a linear layer or a plurality of linear layers.
In the embodiment of the present application, after extracting query, key, and value, a more preferred implementation is proposed, that is, an attention adding mechanism, including the following steps:
firstly, multiplying the query matrix by the key matrix;
dividing by the number of groups for normalization;
removing the padded boundary positions;
obtaining the attention coefficient of each word in the sentence through a softmax activation function;
applying dropout to the attention coefficients of each word, i.e., dropping part of the features, which reduces the computation required for training;
then multiplying by the value matrix;
and finally restoring the multiple groups into one group, so that the relationship between the words is better captured.
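The following PyTorch sketch puts these steps together. It is a non-authoritative reading of the description: standard multi-head attention is assumed, with scaling by the square root of the per-group dimension where the text says "dividing by the number of groups", and the hidden size and head count are illustrative.

```python
# A sketch of the attention steps above, under stated assumptions.
import math
import torch
import torch.nn as nn

class SketchAttention(nn.Module):
    def __init__(self, hidden: int = 128, heads: int = 4, p_drop: float = 0.1):
        super().__init__()
        self.heads, self.d_k = heads, hidden // heads
        # Three copies of the input go through linear layers to give
        # query, key and value (the "linear network unit").
        self.q_proj = nn.Linear(hidden, hidden)
        self.k_proj = nn.Linear(hidden, hidden)
        self.v_proj = nn.Linear(hidden, hidden)
        self.drop = nn.Dropout(p_drop)  # dropout on the attention coefficients

    def forward(self, x: torch.Tensor, pad_mask: torch.Tensor) -> torch.Tensor:
        b, t, h = x.shape
        def split(t_):  # (b, t, h) -> (b, heads, t, d_k)
            return t_.view(b, t, self.heads, self.d_k).transpose(1, 2)
        q, k, v = split(self.q_proj(x)), split(self.k_proj(x)), split(self.v_proj(x))
        # Multiply query by key, then normalize (sqrt(d_k) is the assumed reading).
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_k)
        # Remove the padded boundary positions.
        scores = scores.masked_fill(pad_mask[:, None, None, :] == 0, float("-inf"))
        attn = self.drop(torch.softmax(scores, dim=-1))  # per-word attention coefficients
        out = attn @ v                                    # multiply by the value matrix
        # Restore the multiple groups into one group.
        return out.transpose(1, 2).reshape(b, t, h)
```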
The embedding unit is used for encoding the input word, its absolute position and the number of the paragraph it belongs to, and then processing these encodings to obtain a word vector. This embodiment uses absolute position encoding: the input word (word_id), its absolute position code and the sentence or paragraph it belongs to are vectorized in sequence, with 128 dimensions adopted in this embodiment. This information is superposed, normalized and subjected to dropout processing, after which an embedding matrix and a mask vector with boundaries removed are obtained.
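A minimal sketch of such an embedding unit follows; the vocabulary size, maximum length and segment count are illustrative assumptions, while the 128 dimensions, superposition, normalization and dropout follow the description above.

```python
# A sketch of the embedding unit: word, absolute position and segment
# encodings are summed, normalized and passed through dropout.
import torch
import torch.nn as nn

class SketchEmbedding(nn.Module):
    def __init__(self, vocab: int = 21128, max_len: int = 512,
                 segments: int = 2, dim: int = 128, p_drop: float = 0.1):
        super().__init__()
        self.word = nn.Embedding(vocab, dim)    # word_id
        self.pos = nn.Embedding(max_len, dim)   # absolute position
        self.seg = nn.Embedding(segments, dim)  # sentence/paragraph number
        self.norm = nn.LayerNorm(dim)
        self.drop = nn.Dropout(p_drop)

    def forward(self, word_ids: torch.Tensor, seg_ids: torch.Tensor) -> torch.Tensor:
        positions = torch.arange(word_ids.size(1), device=word_ids.device)
        # Superpose the three encodings, then normalize and apply dropout.
        x = self.word(word_ids) + self.pos(positions)[None, :, :] + self.seg(seg_ids)
        return self.drop(self.norm(x))
```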
The feature extraction unit is configured to raise the dimension of the word vectors, extract features through the transformer module, and output the features of the word vectors.
The compression unit is used for compressing the features of the word vectors; this embodiment uses an average pooling scheme, taking the mean of the word feature vectors as the feature of the whole sentence.
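A sketch of this average-pooling compression follows; the padding mask is an assumption, used so that boundary padding does not dilute the mean.

```python
# A sketch of average pooling: the mean of the word feature vectors,
# ignoring padded positions, is taken as the whole-sentence feature.
import torch

def mean_pool(features: torch.Tensor, pad_mask: torch.Tensor) -> torch.Tensor:
    # features: (batch, seq_len, dim); pad_mask: (batch, seq_len), 1 for real words
    mask = pad_mask.unsqueeze(-1).float()
    summed = (features * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1.0)  # avoid division by zero
    return summed / counts
```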
S3, averaging the features of the words to obtain the features S1 and S2 of the two sentences, and calculating a loss function;
The compression unit obtains the features S1 and S2 of the two sentences by compressing the word features, taking the mean of the word feature vectors as the feature of the whole sentence; these features can then be used to calculate the loss function. As a preferred embodiment, the loss function is calculated as follows:
The features S1 and S2 of the two sentences are compared by cosine similarity: sim = cos(S1, S2).
The similarity labels of 1-5 points are then divided by 5 to normalize them to between 0 and 1, giving the normalized label;
The loss function is calculated from sim and label; in this embodiment the loss function is:
Loss=|sim-label|
where label is the manually labeled (and normalized) similarity of the two sentences sentenceA and sentenceB.
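Put together, a minimal sketch of this loss computation, assuming the sentence features S1 and S2 come from the compression unit above:

```python
# A sketch of the loss: cosine similarity of the two sentence features
# against the normalized 1-5 label, penalized with an absolute difference.
import torch
import torch.nn.functional as F

def similarity_loss(s1: torch.Tensor, s2: torch.Tensor,
                    raw_label: torch.Tensor) -> torch.Tensor:
    sim = F.cosine_similarity(s1, s2, dim=-1)  # sim = cos(S1, S2)
    label = raw_label / 5.0                    # normalize 1-5 scores to (0, 1]
    return (sim - label).abs().mean()          # Loss = |sim - label|
```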
And S4, training the neural network according to the loss value of the loss function; the specific training or optimization may be carried out with any of various existing algorithms.
Based on the training method of the neural network, the obtained neural network can be used for prediction (retrieval). As an embodiment, a semantic similarity calculation method provided by the present application includes:
receiving an input sentence;
the neural network searches for key sentences and related content matching the input sentence, extracts features from each, and calculates semantic similarity;
and returning the first N sentences with the highest semantic similarity scores.
The neural network used here is the one obtained by the training method above.
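For illustration, a sketch of the retrieval step follows; `encode` (the trained network's sentence-feature extraction) and the pre-encoded candidate matrix are assumptions, as the patent does not specify how candidates are stored.

```python
# A sketch of retrieval: encode the query, score it against pre-encoded
# candidate features by cosine similarity, and return the top-N sentences.
from typing import Callable, List, Tuple
import torch
import torch.nn.functional as F

def top_n(query: str, candidates: List[str], candidate_feats: torch.Tensor,
          encode: Callable[[str], torch.Tensor], n: int = 5) -> List[Tuple[str, float]]:
    q = encode(query)                                 # (dim,) sentence feature
    sims = F.cosine_similarity(q.unsqueeze(0), candidate_feats, dim=-1)
    scores, idx = sims.topk(min(n, len(candidates)))  # highest scores first
    return [(candidates[i], float(s)) for i, s in zip(idx.tolist(), scores.tolist())]
```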
Based on the semantic similarity calculation method, the application also provides a semantic retrieval system, which comprises an acquisition module, a processing module and an output module, wherein:
the acquisition module is used for receiving input sentences;
the processing module is used for processing the input sentences;
the output module is used for returning the processing result of the processing module.
The embodiment of the present application further provides a computer-readable storage medium, where instructions or a program are stored in the storage medium, and the instructions or the program are loaded by a processor and execute any one of the semantic similarity calculation methods described above.
An embodiment of the present application further provides an electronic device, including: a processor, a storage medium and a bus, wherein the storage medium stores machine-readable instructions executable by the processor, when the electronic device runs, the processor and the storage medium communicate with each other through the bus, and the processor executes the machine-readable instructions to execute the semantic similarity calculation method according to any one of the above methods.
It should be noted that, all or part of the steps in the methods of the above embodiments may be implemented by hardware related to instructions of a computer program, which may be stored in a computer-readable storage medium, which may include, but is not limited to: a Read Only Memory (ROM), a Random Access Memory (RAM), a magnetic or optical disk, and the like.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A neural network training method is characterized by comprising the following steps:
S1, labeling the similarity of two sentences;
S2, inputting the two labeled sentences into a neural network to obtain the features of each word in the sentences;
S3, averaging the features of the words to obtain the features S1 and S2 of the two sentences, and calculating a loss function;
and S4, training the neural network according to the loss value of the loss function.
2. The neural network training method of claim 1, wherein in step S1, the sentence pair is labeled in the format: sentenceA sentenceB Score.
3. The neural network training method of claim 1, wherein the calculating of the loss function in step S3 includes:
calculating the cosine similarity of S1 and S2, namely sim = cos(S1, S2);
dividing the 1-5 point labels by 5 to normalize them to between 0 and 1, obtaining the normalized label;
the loss function is calculated from sim and label.
4. The neural network training method of claim 3, wherein the loss function is formulated as:
Loss=|sim-label|
where sim = cos(S1, S2) is the cosine similarity.
5. The neural network training method of claim 1, wherein the neural network structure comprises a linear network unit, an embedding unit, a feature extraction unit, and a compression unit, wherein:
the linear network unit is used for copying an input variable into three parts, which serve as input for obtaining the query, key and value of a sentence respectively;
the embedding unit is used for encoding the input word, its absolute position and the number of the paragraph it belongs to, and then processing these encodings to obtain a word vector;
the feature extraction unit is used for raising the dimension of the word vectors, extracting features and outputting the features of the word vectors;
the compression unit is used for compressing the features of the word vectors.
6. The neural network training method of claim 5, wherein the transformer module is composed of a plurality of transformer pairs connected in series, each pair consisting of two transformer layers, with the parameters of every pair completely shared.
7. A semantic similarity calculation method is characterized by comprising the following steps:
receiving an input sentence;
the neural network searches for key sentences and related content matching the input sentence, extracts features from each, and calculates semantic similarity;
and returning the first N sentences with the highest semantic similarity scores.
8. A semantic retrieval system is characterized by comprising an acquisition module, a processing module and an output module, wherein:
the acquisition module is used for receiving input sentences;
the processing module is used for processing the input sentences;
the output module is used for returning the processing result of the processing module.
9. A computer-readable storage medium, in which instructions or a program are stored, which are loaded and executed by a processor to implement the semantic similarity calculation method according to claim 7.
10. An electronic device, comprising: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating via the bus when the electronic device is operating, the processor executing the machine-readable instructions to perform the semantic similarity calculation method according to claim 7.
CN202210311749.2A 2022-03-28 2022-03-28 Neural network training method, semantic similarity calculation method and semantic retrieval system Pending CN114662668A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210311749.2A CN114662668A (en) 2022-03-28 2022-03-28 Neural network training method, semantic similarity calculation method and semantic retrieval system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210311749.2A CN114662668A (en) 2022-03-28 2022-03-28 Neural network training method, semantic similarity calculation method and semantic retrieval system

Publications (1)

Publication Number Publication Date
CN114662668A 2022-06-24

Family

ID=82032797

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210311749.2A Pending CN114662668A (en) 2022-03-28 2022-03-28 Neural network training method, semantic similarity calculation method and semantic retrieval system

Country Status (1)

Country Link
CN (1) CN114662668A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114840645A (en) * 2022-07-04 2022-08-02 北京邮电大学 Text semantic retrieval method and device for scientific and technological resource information of expert and scholars


Similar Documents

Publication Publication Date Title
CN110083831B (en) Chinese named entity identification method based on BERT-BiGRU-CRF
CN110298037B (en) Convolutional neural network matching text recognition method based on enhanced attention mechanism
CN110377903B (en) Sentence-level entity and relation combined extraction method
CN110427461B (en) Intelligent question and answer information processing method, electronic equipment and computer readable storage medium
CN111460807A (en) Sequence labeling method and device, computer equipment and storage medium
CN110825857B (en) Multi-round question and answer identification method and device, computer equipment and storage medium
CN111966812B (en) Automatic question answering method based on dynamic word vector and storage medium
CN111985239A (en) Entity identification method and device, electronic equipment and storage medium
CN112052684A (en) Named entity identification method, device, equipment and storage medium for power metering
CN113722512A (en) Text retrieval method, device and equipment based on language model and storage medium
CN113849661A (en) Entity embedded data extraction method and device, electronic equipment and storage medium
CN114662668A (en) Neural network training method, semantic similarity calculation method and semantic retrieval system
CN112182151B (en) Reading understanding task identification method and device based on multiple languages
CN116226357B (en) Document retrieval method under input containing error information
CN112613293A (en) Abstract generation method and device, electronic equipment and storage medium
CN115730590A (en) Intention recognition method and related equipment
CN115600597A (en) Named entity identification method, device and system based on attention mechanism and intra-word semantic fusion and storage medium
CN112507388B (en) Word2vec model training method, device and system based on privacy protection
CN114742045A (en) Semantic text similarity calculation method and device and storage medium
CN114722774A (en) Data compression method and device, electronic equipment and storage medium
CN115270900A (en) User intention identification method and device, electronic equipment and storage medium
CN114398903A (en) Intention recognition method and device, electronic equipment and storage medium
CN111581332A (en) Similar judicial case matching method and system based on triple deep hash learning
CN110909547A (en) Judicial entity identification method based on improved deep learning
CN116975298B (en) NLP-based modernized society governance scheduling system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20221216

Address after: Room 2703, No. 277, Xingang East Road, Haizhu District, Guangzhou, Guangdong 510220

Applicant after: Guangzhou Tianchen Health Technology Co.,Ltd.

Address before: Building 10, No. 860, Xinyang Road, Lingang New District, China (Shanghai) pilot Free Trade Zone, Pudong New Area, Shanghai, 200120

Applicant before: Shanghai Yikangyuan Medical Health Technology Co.,Ltd.