WO2021164199A1 - Method and device for intelligent semantic matching of Chinese sentences based on a multi-granularity fusion model - Google Patents

Method and device for intelligent semantic matching of Chinese sentences based on a multi-granularity fusion model

Info

Publication number
WO2021164199A1
WO2021164199A1 (PCT/CN2020/104723)
Authority
WO
WIPO (PCT)
Prior art keywords
sentence
word
character
matching
vector
Prior art date
Application number
PCT/CN2020/104723
Other languages
English (en)
Chinese (zh)
Inventor
鹿文鹏
王荣耀
张旭
贾瑞祥
郭韦钰
张维玉
Original Assignee
齐鲁工业大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 齐鲁工业大学 filed Critical 齐鲁工业大学
Publication of WO2021164199A1 publication Critical patent/WO2021164199A1/fr

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/3331 - Query processing
    • G06F16/334 - Query execution
    • G06F16/3344 - Query execution using natural language analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 - Clustering; Classification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/048 - Activation functions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Definitions

  • the invention relates to the field of artificial intelligence and natural language processing, in particular to a method and device for intelligently matching Chinese sentence semantics based on a multi-granularity fusion model.
  • Sentence semantic matching plays a key role in many natural language processing tasks, such as question answering (QA), natural language inference (NLI), machine translation (MT), and so on.
  • the key to sentence semantic matching is to calculate the degree of matching between the semantics of a given sentence pair.
  • Sentences can be segmented at different granularities, such as characters, words, and phrases.
  • the commonly used text segmentation granularity is words, especially in the Chinese field.
  • the patent document with publication number CN106569999A discloses a multi-granularity short-text semantic similarity comparison method comprising the following steps: S1, short-text preprocessing, which includes Chinese word segmentation and part-of-speech tagging; S2, feature selection on the preprocessed short texts; S3, distance measurement on the vector set obtained after feature selection to determine the similarity of the short texts.
  • the technical task of the present invention is to provide a method and device for intelligent semantic matching of Chinese sentences based on a multi-granularity fusion model, so as to solve the problems of incomplete semantic analysis and imprecise sentence matching in single-granularity models.
  • the intelligent matching method for Chinese sentence semantics based on a multi-granularity fusion model is specifically as follows:
  • S303 Construct a multi-granularity embedding layer: perform vector mapping on words and characters in the sentence to obtain word-level sentence vectors and character-level sentence vectors;
  • S304 Construct a multi-granularity fusion coding layer: perform coding processing on word-level sentence vectors and character-level sentence vectors to obtain sentence semantic feature vectors;
  • S305 Construct an interactive matching layer: perform hierarchical comparison of sentence semantic feature vectors to obtain matching representation vectors of sentence pairs;
  • the construction of the text matching knowledge base in step S1 is specifically as follows:
  • Preprocess the original data: preprocess the similar texts in the original similar-sentence knowledge base, performing word segmentation and hyphenation on each sentence to obtain the text matching knowledge base; the word segmentation processing takes each Chinese word as the basic unit and segments each piece of data into words, while the hyphenation processing takes each Chinese character as the basic unit and splits each piece of data into characters; characters and words are separated by spaces, and all content of each piece of data, including numbers, punctuation and special characters, is retained;
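  • as an illustration of the segmentation and hyphenation just described, the following is a minimal Python sketch; the patent does not name a segmentation tool, so the use of the jieba library here is an assumption:

```python
# Minimal preprocessing sketch (illustrative only; jieba is an assumed
# choice for Chinese word segmentation, not named by the patent).
import jieba

def preprocess(sentence):
    # Word-level granularity: words separated by spaces.
    word_level = " ".join(jieba.cut(sentence))
    # Character-level granularity: characters separated by spaces; numbers,
    # punctuation and special characters are all retained.
    char_level = " ".join(list(sentence))
    return word_level, char_level

word_q, char_q = preprocess("还款能否申请延期一天？")
```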
  • the training data set for constructing the text matching model in the step S2 is specifically as follows:
  • Q1-char represents sentence 1 at character level granularity
  • Q1-word represents sentence 1 at word level granularity
  • Q2-char represents sentence 2 at character level granularity
  • Q2-word represents sentence 2 at word level granularity
  • 1 indicates that the two texts, sentence 1 and sentence 2, match, i.e. the pair is a positive example
  • Q1-char represents sentence 1 at character level granularity
  • Q1-word represents sentence 1 at word level granularity
  • Q2-char represents sentence 2 at character level granularity
  • Q2-word represents sentence 2 at word level granularity
  • 0 indicates that the two texts, sentence Q1 and sentence Q2, do not match, i.e. the pair is a negative example
  • step S203 Construct a training data set: combine all the positive samples and negative samples obtained in steps S201 and S202 and shuffle their order to construct the final training data set; every piece of data, whether positive or negative, contains five dimensions: Q1-char, Q1-word, Q2-char, Q2-word, and 0 or 1.
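  • a minimal sketch of this assembly step, assuming the positive and negative pairs have already been built as four-tuples to which the label is appended (names are illustrative):

```python
import random

def build_training_set(positive_pairs, negative_pairs):
    # Each element of *_pairs is (Q1-char, Q1-word, Q2-char, Q2-word);
    # the label (1 = match, 0 = mismatch) is appended here.
    samples = [pair + (1,) for pair in positive_pairs]
    samples += [pair + (0,) for pair in negative_pairs]
    random.shuffle(samples)  # disrupt the order, as described in step S203
    return samples
```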
  • the construction of the character-word mapping conversion table in step S301 is specifically as follows:
  • the character-word table is built from the text matching knowledge base obtained after preprocessing;
  • each character and word in the table is mapped to a unique numeric identifier;
  • the mapping rule is: starting from the number 1, identifiers are assigned in ascending order following the order in which each character or word is entered into the character-word table, thereby forming the character-word mapping conversion table;
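  • a minimal sketch of this mapping rule (identifiers assigned from 1 in order of first entry); the function and variable names are illustrative:

```python
def build_mapping_table(segmented_sentences):
    # segmented_sentences: sentences from the text matching knowledge base,
    # already split into space-separated characters or words.
    mapping = {}
    next_id = 1  # numbering starts with 1, as described above
    for sentence in segmented_sentences:
        for token in sentence.split():
            if token not in mapping:
                mapping[token] = next_id
                next_id += 1
    return mapping
```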
  • the construction of the input layer in the step S302 is specifically as follows:
  • the input layer includes four inputs.
  • the two sentences to be matched are preprocessed to obtain Q1-char, Q1-word, Q2-char and Q2-word respectively, formalized as (Q1-char, Q1-word, Q2-char, Q2-word);
  • the construction of the multi-granularity embedding layer in step S303 is specifically as follows:
  • the word-level and character-level sentence vectors Q1-word Emd, Q1-char Emd, Q2-word Emd and Q2-char Emd are obtained after processing by the multi-granularity embedding layer; each sentence in the text matching knowledge base is transformed from text into vector form through the character-word vector mapping;
  • the construction of the multi-granularity fusion coding layer in step S304 takes the word-level and character-level sentence vectors output by the multi-granularity embedding layer in step S303 as input and obtains text semantic features from two perspectives, namely character-level semantic feature extraction and word-level semantic feature extraction; the text semantic features from the two perspectives are then integrated by bitwise (element-wise) addition to obtain the final sentence semantic feature vector; for sentence Q1, the final sentence semantic feature vector is obtained as follows:
  • $i$ denotes the relative position of the corresponding character vector in the sentence;
  • $Q_i$ is the vector representation of each character in sentence Q1;
  • $Q'_i$ is the vector representation of each character after the initial LSTM encoding;
  • $Q''_i$ is the vector representation of each character after the second LSTM encoding;
  • $i'$ denotes the relative position of the corresponding word in the sentence vector;
  • $Q_{i'}$ is the vector representation of each word in sentence Q1;
  • $Q'_{i'}$ is the vector representation of each word after the initial LSTM encoding;
  • $Q''_{i'}$ is the vector representation of each word after the second LSTM encoding;
  • step S30403 After steps S30401 and S30402, the corresponding character-level feature vectors $Q''_i$ and word-level feature vectors $Q''_{i'}$ are obtained; they are added bitwise to obtain the final sentence semantic feature vector of text Q1;
  • the formula is as follows: $Q^{final}_i = Q''_i + Q''_{i'}$
  • step S30404 For sentence Q2, the final sentence semantic feature vector is obtained in the same way
  • the construction of the interactive matching layer in step S305 is specifically as follows:
  • step S30501 After the processing of step S304, the sentence semantic feature vectors of Q1 and Q2 are obtained; the two vectors are compared by three operations, subtraction, cross product and dot product, to obtain the comparison features;
  • the dot product, also called the scalar product: the result is the length of the projection of one vector in the direction of the other, which is a scalar;
  • the cross product, also called the vector product: the result is a vector perpendicular to the two existing vectors;
  • $i$ denotes the relative position of the corresponding semantic feature in the sentence;
  • $Q1_i$ is the vector representation of each semantic feature of text Q1 obtained by the feature extraction of step S304;
  • $Q2_i$ is the vector representation of each semantic feature of text Q2 obtained by the feature extraction of step S304;
  • the sentence semantic feature vectors of Q1 and Q2 are further processed with a Dense layer to extract feature vectors; the coding dimension is 300;
  • step S30504 The result of a further fully connected (Dense) encoding layer is summed with the comparison results of step S30501 to obtain the matching representation vector of the sentence pair;
  • the construction of the prediction layer in step S306 is specifically as follows:
  • the prediction layer receives the matching representation vector output in step S305 and applies the Sigmoid function to obtain a matching degree $y_{pred}$ in [0, 1];
  • the training of the multi-granularity fusion model in step S4 is specifically as follows:
  • $y_{true}$ denotes the true label, i.e. the 0/1 flag indicating match or mismatch in each training example;
  • $y_{pred}$ denotes the prediction result;
  • the balanced cross entropy automatically balances the positive and negative samples and improves classification accuracy; it fuses the cross entropy with the mean square error as a balance factor;
  • a Chinese sentence semantic intelligent matching device based on a multi-granularity fusion model comprising:
  • the text matching knowledge base building unit is used to crawl question sets from public Internet question-answering platforms, or to use text matching data sets published on the Internet, as the original similar-sentence knowledge base, and then to preprocess that knowledge base; the main operation is to perform hyphenation and word segmentation on each sentence in the original similar-sentence knowledge base, thereby constructing the text matching knowledge base for model training;
  • the training data set generation unit is used to construct training positive example data and training negative example data according to the sentences in the text matching knowledge base, and construct the final training data set based on the positive example data and the negative example data;
  • the multi-granularity fusion model building unit is used to construct the character word mapping conversion table, and to construct the input layer, the multi-granularity embedding layer, the multi-granularity fusion coding layer, the interactive matching layer, and the prediction layer at the same time; among them, the multi-granularity fusion model building unit includes:
  • the character-word mapping conversion table construction subunit is used to segment each sentence in the text matching knowledge base into characters and words and to store each character and word in a list in turn, obtaining a character-word table; subsequently, starting from the number 1, identifiers are assigned in ascending order of entry into the character-word table, forming the character-word mapping conversion table required by the present invention, in which each character and word is mapped to a unique numeric identifier; afterwards, Word2Vec is used to train the character-word vector model, obtaining the weights of the character-word vector matrix;
  • the input layer construction subunit is used to convert each character and word in the input sentences into the corresponding numeric identifier according to the character-word mapping conversion table, thereby completing the data input; specifically, q1 and q2 are obtained and formalized as (q1-char, q1-word, q2-char, q2-word);
  • the multi-granularity embedding layer construction subunit is used to load the pre-trained character-word vector weights and convert the characters and words of the input sentences into character-word vector form, composing the complete sentence vector representations; this operation is completed by looking up the character-word vector matrix with the numeric identifiers of the characters and words;
  • the multi-granularity fusion coding layer construction subunit takes the word-level and character-level sentence vectors output by the multi-granularity embedding layer as input; text semantic features are first obtained from two perspectives, character-level and word-level semantic feature extraction, and are then integrated by bitwise addition to obtain the final sentence semantic feature vector;
  • the interactive matching layer construction subunit is used to perform hierarchical matching calculations on the semantic feature vectors of the two input sentences to obtain the matching representation vector of the sentence pair;
  • the prediction layer construction subunit is used to receive the matching representation vector output by the interactive matching layer and apply the Sigmoid function to obtain a matching degree in [0, 1]; the match of the sentence pair is finally judged by comparison with an established threshold;
  • the multi-granularity fusion model training unit is used to construct the loss function needed in the model training process and complete the optimization training of the model.
  • the text matching knowledge base building unit includes:
  • the original data processing subunit is used to hyphenate and segment the sentences in the original similar sentence knowledge base, thereby constructing the text matching knowledge base for model training;
  • the training data set generating unit includes:
  • the training positive example data construction subunit is used to combine semantically matched sentences in the text matching knowledge base, and add matching label 1 to it to construct the training positive example data;
  • the training negative example data construction subunit is used to first select a sentence q1 from the text matching knowledge base, then randomly select from the knowledge base a sentence q2 that does not semantically match q1, combine q1 with q2, and add the matching label 0, constructing the training negative example data;
  • the training data set construction subunit is used to combine all the training positive example data and the training negative example data, and disrupt the order to construct the final training data set;
  • the multi-granularity fusion model training unit includes:
  • the loss function construction subunit is used to construct the loss function and calculate the error of the text matching degree between sentence 1 and sentence 2;
  • the model optimization training subunit is used to train and adjust the model parameters, thereby reducing, during model training, the error between the predicted matching degree of sentence 1 and sentence 2 and the true matching degree.
  • a storage medium stores a plurality of instructions, and the instructions are loaded by a processor to execute the steps of the above-mentioned intelligent matching method for Chinese sentence semantics based on a multi-granularity fusion model.
  • An electronic device which includes:
  • the processor is configured to execute instructions in the storage medium.
  • the present invention integrates word vectors and character vectors, and effectively extracts the semantic information of Chinese sentences from the two granularities of characters and words, thereby improving the accuracy of Chinese sentence coding;
  • the present invention can accurately realize the task of matching Chinese sentences
  • the present invention uses the mean square error (MSE) as a balance factor to improve the cross-entropy loss function, thereby designing a balanced cross-entropy loss function; this loss function alleviates overfitting by blurring the classification boundary during training, and at the same time alleviates the class imbalance between positive and negative samples;
  • the multi-granularity fusion model uses different encoding methods to generate character-level and word-level sentence vectors; for word-level sentence vectors, two LSTM networks are used for sequential encoding, followed by an attention mechanism for deep feature extraction; for character-level sentence vectors, in addition to the same processing as for word-level vectors, a further LSTM layer and attention mechanism are added for encoding; the encodings of the word-level and character-level sentence vectors are finally superimposed as the multi-granularity fusion coding representation of the sentence, making the sentence representation more accurate and comprehensive;
  • the present invention realizes a multi-granularity fusion model, which considers both Chinese word-level granularity and character-level granularity, and integrates multi-granularity coding to better capture semantic features.
  • Figure 1 is a flow chart of a Chinese sentence semantic intelligent matching method based on a multi-granularity fusion model
  • Figure 2 is a block diagram of the process of constructing a text matching knowledge base
  • Figure 3 is a block diagram of the process of constructing the training data set of the text matching model
  • Figure 4 is a block diagram of the process of constructing a multi-granularity fusion model
  • Figure 5 is a block diagram of the process of training a multi-granularity fusion model
  • Figure 6 is a schematic diagram of a multi-granularity fusion model
  • Figure 7 is a schematic diagram of a multi-granularity embedding layer
  • Figure 8 is a schematic diagram of a multi-granularity fusion coding layer
  • Figure 9 is a schematic diagram of an interactive matching layer
  • Figure 10 is a block diagram of a device for intelligent semantic matching of Chinese sentences based on a multi-granularity fusion model.
  • the intelligent matching method for Chinese sentence semantics based on the multi-granularity fusion model of the present invention is specifically as follows:
  • the text matching data set publicly available on the Internet is taken as the original knowledge base;
  • the LCQMC data set is used [Liu, X., Chen, Q., Deng, C., Zeng, H., Chen, J., Li, D., Tang, B.: LCQMC: A large-scale Chinese question matching corpus. In: Proceedings of the 27th International Conference on Computational Linguistics, pp. 1952-1962 (2018)]
  • this data set contains 260,068 annotated sentence pairs in total, divided into three parts: a training set of 238,766 pairs, a validation set of 8,802 pairs and a test set of 12,500 pairs; it is a Chinese data set built specifically for text matching tasks.
  • Preprocess the original data: preprocess the similar texts in the original similar-sentence knowledge base, performing word segmentation and hyphenation on each sentence to obtain the text matching knowledge base;
  • the similar text obtained in step S101 is preprocessed to obtain the text matching knowledge base;
  • in order to avoid the loss of semantic information, the present invention retains all stop words in the sentences.
  • the word segmentation processing takes each Chinese word as the basic unit and performs a word segmentation operation on each piece of data; for example, sentence 2 shown in step S101, "Can you apply for a one-day extension of repayment?", is divided after word segmentation into its constituent words separated by spaces;
  • the present invention records the sentence after word segmentation as a sentence of word-level granularity;
  • the hyphenation processing takes each Chinese character as the basic unit and performs a hyphenation operation on each piece of data; each Chinese character is separated by a space, and the numbers, punctuation and special characters contained in each piece of data are retained; for example, sentence 2 shown in step S101, "Can you apply for a one-day extension of repayment?", is divided after hyphenation into its individual characters separated by spaces;
  • the present invention records the hyphenated sentence as a sentence of character-level granularity.
  • the core of the present invention is a multi-granularity fusion model, which can be divided into four parts: multi-granularity embedding layer, multi-granularity fusion coding layer, interactive matching layer, prediction layer ;
  • multi-granularity embedding layer to perform vector mapping on words and characters in the sentence to obtain word-level sentence vectors and character-level sentence vectors;
  • the multi-granularity fusion coding layer is then constructed to encode the word-level and character-level sentence vectors and obtain the sentence semantic feature vectors; next, the interactive matching layer is constructed to compare the sentence semantic feature vectors hierarchically and obtain the matching representation vector of the sentence pair; finally, the Sigmoid function of the prediction layer determines the semantic matching degree of the sentence pair.
  • the details are as follows:
  • the character-word table is built from the text matching knowledge base obtained after preprocessing;
  • each character and word in the table is mapped to a unique numeric identifier;
  • the mapping rule is: starting from the number 1, identifiers are assigned in ascending order following the order in which each character or word is entered into the character-word table, forming the character-word mapping conversion table;
  • embedding_matrix = numpy.zeros([len(tokenizer.word_index) + 1, embedding_dim])
  • w2v_corpus is the training corpus, i.e. all data in the text matching knowledge base; embedding_dim is the dimension of the character-word vectors, set to 300 in the present invention; word_set is the word table.
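  • a sketch of this training step, assuming gensim's Word2Vec implementation (the patent names Word2Vec but no particular library) and a Keras Tokenizer fitted on the same corpus:

```python
import numpy
from gensim.models import Word2Vec  # gensim 4.x API assumed

embedding_dim = 300
# w2v_corpus: list of token lists built from the text matching knowledge base.
w2v_model = Word2Vec(sentences=w2v_corpus, vector_size=embedding_dim, min_count=1)

# tokenizer.word_index comes from a Keras Tokenizer fitted on the corpus;
# row 0 of the matrix is reserved, hence the +1.
embedding_matrix = numpy.zeros([len(tokenizer.word_index) + 1, embedding_dim])
for token, idx in tokenizer.word_index.items():
    if token in w2v_model.wv:
        embedding_matrix[idx] = w2v_model.wv[token]
```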
  • the input layer includes four inputs.
  • the two sentences to be matched are preprocessed to obtain Q1-char, Q1-word, Q2-char and Q2-word respectively, formalized as (Q1-char, Q1-word, Q2-char, Q2-word);
  • the present invention uses the positive example text displayed in step S201 as an example to form a piece of input data.
  • the result is as follows:
  • through the mapping in the character-word vocabulary table, the above input data is converted into a numeric representation (assuming that the characters and words that appear in sentence 2 but not in sentence 1 are mapped as "Yes": 18, "No": 19, "Apply": 20, "Please": 21, "Whether": 22, "Apply": 23, "Extension": 24); the results are as follows:
  • the word-level and character-level sentence vectors Q1-word Emd, Q1-char Emd, Q2-word Emd and Q2-char Emd are obtained after processing by the multi-granularity embedding layer; each sentence in the text matching knowledge base can be transformed from text into vector form through the character-word vector mapping; embedding_dim is set to 300 in the present invention.
  • embedding_matrix is the weight matrix of the character-word vectors trained in step S301
  • embedding_matrix.shape[0] is the size of the vocabulary (dictionary) of the character-word vector matrix
  • embedding_dim is the dimension of the output character-word vectors
  • input_length is the length of the input sequence.
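  • taken together, these parameters correspond to a Keras Embedding layer along the following lines (max_seq_len and the trainable flag are assumptions not fixed by the text above):

```python
from tensorflow.keras.layers import Embedding

embedding_layer = Embedding(
    input_dim=embedding_matrix.shape[0],   # vocabulary size
    output_dim=embedding_dim,              # 300 in the present invention
    weights=[embedding_matrix],            # weights trained in step S301
    input_length=max_seq_len,              # assumed padded sequence length
    trainable=False,                       # assumption: keep vectors frozen
)
# The same layer pattern maps each of the four inputs to its sentence vector,
# e.g. q1_word_emd = embedding_layer(q1_word_input).
```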
  • the corresponding texts Q1 and Q2 are processed by the multi-granular embedding layer to obtain word-level sentence vectors and character-level sentence vectors Q1-word Emd, Q1-char Emd, Q2-word Emd, Q2-char Emd.
  • step S304 Construct a multi-granularity fusion coding layer: as shown in FIG. 8, the word-level and character-level sentence vectors are encoded to obtain the sentence semantic feature vectors; the construction of the multi-granularity fusion coding layer in step S304 takes the word-level and character-level sentence vectors output by the multi-granularity embedding layer of step S303 as input and obtains text semantic features from two perspectives, namely character-level semantic feature extraction and word-level semantic feature extraction;
  • the text semantic features from the two perspectives are then integrated by bitwise addition to obtain the final sentence semantic feature vector; for sentence Q1, the final sentence semantic feature vector is obtained as follows:
  • $i$ denotes the relative position of the corresponding character vector in the sentence;
  • $Q_i$ is the vector representation of each character in sentence Q1;
  • $Q'_i$ is the vector representation of each character after the initial LSTM encoding;
  • $Q''_i$ is the vector representation of each character after the second LSTM encoding;
  • $i'$ denotes the relative position of the corresponding word in the sentence vector;
  • $Q_{i'}$ is the vector representation of each word in sentence Q1;
  • $Q'_{i'}$ is the vector representation of each word after the initial LSTM encoding;
  • $Q''_{i'}$ is the vector representation of each word after the second LSTM encoding;
  • step S30403 After steps S30401 and S30402, the corresponding character-level feature vectors $Q''_i$ and word-level feature vectors $Q''_{i'}$ are obtained;
  • the coding dimension of the present invention is uniformly set to 300; the character-level and word-level feature vectors are added bitwise to obtain the final sentence semantic feature vector of text Q1;
  • the formula is as follows: $Q^{final}_i = Q''_i + Q''_{i'}$
  • step S30404 For sentence Q2, the final sentence semantic feature vector is obtained in the same way
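  • a structural sketch of this coding path, assuming standard Keras layers, self-attention via keras.layers.Attention, and sequences padded to a common length so that the bitwise addition is well defined; it is an illustration under these assumptions, not the exact patented network:

```python
from tensorflow.keras.layers import LSTM, Attention, Add

def encode(sentence_emd, extra_stage=False):
    # Two sequential LSTM encodings followed by attention-based deep
    # feature extraction; the character-level path adds one more
    # LSTM + attention stage, as described above.
    h = LSTM(300, return_sequences=True)(sentence_emd)  # initial LSTM
    h = LSTM(300, return_sequences=True)(h)             # second LSTM
    h = Attention()([h, h])                             # self-attention
    if extra_stage:
        h = LSTM(300, return_sequences=True)(h)
        h = Attention()([h, h])
    return h

q1_char_feat = encode(q1_char_emd, extra_stage=True)  # character level
q1_word_feat = encode(q1_word_emd)                    # word level
q1_final = Add()([q1_char_feat, q1_word_feat])        # bitwise addition
```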
  • step S30501 After the processing of step S304, the sentence semantic feature vectors of Q1 and Q2 are obtained; the two vectors are compared by three operations, subtraction, cross product and dot product, to obtain the comparison features;
  • the dot product, also called the scalar product: the result is the length of the projection of one vector in the direction of the other, which is a scalar;
  • the cross product, also called the vector product: the result is a vector perpendicular to the two existing vectors;
  • $i$ denotes the relative position of the corresponding semantic feature in the sentence;
  • $Q1_i$ is the vector representation of each semantic feature of text Q1 obtained by the feature extraction of step S304;
  • $Q2_i$ is the vector representation of each semantic feature of text Q2 obtained by the feature extraction of step S304;
  • the sentence semantic feature vectors of Q1 and Q2 are further processed with a Dense layer to extract feature vectors; the coding dimension is 300;
  • step S30504 The result of a further fully connected (Dense) encoding layer is summed with the comparison results of step S30501 to obtain the matching representation vector of the sentence pair;
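  • a sketch of the interactive matching layer under stated assumptions: the per-position features are first pooled into fixed-length sentence vectors, the "cross product" is realised as an element-wise product, and the way the comparison results are combined is an illustrative guess rather than the verbatim patented layer:

```python
from tensorflow.keras.layers import (Add, Concatenate, Dense, Dot,
                                     GlobalMaxPooling1D, Multiply, Subtract)

# Pool the per-position features into fixed-length vectors (assumption).
v1 = GlobalMaxPooling1D()(q1_final)
v2 = GlobalMaxPooling1D()(q2_final)

sub = Subtract()([v1, v2])      # subtraction
mul = Multiply()([v1, v2])      # element-wise ("cross") interaction
dot = Dot(axes=1)([v1, v2])     # dot product, a scalar per sentence pair

interaction = Concatenate()([sub, mul, dot])
dense_feat = Dense(300, activation="relu")(Concatenate()([v1, v2]))

# Sum the fully connected encoding of the interaction with the Dense
# features to obtain the matching representation vector (step S30504).
matching_vector = Add()([Dense(300)(interaction), dense_feat])
```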
  • the prediction layer receives the matching representation vector output in step S305 and applies the Sigmoid function to obtain a matching degree $y_{pred}$ in [0, 1];
  • $y_{true}$ denotes the true label, i.e. the 0/1 flag indicating match or mismatch in each training example;
  • $y_{pred}$ denotes the prediction result;
  • the balanced cross entropy automatically balances the positive and negative samples and improves classification accuracy; it fuses the cross entropy with the mean square error as a balance factor;
  • the present invention designs an improved cross-entropy loss function to prevent overfitting;
  • cross entropy is a common loss function for training models;
  • methods based on maximum likelihood estimation are sensitive to input noise and may push training samples hard towards 0 or 1, leading to overfitting;
  • the present invention therefore proposes to use the mean square error (MSE) as a balance parameter to balance positive and negative samples, thereby greatly improving the performance of the model.
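  • the exact balanced formula is not reproduced in this text; the following is a hedged, illustrative reconstruction of a cross entropy weighted by the mean square error, not the verbatim patented loss:

```python
import tensorflow.keras.backend as K

def balanced_cross_entropy(y_true, y_pred):
    # Illustrative reconstruction: the squared error acts as a balance
    # factor on the per-example cross entropy, down-weighting examples
    # the model already classifies confidently and correctly.
    bce = K.binary_crossentropy(y_true, y_pred)
    mse = K.square(y_true - y_pred)
    return K.mean(bce * mse)
```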
  • model = keras.models.Model([Q1-char, Q1-word, Q2-char, Q2-word], [y_pred])
  • the loss function loss selects the custom Loss of step S401; the optimization algorithm optimizer selects the previously defined optim; Q1-char, Q1-word, Q2-char and Q2-word are the model inputs and y_pred is the model output; as evaluation metrics, the present invention selects accuracy, precision, recall, and the F1-score computed from recall and precision.
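  • a sketch of the compilation step described above; the optimizer choice and loss below stand in for the "optim" and custom Loss defined elsewhere in the disclosure, and the input/output tensor names are illustrative:

```python
from tensorflow import keras

model = keras.models.Model(
    inputs=[q1_char_input, q1_word_input, q2_char_input, q2_word_input],
    outputs=[y_pred],
)
model.compile(
    loss=balanced_cross_entropy,   # custom Loss of step S401 (sketched above)
    optimizer="adam",              # stand-in for the previously defined optim
    metrics=["accuracy",
             keras.metrics.Precision(),
             keras.metrics.Recall()],
)
```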
  • the model of the present invention achieves better results than current models on the LCQMC public data set;
  • the comparison of the experimental results is shown in the following table:
  • the first fourteen lines are the experimental results of prior-art models [Liu, X., Chen, Q., Deng, C., Zeng, H., Chen, J., Li, D., Tang, B., 2018. LCQMC: A large-scale Chinese question matching corpus. In: Proceedings of the 27th International Conference on Computational Linguistics, pp. 1952-1962]; comparing the model of the present invention with the existing models shows that the method of the present invention performs best.
  • the device for intelligent semantic matching of Chinese sentences based on the multi-granularity fusion model shown in FIG. 10 can be integrated and deployed in various hardware devices, such as personal computers, workstations, and smart mobile devices.
  • a plurality of instructions are stored therein, and the instructions are loaded by the processor to execute the steps of the method for intelligent semantic matching of Chinese sentences based on the multi-granularity fusion model of Embodiment 1.
  • the electronic device includes:
  • the processor is used to execute instructions in the storage medium.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

A method and device for intelligent semantic matching of Chinese sentences based on a multi-granularity fusion model are disclosed, relating to the fields of artificial intelligence and natural language processing. The present invention addresses the technical problems of incomplete semantic analysis and inaccurate sentence matching in single-granularity models. The method specifically comprises: S1, building a text matching knowledge base; S2, building a training data set for a text matching model; S3, building a multi-granularity fusion model, which specifically comprises: S301, building a character-word mapping conversion table; S302, building an input layer; S303, building a multi-granularity embedding layer; S304, building a multi-granularity fusion coding layer; S305, building an interactive matching layer; and S306, building a prediction layer; and S4, training the multi-granularity fusion model. The device comprises a text matching knowledge base building unit, a training data set building unit for the text matching model, a multi-granularity fusion model building unit, and a multi-granularity fusion model training unit.
PCT/CN2020/104723 2020-02-20 2020-07-27 Method and device for intelligent semantic matching of Chinese sentences based on a multi-granularity fusion model WO2021164199A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010103529.1 2020-02-20
CN202010103529.1A CN111310438B (zh) 2020-02-20 2020-02-20 基于多粒度融合模型的中文句子语义智能匹配方法及装置

Publications (1)

Publication Number Publication Date
WO2021164199A1 true WO2021164199A1 (fr) 2021-08-26

Family

ID=71151080

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/104723 WO2021164199A1 (fr) 2020-07-27 Method and device for intelligent semantic matching of Chinese sentences based on a multi-granularity fusion model

Country Status (2)

Country Link
CN (1) CN111310438B (fr)
WO (1) WO2021164199A1 (fr)

Cited By (39)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113705197A (zh) * 2021-08-30 2021-11-26 北京工业大学 一种基于位置增强的细粒度情感分析方法
CN114153839A (zh) * 2021-10-29 2022-03-08 杭州未名信科科技有限公司 多源异构数据的集成方法、装置、设备及存储介质
CN114218380A (zh) * 2021-12-03 2022-03-22 淮阴工学院 基于多模态的冷链配载用户画像标签抽取方法及装置
CN114238563A (zh) * 2021-12-08 2022-03-25 齐鲁工业大学 基于多角度交互的中文句子对语义智能匹配方法和装置
CN114239566A (zh) * 2021-12-14 2022-03-25 公安部第三研究所 基于信息增强实现两步中文事件精准检测的方法、装置、处理器及其计算机可读存储介质
CN114281987A (zh) * 2021-11-26 2022-04-05 重庆邮电大学 一种用于智能语音助手的对话短文本语句匹配方法
CN114297390A (zh) * 2021-12-30 2022-04-08 江南大学 一种长尾分布场景下的方面类别识别方法及系统
CN114357158A (zh) * 2021-12-09 2022-04-15 南京中孚信息技术有限公司 基于句粒度语义和相对位置编码的长文本分类技术
CN114357121A (zh) * 2022-03-10 2022-04-15 四川大学 一种基于数据驱动的创新方案设计方法和系统
CN114416930A (zh) * 2022-02-09 2022-04-29 上海携旅信息技术有限公司 搜索场景下的文本匹配方法、系统、设备及存储介质
CN114461806A (zh) * 2022-02-28 2022-05-10 同盾科技有限公司 广告识别模型的训练方法及装置、广告屏蔽方法
CN114492451A (zh) * 2021-12-22 2022-05-13 马上消费金融股份有限公司 文本匹配方法、装置、电子设备及计算机可读存储介质
CN114547256A (zh) * 2022-04-01 2022-05-27 齐鲁工业大学 面向消防安全知识智能问答的文本语义匹配方法和装置
CN114595306A (zh) * 2022-01-26 2022-06-07 西北大学 基于距离感知自注意力机制和多角度建模的文本相似度计算系统及方法
CN114742016A (zh) * 2022-04-01 2022-07-12 山西大学 一种基于多粒度实体异构图的篇章级事件抽取方法及装置
CN115048944A (zh) * 2022-08-16 2022-09-13 之江实验室 一种基于主题增强的开放域对话回复方法及系统
CN115238684A (zh) * 2022-09-19 2022-10-25 北京探境科技有限公司 一种文本采集方法、装置、计算机设备及可读存储介质
CN115438674A (zh) * 2022-11-08 2022-12-06 腾讯科技(深圳)有限公司 实体数据处理、实体链接方法、装置和计算机设备
CN115600945A (zh) * 2022-09-07 2023-01-13 淮阴工学院(Cn) 基于多粒度的冷链配载用户画像构建方法及装置
CN115910345A (zh) * 2022-12-22 2023-04-04 广东数业智能科技有限公司 一种心理健康测评智能预警方法及存储介质
CN115936014A (zh) * 2022-11-08 2023-04-07 上海栈略数据技术有限公司 一种医学实体对码方法、系统、计算机设备、存储介质
CN116071759A (zh) * 2023-03-06 2023-05-05 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) 一种融合gpt2预训练大模型的光学字符识别方法
CN116204642A (zh) * 2023-03-06 2023-06-02 上海阅文信息技术有限公司 数字阅读中角色隐式属性智能识别分析方法、系统和应用
CN116304745A (zh) * 2023-03-27 2023-06-23 济南大学 基于深层次语义信息的文本话题匹配方法及系统
CN116306558A (zh) * 2022-11-23 2023-06-23 北京语言大学 一种计算机辅助中文文本改编的方法及装置
CN116629275A (zh) * 2023-07-21 2023-08-22 北京无极慧通科技有限公司 一种基于大数据的智能决策支持系统及方法
CN116680590A (zh) * 2023-07-28 2023-09-01 中国人民解放军国防科技大学 基于工作说明书解析的岗位画像标签提取方法及装置
CN116822495A (zh) * 2023-08-31 2023-09-29 小语智能信息科技(云南)有限公司 基于对比学习的汉-老、泰平行句对抽取方法及装置
CN117271438A (zh) * 2023-07-17 2023-12-22 乾元云硕科技(深圳)有限公司 用于大数据的智能存储系统及其方法
CN117390141A (zh) * 2023-12-11 2024-01-12 江西农业大学 一种农业社会化服务质量用户评价数据分析方法
CN117556027A (zh) * 2024-01-12 2024-02-13 一站发展(北京)云计算科技有限公司 基于数字人技术的智能交互系统及方法
CN117590944A (zh) * 2023-11-28 2024-02-23 上海源庐加佳信息科技有限公司 实体人对象和数字虚拟人对象的绑定系统
CN117633518A (zh) * 2024-01-25 2024-03-01 北京大学 一种产业链构建方法及系统
CN117669593A (zh) * 2024-01-31 2024-03-08 山东省计算中心(国家超级计算济南中心) 基于等价语义的零样本关系抽取方法、系统、设备及介质
CN117744787A (zh) * 2024-02-20 2024-03-22 中国电子科技集团公司第十研究所 一阶研判规则知识合理性的智能度量方法
CN117874209A (zh) * 2024-03-12 2024-04-12 深圳市诚立业科技发展有限公司 基于nlp的诈骗短信监控告警系统
CN117910460A (zh) * 2024-03-18 2024-04-19 国网江苏省电力有限公司南通供电分公司 一种基于bge模型的电力科研知识关联性构建方法及系统
CN118093791A (zh) * 2024-04-24 2024-05-28 北京中关村科金技术有限公司 结合云计算的ai知识库生成方法及系统
CN118132683A (zh) * 2024-05-07 2024-06-04 杭州海康威视数字技术股份有限公司 文本抽取模型的训练方法、文本抽取方法和设备

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111310438B (zh) * 2020-02-20 2021-06-08 齐鲁工业大学 基于多粒度融合模型的中文句子语义智能匹配方法及装置
CN111753524A (zh) * 2020-07-01 2020-10-09 携程计算机技术(上海)有限公司 文本断句位置的识别方法及系统、电子设备及存储介质
CN111914551B (zh) * 2020-07-29 2022-05-20 北京字节跳动网络技术有限公司 自然语言处理方法、装置、电子设备及存储介质
CN112149410A (zh) * 2020-08-10 2020-12-29 招联消费金融有限公司 语义识别方法、装置、计算机设备和存储介质
CN112000772B (zh) * 2020-08-24 2022-09-06 齐鲁工业大学 面向智能问答基于语义特征立方体的句子对语义匹配方法
CN112101030B (zh) * 2020-08-24 2024-01-26 沈阳东软智能医疗科技研究院有限公司 建立术语映射模型、实现标准词映射的方法、装置及设备
CN112328890B (zh) * 2020-11-23 2024-04-12 北京百度网讯科技有限公司 搜索地理位置点的方法、装置、设备及存储介质
CN112256841B (zh) * 2020-11-26 2024-05-07 支付宝(杭州)信息技术有限公司 文本匹配和对抗文本识别方法、装置及设备
CN112463924B (zh) * 2020-11-27 2022-07-05 齐鲁工业大学 面向智能问答基于内部相关性编码的文本意图匹配方法
CN112560502B (zh) * 2020-12-28 2022-05-13 桂林电子科技大学 一种语义相似度匹配方法、装置及存储介质
CN112613282A (zh) * 2020-12-31 2021-04-06 桂林电子科技大学 一种文本生成方法、装置及存储介质
CN112966524B (zh) * 2021-03-26 2024-01-26 湖北工业大学 基于多粒度孪生网络的中文句子语义匹配方法及系统
CN113065358B (zh) * 2021-04-07 2022-05-24 齐鲁工业大学 面向银行咨询服务基于多粒度对齐的文本对语义匹配方法
CN113593709B (zh) * 2021-07-30 2022-09-30 江先汉 一种疾病编码方法、系统、可读存储介质及装置
CN113569014B (zh) * 2021-08-11 2024-03-19 国家电网有限公司 基于多粒度文本语义信息的运维项目管理方法
CN113780006B (zh) * 2021-09-27 2024-04-09 广州金域医学检验中心有限公司 医学语义匹配模型的训练方法、医学知识匹配方法及装置
CN114090747A (zh) * 2021-10-14 2022-02-25 特斯联科技集团有限公司 基于多重语义匹配的自动问答方法、装置、设备及介质
CN114049884B (zh) * 2022-01-11 2022-05-13 广州小鹏汽车科技有限公司 语音交互方法、车辆、计算机可读存储介质
CN115422362B (zh) * 2022-10-09 2023-10-31 郑州数智技术研究院有限公司 一种基于人工智能的文本匹配方法
CN115688796B (zh) * 2022-10-21 2023-12-05 北京百度网讯科技有限公司 用于自然语言处理领域中预训练模型的训练方法及其装置

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180137404A1 (en) * 2016-11-15 2018-05-17 International Business Machines Corporation Joint learning of local and global features for entity linking via neural networks
CN108984532A (zh) * 2018-07-27 2018-12-11 福州大学 基于层次嵌入的方面抽取方法
CN109299262A (zh) * 2018-10-09 2019-02-01 中山大学 一种融合多粒度信息的文本蕴含关系识别方法
CN110083692A (zh) * 2019-04-22 2019-08-02 齐鲁工业大学 一种金融知识问答的文本交互匹配方法及装置
CN110321419A (zh) * 2019-06-28 2019-10-11 神思电子技术股份有限公司 一种融合深度表示与交互模型的问答匹配方法
CN110502627A (zh) * 2019-08-28 2019-11-26 上海海事大学 一种基于多层Transformer聚合编码器的答案生成方法
CN111310438A (zh) * 2020-02-20 2020-06-19 齐鲁工业大学 基于多粒度融合模型的中文句子语义智能匹配方法及装置

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104408153B (zh) * 2014-12-03 2018-07-31 中国科学院自动化研究所 一种基于多粒度主题模型的短文本哈希学习方法
CN107315772B (zh) * 2017-05-24 2019-08-16 北京邮电大学 基于深度学习的问题匹配方法以及装置
CN108268643A (zh) * 2018-01-22 2018-07-10 北京邮电大学 一种基于多粒度lstm网络的深层语义匹配实体链接方法
CN109408627B (zh) * 2018-11-15 2021-03-02 众安信息技术服务有限公司 一种融合卷积神经网络和循环神经网络的问答方法及系统
CN110032639B (zh) * 2018-12-27 2023-10-31 中国银联股份有限公司 将语义文本数据与标签匹配的方法、装置及存储介质
CN110032635B (zh) * 2019-04-22 2023-01-20 齐鲁工业大学 一种基于深度特征融合神经网络的问题对匹配方法和装置
CN110334184A (zh) * 2019-07-04 2019-10-15 河海大学常州校区 基于机器阅读理解的智能问答系统

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180137404A1 (en) * 2016-11-15 2018-05-17 International Business Machines Corporation Joint learning of local and global features for entity linking via neural networks
CN108984532A (zh) * 2018-07-27 2018-12-11 福州大学 基于层次嵌入的方面抽取方法
CN109299262A (zh) * 2018-10-09 2019-02-01 中山大学 一种融合多粒度信息的文本蕴含关系识别方法
CN110083692A (zh) * 2019-04-22 2019-08-02 齐鲁工业大学 一种金融知识问答的文本交互匹配方法及装置
CN110321419A (zh) * 2019-06-28 2019-10-11 神思电子技术股份有限公司 一种融合深度表示与交互模型的问答匹配方法
CN110502627A (zh) * 2019-08-28 2019-11-26 上海海事大学 一种基于多层Transformer聚合编码器的答案生成方法
CN111310438A (zh) * 2020-02-20 2020-06-19 齐鲁工业大学 基于多粒度融合模型的中文句子语义智能匹配方法及装置

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
NAIXIN ZHANG ET AL.: "MIFM: Multi-Granularity Information Fusion Model for Chinese Named Entity Recognition", IEEE ACCESS, vol. 2019, no. 7, 13 December 2019 (2019-12-13), ISSN: 2169-3536 *

Cited By (64)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113705197A (zh) * 2021-08-30 2021-11-26 北京工业大学 一种基于位置增强的细粒度情感分析方法
CN113705197B (zh) * 2021-08-30 2024-04-02 北京工业大学 一种基于位置增强的细粒度情感分析方法
CN114153839A (zh) * 2021-10-29 2022-03-08 杭州未名信科科技有限公司 多源异构数据的集成方法、装置、设备及存储介质
CN114281987A (zh) * 2021-11-26 2022-04-05 重庆邮电大学 一种用于智能语音助手的对话短文本语句匹配方法
CN114218380A (zh) * 2021-12-03 2022-03-22 淮阴工学院 基于多模态的冷链配载用户画像标签抽取方法及装置
CN114218380B (zh) * 2021-12-03 2022-07-29 淮阴工学院 基于多模态的冷链配载用户画像标签抽取方法及装置
CN114238563A (zh) * 2021-12-08 2022-03-25 齐鲁工业大学 基于多角度交互的中文句子对语义智能匹配方法和装置
CN114357158A (zh) * 2021-12-09 2022-04-15 南京中孚信息技术有限公司 基于句粒度语义和相对位置编码的长文本分类技术
CN114357158B (zh) * 2021-12-09 2024-04-09 南京中孚信息技术有限公司 基于句粒度语义和相对位置编码的长文本分类技术
CN114239566B (zh) * 2021-12-14 2024-04-23 公安部第三研究所 基于信息增强实现两步中文事件精准检测的方法、装置、处理器及其计算机可读存储介质
CN114239566A (zh) * 2021-12-14 2022-03-25 公安部第三研究所 基于信息增强实现两步中文事件精准检测的方法、装置、处理器及其计算机可读存储介质
CN114492451B (zh) * 2021-12-22 2023-10-24 马上消费金融股份有限公司 文本匹配方法、装置、电子设备及计算机可读存储介质
CN114492451A (zh) * 2021-12-22 2022-05-13 马上消费金融股份有限公司 文本匹配方法、装置、电子设备及计算机可读存储介质
CN114297390A (zh) * 2021-12-30 2022-04-08 江南大学 一种长尾分布场景下的方面类别识别方法及系统
CN114297390B (zh) * 2021-12-30 2024-04-02 江南大学 一种长尾分布场景下的方面类别识别方法及系统
CN114595306A (zh) * 2022-01-26 2022-06-07 西北大学 基于距离感知自注意力机制和多角度建模的文本相似度计算系统及方法
CN114595306B (zh) * 2022-01-26 2024-04-12 西北大学 基于距离感知自注意力机制和多角度建模的文本相似度计算系统及方法
CN114416930A (zh) * 2022-02-09 2022-04-29 上海携旅信息技术有限公司 搜索场景下的文本匹配方法、系统、设备及存储介质
CN114461806A (zh) * 2022-02-28 2022-05-10 同盾科技有限公司 广告识别模型的训练方法及装置、广告屏蔽方法
CN114357121A (zh) * 2022-03-10 2022-04-15 四川大学 一种基于数据驱动的创新方案设计方法和系统
CN114742016A (zh) * 2022-04-01 2022-07-12 山西大学 一种基于多粒度实体异构图的篇章级事件抽取方法及装置
CN114547256A (zh) * 2022-04-01 2022-05-27 齐鲁工业大学 面向消防安全知识智能问答的文本语义匹配方法和装置
CN114547256B (zh) * 2022-04-01 2024-03-15 齐鲁工业大学 面向消防安全知识智能问答的文本语义匹配方法和装置
CN115048944A (zh) * 2022-08-16 2022-09-13 之江实验室 一种基于主题增强的开放域对话回复方法及系统
CN115600945A (zh) * 2022-09-07 2023-01-13 淮阴工学院(Cn) 基于多粒度的冷链配载用户画像构建方法及装置
CN115238684B (zh) * 2022-09-19 2023-03-03 北京探境科技有限公司 一种文本采集方法、装置、计算机设备及可读存储介质
CN115238684A (zh) * 2022-09-19 2022-10-25 北京探境科技有限公司 一种文本采集方法、装置、计算机设备及可读存储介质
CN115936014A (zh) * 2022-11-08 2023-04-07 上海栈略数据技术有限公司 一种医学实体对码方法、系统、计算机设备、存储介质
CN115438674A (zh) * 2022-11-08 2022-12-06 腾讯科技(深圳)有限公司 实体数据处理、实体链接方法、装置和计算机设备
CN115936014B (zh) * 2022-11-08 2023-07-25 上海栈略数据技术有限公司 一种医学实体对码方法、系统、计算机设备、存储介质
CN115438674B (zh) * 2022-11-08 2023-03-24 腾讯科技(深圳)有限公司 实体数据处理、实体链接方法、装置和计算机设备
CN116306558B (zh) * 2022-11-23 2023-11-10 北京语言大学 一种计算机辅助中文文本改编的方法及装置
CN116306558A (zh) * 2022-11-23 2023-06-23 北京语言大学 一种计算机辅助中文文本改编的方法及装置
CN115910345A (zh) * 2022-12-22 2023-04-04 广东数业智能科技有限公司 一种心理健康测评智能预警方法及存储介质
CN116071759A (zh) * 2023-03-06 2023-05-05 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) 一种融合gpt2预训练大模型的光学字符识别方法
CN116204642A (zh) * 2023-03-06 2023-06-02 上海阅文信息技术有限公司 数字阅读中角色隐式属性智能识别分析方法、系统和应用
CN116204642B (zh) * 2023-03-06 2023-10-27 上海阅文信息技术有限公司 数字阅读中角色隐式属性智能识别分析方法、系统和应用
CN116071759B (zh) * 2023-03-06 2023-07-18 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) 一种融合gpt2预训练大模型的光学字符识别方法
CN116304745B (zh) * 2023-03-27 2024-04-12 济南大学 基于深层次语义信息的文本话题匹配方法及系统
CN116304745A (zh) * 2023-03-27 2023-06-23 济南大学 基于深层次语义信息的文本话题匹配方法及系统
CN117271438A (zh) * 2023-07-17 2023-12-22 乾元云硕科技(深圳)有限公司 用于大数据的智能存储系统及其方法
CN116629275B (zh) * 2023-07-21 2023-09-22 北京无极慧通科技有限公司 一种基于大数据的智能决策支持系统及方法
CN116629275A (zh) * 2023-07-21 2023-08-22 北京无极慧通科技有限公司 一种基于大数据的智能决策支持系统及方法
CN116680590A (zh) * 2023-07-28 2023-09-01 中国人民解放军国防科技大学 基于工作说明书解析的岗位画像标签提取方法及装置
CN116680590B (zh) * 2023-07-28 2023-10-20 中国人民解放军国防科技大学 基于工作说明书解析的岗位画像标签提取方法及装置
CN116822495B (zh) * 2023-08-31 2023-11-03 小语智能信息科技(云南)有限公司 基于对比学习的汉-老、泰平行句对抽取方法及装置
CN116822495A (zh) * 2023-08-31 2023-09-29 小语智能信息科技(云南)有限公司 基于对比学习的汉-老、泰平行句对抽取方法及装置
CN117590944A (zh) * 2023-11-28 2024-02-23 上海源庐加佳信息科技有限公司 实体人对象和数字虚拟人对象的绑定系统
CN117390141A (zh) * 2023-12-11 2024-01-12 江西农业大学 一种农业社会化服务质量用户评价数据分析方法
CN117390141B (zh) * 2023-12-11 2024-03-08 江西农业大学 一种农业社会化服务质量用户评价数据分析方法
CN117556027A (zh) * 2024-01-12 2024-02-13 一站发展(北京)云计算科技有限公司 基于数字人技术的智能交互系统及方法
CN117556027B (zh) * 2024-01-12 2024-03-26 一站发展(北京)云计算科技有限公司 基于数字人技术的智能交互系统及方法
CN117633518A (zh) * 2024-01-25 2024-03-01 北京大学 一种产业链构建方法及系统
CN117633518B (zh) * 2024-01-25 2024-04-26 北京大学 一种产业链构建方法及系统
CN117669593B (zh) * 2024-01-31 2024-04-26 山东省计算中心(国家超级计算济南中心) 基于等价语义的零样本关系抽取方法、系统、设备及介质
CN117669593A (zh) * 2024-01-31 2024-03-08 山东省计算中心(国家超级计算济南中心) 基于等价语义的零样本关系抽取方法、系统、设备及介质
CN117744787A (zh) * 2024-02-20 2024-03-22 中国电子科技集团公司第十研究所 一阶研判规则知识合理性的智能度量方法
CN117744787B (zh) * 2024-02-20 2024-05-07 中国电子科技集团公司第十研究所 一阶研判规则知识合理性的智能度量方法
CN117874209A (zh) * 2024-03-12 2024-04-12 深圳市诚立业科技发展有限公司 基于nlp的诈骗短信监控告警系统
CN117874209B (zh) * 2024-03-12 2024-05-17 深圳市诚立业科技发展有限公司 基于nlp的诈骗短信监控告警系统
CN117910460A (zh) * 2024-03-18 2024-04-19 国网江苏省电力有限公司南通供电分公司 一种基于bge模型的电力科研知识关联性构建方法及系统
CN117910460B (zh) * 2024-03-18 2024-06-07 国网江苏省电力有限公司南通供电分公司 一种基于bge模型的电力科研知识关联性构建方法及系统
CN118093791A (zh) * 2024-04-24 2024-05-28 北京中关村科金技术有限公司 结合云计算的ai知识库生成方法及系统
CN118132683A (zh) * 2024-05-07 2024-06-04 杭州海康威视数字技术股份有限公司 文本抽取模型的训练方法、文本抽取方法和设备

Also Published As

Publication number Publication date
CN111310438A (zh) 2020-06-19
CN111310438B (zh) 2021-06-08

Similar Documents

Publication Publication Date Title
WO2021164199A1 (fr) Method and device for intelligent semantic matching of Chinese sentences based on a multi-granularity fusion model
WO2021164200A1 (fr) Intelligent semantic matching method and apparatus based on deep hierarchical coding
CN111259127B (zh) 一种基于迁移学习句向量的长文本答案选择方法
CN111310439B (zh) 一种基于深度特征变维机制的智能语义匹配方法和装置
CN110019732B (zh) 一种智能问答方法以及相关装置
CN112347268A (zh) 一种文本增强的知识图谱联合表示学习方法及装置
WO2021204014A1 (fr) Model training method and related apparatus
CN111159485B (zh) 尾实体链接方法、装置、服务器及存储介质
CN110032635A (zh) 一种基于深度特征融合神经网络的问题对匹配方法和装置
CN111597314A (zh) 推理问答方法、装置以及设备
TW201841121A (zh) 一種自動生成語義相近句子樣本的方法
CN113377897B (zh) 基于深度对抗学习的多语言医疗术语规范标准化系统及方法
CN113392209B (zh) 一种基于人工智能的文本聚类方法、相关设备及存储介质
WO2024131111A1 (fr) Intelligent writing method and apparatus, device, and non-volatile readable storage medium
CN111222330B (zh) 一种中文事件的检测方法和系统
CN111241303A (zh) 一种大规模非结构化文本数据的远程监督关系抽取方法
CN113204611A (zh) 建立阅读理解模型的方法、阅读理解方法及对应装置
CN112581327B (zh) 基于知识图谱的法律推荐方法、装置和电子设备
CN112417170B (zh) 面向不完备知识图谱的关系链接方法
CN110826341A (zh) 一种基于seq2seq模型的语义相似度计算方法
WO2023130688A1 (fr) Natural language processing method and apparatus, device, and readable storage medium
CN116414988A (zh) 基于依赖关系增强的图卷积方面级情感分类方法及系统
CN113051886A (zh) 一种试题查重方法、装置、存储介质及设备
CN116205217B (zh) 一种小样本关系抽取方法、系统、电子设备及存储介质
Yang Intelligent English Translation Evaluation System Based on Internet Automation Technology

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20920016

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20920016

Country of ref document: EP

Kind code of ref document: A1


32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 15/03/2023)
