CN109033413B - Neural network-based demand document and service document matching method - Google Patents

Neural network-based demand document and service document matching method

Info

Publication number
CN109033413B
CN109033413B
Authority
CN
China
Prior art keywords
document
similarity
requirement
service
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810883232.4A
Other languages
Chinese (zh)
Other versions
CN109033413A (en)
Inventor
邹祥文
吴悦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Federation Of Science And Technology Enterprises
University of Shanghai for Science and Technology
Original Assignee
Shanghai Federation Of Science And Technology Enterprises
University of Shanghai for Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Federation Of Science And Technology Enterprises, University of Shanghai for Science and Technology filed Critical Shanghai Federation Of Science And Technology Enterprises
Publication of CN109033413A
Application granted
Publication of CN109033413B
Legal status: Active


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/211Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a neural network-based method for matching requirement documents with service documents. The method extracts the relevant sections from the structure of a requirement document and a service document, converts sentences into vectors with paragraph embedding, segments each section with a long short-term memory network, computes the similarity of the segmented texts with a convolutional neural network, and, after the similarities of all segments are obtained, takes a weighted average to obtain the overall similarity between the requirement document and the service document.

Description

Neural network-based demand document and service document matching method
Technical Field
The invention relates to the field of computer natural language processing, addresses the matching of requirement documents with service documents, and in particular relates to a neural network-based method for matching the two.
Background
With the rapid development and popularization of the internet, modern enterprises increasingly cooperate with one another on the basis of technology. To find partner enterprises, a demanding party writes a requirement document describing its needs, while a technology provider writes a service document describing its technical capabilities; connecting the two over the internet accelerates the discovery of partner enterprises and reduces the time and labor costs involved.
An enterprise requirement document describes the problems the enterprise needs solved and the targets to be met when solving them. An enterprise service document summarizes the provider's methods for solving difficult technical problems, its experience with similar projects, the technical reserves available for taking on a project, related patents it has obtained, the research methods it plans to adopt, the main technical targets it can achieve, and its project schedule. How to quickly find partners for enterprises through requirement documents and service documents has become a research hotspot and difficulty.
The document matching methods commonly used at present convert a text into a Vector Space Model (VSM) representation, weight terms with the Term Frequency-Inverse Document Frequency (TF-IDF) model, and compute the similarity of two documents with a distance function: the smaller the distance, the more similar the documents. Such methods are insufficient here, because a requirement document may contain several requirements that a partner enterprise must satisfy simultaneously, whereas a service document lists, as broadly as possible, all the technical services the enterprise can currently provide; for a match to be correct, the service document must satisfy most or all of the requirements.
Disclosure of Invention
In order to overcome the shortcomings of existing methods in matching requirement documents with service documents and to improve matching accuracy, the invention provides a neural network-based method for matching requirement documents with service documents.
To achieve this purpose, the invention adopts the following technical scheme:
step 1: inputting a requirement document and a service document as documents to be matched, wherein the requirement document comprises problems to be solved by an enterprise and indexes to be achieved when the problems are solved, and the service document comprises a method for summarizing and solving the difficult problem technology, experience of solving similar projects, technical reserves of the project, obtained related patents, a research method to be adopted, mainly realized technical indexes and a project progress plan;
step 2: judging whether the input document is a demand document or a service document according to the document content;
step 2.1: the method comprises the steps that a problem needing to be solved by an enterprise and an index part needing to be achieved when the problem is solved are required documents, and the problem needing to be solved by the enterprise and the index part needing to be achieved when the problem is solved are extracted;
step 2.2: the method comprises the steps of summarizing a method for solving the difficult technology, experience of solving similar projects, taking over technical reserves of the project, obtained related patents, a research method to be adopted, a main realized technical index and project progress planning part which are service documents, extracting and summarizing the method for solving the difficult technology, experience of solving the similar projects, taking over the technical reserves of the project, obtained related patents, the research method to be adopted, the main realized technical index and the project progress planning part;
step 2.3: calculating the similarity of all the requirement document extraction parts and all the service document extraction parts according to the similarity of the final requirement document and the final service document, and taking the problems to be solved of the requirement document and a method for solving the difficult problem technology by summarizing the service document as an example;
and step 3: the method comprises the steps of carrying out Paragraph Embedding (PE) processing on sentences in a problem part of a requirement document to be solved and a method part of a service document for solving the difficult problem technology to obtain sentence vectors;
and 4, step 4: judging a document segmentation point through a Long Short-Term Memory network (LSTM);
step 4.1: inputting the obtained sentence vector into a trained Long Short-Term Memory network (LSTM), and judging whether the previous sentence is a segmentation point or not according to the output result of the Long Short-Term Memory network;
and 4.2: according to the dividing point, one part is divided into several text sections with different meanings, the problem part of the demand document is a demand, and the solution part of the service document is a method.
And 5: constructing similarity model input according to the type of the processing result;
step 5.1: if the sentence vector is the requirement document, all sentences of a requirement are processed by a PE model to obtain sentence vectors to form a matrix, and all sentence vectors of a method are taken to form another matrix;
step 5.2: if the sentence vector is the service document, all sentences of a method are processed by a PE model to obtain sentence vectors to form a matrix, and all sentence vectors of a requirement are taken to form another matrix;
step 6: calculating similarity by using the two matrixes as input through a trained Convolutional Neural Network (CNNs), calculating the similarity by using the sum of each requirement intersection and each method, and taking the value with the maximum similarity for each requirement as the final value of the requirement;
and 7: carrying out weighted average on the similarity values to obtain final similarity;
step 7.1: after each requirement final value is obtained, a weighted average value is obtained to serve as a final similarity value of the problem needing to be solved of the requirement document;
step 7.2: the steps take the problem to be solved of the demand document and the method for summarizing the service document to solve the difficult problem technology as an example, the demand document comprises the problem to be solved and an index part which needs to be achieved when the problem is solved, the similarity of the index part which needs to be achieved when the problem is solved by the demand document is solved according to the method, and the weighted average of the two parts is worked out to be used as the final similarity of the demand document and the service document;
and 8: and comparing the final similarity with a preset threshold, wherein if the final similarity is larger than the threshold, the two documents are matched, and if the final similarity is smaller than the threshold, the two documents are not matched.
The segmentation point in step 4 means that the preceding and following sentences of the document have different meanings; the preceding sentence is then a segmentation point. The history information of the long short-term memory network is updated by:

C_t = 0   (when h_{t-1} → 1)

where C_t is the history information (cell state) of the LSTM at time t and h_{t-1} is the output of the previous state. When updating the history information, if the output at the previous time step indicated a segmentation point, C_t is reset to 0; otherwise no special processing is applied.
Compared with the prior art, the invention has the following substantive features and technical advantages. The requirement document and the service document are split by a text segmentation method into specific requirements and services, and the matching degree is finally computed on these, which solves the problem that a match must satisfy most or all of the requirements. The numeric target information is assembled into a separate one-dimensional input, which handles the influence of target figures in the requirement and service documents on the matching result. After the similarity of each segment is obtained, cross matching is performed and the best matching result is taken, which reduces the influence of users' differing writing habits on the matching result.
Drawings
FIG. 1 is a flow chart of the present invention.
FIG. 2 is a diagram of a convolution network of the similarity calculation model according to the present invention.
FIG. 3 is a diagram of convolution operations in the similarity calculation model according to the present invention.
FIG. 4 is a diagram of a similarity layer in the similarity calculation model according to the present invention.
FIG. 5 is a cross-matching diagram of the present invention.
Detailed Description
Example 1
The technical scheme of the invention is described clearly and completely below with reference to the accompanying drawings.
The invention provides a method for matching a requirement document with a service document; the flow chart is shown in FIG. 1, and the specific implementation steps are as follows:
step 1: inputting a requirement document and a service document as documents to be matched, wherein the requirement document comprises problems to be solved by an enterprise and indexes to be achieved when the problems are solved, and the service document comprises a method for summarizing and solving the difficult problem technology, experience of solving similar projects, technical reserves of the project, obtained related patents, a research method to be adopted, mainly realized technical indexes and a project progress plan;
step 2: judging whether the input document is a demand document or a service document according to the document content;
step 2.1: the method comprises the steps that a problem needing to be solved by an enterprise and an index part needing to be achieved when the problem is solved are required documents, and the problem needing to be solved by the enterprise and the index part needing to be achieved when the problem is solved are extracted;
step 2.2: the method comprises the steps of summarizing a method for solving the difficult technology, experience of solving similar projects, taking over technical reserves of the project, obtained related patents, a research method to be adopted, a main realized technical index and project progress planning part which are service documents, extracting and summarizing the method for solving the difficult technology, experience of solving the similar projects, taking over the technical reserves of the project, obtained related patents, the research method to be adopted, the main realized technical index and the project progress planning part;
step 2.3: calculating the similarity of all the extraction parts of the requirement documents and all the extraction parts of the service documents according to the similarity of the final requirement documents and the service documents, and taking a method for solving the problem of the requirement documents needing to be solved and summarizing the service documents as an example;
Step 3: apply Paragraph Embedding (PE) to the sentences of the "problems to be solved" part of the requirement document and the "methods for solving the difficult technical problems" part of the service document to obtain sentence vectors;
in the Word Embedding (WE) model, each Word can be mapped to a unique column in the document matrix W, the index of the column is the position of the Word in the vocabulary, and then the Word vectors are concatenated to predict the next Word in the sentence. Given a word sequence w 1 ,w 2 ,w 3 ,…,w T The objective of the word embedding model is to maximize the mean log probability, which is calculated as shown in equation (I):
Figure BDA0001754973380000041
where the probability p is the probability of correctly predicting the next word.
The prediction task is completed by a multi-classifier, such as a softmax classifier, and the calculation formula is shown as formula (II):
Figure BDA0001754973380000042
for each input word i, y i Non-normalized logarithmic probability, the calculation formula is shown in formula (III):
y=b+Uh(w t-k ,…,w t+k ;W) (Ⅲ)
where U and b are parameters of the softmax classifier and h consists of a concatenated or average value of the word vectors extracted from W.
The PE model is inspired by WE: paragraph vectors are likewise trained to predict the next word in a sentence. Each paragraph is mapped to a unique column of a matrix D, and each word to a unique column of the matrix W. Compared with the WE model, the PE model changes only formula (III): h is formed by concatenating or averaging the word vectors extracted from W together with the paragraph vector extracted from D.
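As a concrete illustration of formulas (II) and (III), the sketch below computes a PE-style next-word prediction in numpy: h averages the context word vectors from W with the paragraph vector from D, then a softmax over the vocabulary gives the prediction. The dimensions and the random W, D, U, b are illustrative assumptions, not trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim, n_paragraphs = 10, 4, 3

W = rng.normal(size=(dim, vocab_size))    # word vectors, one column per word
D = rng.normal(size=(dim, n_paragraphs))  # paragraph vectors, one column per paragraph
U = rng.normal(size=(vocab_size, dim))    # softmax classifier weights
b = np.zeros(vocab_size)                  # softmax classifier bias

def predict_next_word(context_ids, paragraph_id):
    """h = average of context word vectors and the paragraph vector (formula III),
    followed by a softmax over the vocabulary (formula II)."""
    vecs = [W[:, i] for i in context_ids] + [D[:, paragraph_id]]
    h = np.mean(vecs, axis=0)
    y = b + U @ h                         # unnormalized log probabilities y_i
    p = np.exp(y - y.max())               # stabilized softmax
    return p / p.sum()

probs = predict_next_word([1, 4, 7], paragraph_id=0)
```

The output is a proper probability distribution over the vocabulary; training would adjust W, D, U, b to maximize the average log probability of formula (I).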
Step 4: detect document segmentation points with a Long Short-Term Memory (LSTM) network;
Step 4.1: feed the obtained sentence vectors into a trained LSTM network and judge from its output whether the preceding sentence is a segmentation point;
Step 4.2: using the segmentation points, split each part into several text segments with different meanings: each segment of the "problems" part of the requirement document is one requirement, and each segment of the "methods" part of the service document is one method.
The LSTM network contains three gate structures: the forget gate, the input gate, and the output gate. Each gate functions differently, as follows:
Forget gate: the forget gate decides which stored history information to discard. It operates on the current input and the previous state, passes the result through a sigmoid layer, and outputs a value in the range [0, 1]: an output of 0 discards the history information, an output of 1 retains it. Formula (IV) determines what to forget:

f_t = σ(W_f · [h_{t-1}, x_t] + b_f)   (IV)

where σ is the sigmoid function, x_t is the sentence vector produced by the PE model, h_{t-1} is the previous output (used to judge whether a segmentation point was found), W_f is an LSTM connection parameter, b_f is a bias, and f_t determines the information to be forgotten at time t.
Input gate: the input gate decides how to update the history information, i.e. whether the current input is written into the history. It contains a sigmoid layer, which decides what to update, and a tanh layer, which generates the new candidate values, computed as shown in formulas (V) and (VI):

i_t = σ(W_i · [h_{t-1}, x_t] + b_i)   (V)

C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C)   (VI)

where i_t determines the values to update, h_{t-1} is the previous output, W_i and W_C are connection parameters, b_i and b_C are biases, and C̃_t is the candidate cell state at time t.
The forgetting decision from the forget gate and the update candidate from the input gate are combined to update the history information, using formula (VII):

C_t = f_t * C_{t-1} + i_t * C̃_t   (VII)

where C_t is the history information (cell state) of the LSTM, f_t, computed by formula (IV), determines the information to be forgotten at time t, and i_t, computed by formula (V), determines the values to be updated.
Output gate: the output gate controls the information output by the current node. A sigmoid layer decides what to output, and its result is multiplied by the tanh of the cell state, as shown in formulas (VIII) and (IX):

o_t = σ(W_o · [h_{t-1}, x_t] + b_o)   (VIII)

h_t = o_t * tanh(C_t)   (IX)

where σ is the sigmoid function, x_t is the PE sentence vector, h_t is the output (used to judge whether the preceding sentence is a segmentation point), W_o is a connection parameter, and b_o is a bias.
After the LSTM output is obtained, it is passed through a sigmoid layer so that it lies in [0, 1]; when the output is close to 1, the previous node is a segmentation point, otherwise it is a continuation point.

When the history information is updated with formula (X), C_t is reset to 0 if the output at the previous time step indicated a segmentation point; otherwise no special processing is applied:

C_t = 0   (when h_{t-1} → 1)   (X)

In formulas (IV) to (X), σ denotes the sigmoid function, x denotes the input, h denotes the output used to judge whether a sentence is a segmentation point, W denotes a connection parameter, and b denotes a bias.
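A minimal numpy sketch of one LSTM time step implementing formulas (IV)-(IX), with the segmentation-specific reset of formula (X) applied before the update. The hidden size of 1 (so the output directly approximates "is a segmentation point") and the random, untrained parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
dim, hidden = 4, 1   # sentence-vector size; hidden size 1 -> scalar split signal

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One weight matrix per gate, acting on the concatenation [h_{t-1}, x_t].
W_f, W_i, W_C, W_o = (rng.normal(size=(hidden, hidden + dim)) for _ in range(4))
b_f = b_i = b_C = b_o = np.zeros(hidden)

def lstm_step(x_t, h_prev, C_prev, split_threshold=0.5):
    if h_prev.item() > split_threshold:  # formula (X): previous output was a split point
        C_prev = np.zeros_like(C_prev)   # reset the history information
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W_f @ z + b_f)         # (IV)  forget gate
    i_t = sigmoid(W_i @ z + b_i)         # (V)   input gate
    C_hat = np.tanh(W_C @ z + b_C)       # (VI)  candidate state
    C_t = f_t * C_prev + i_t * C_hat     # (VII) state update
    o_t = sigmoid(W_o @ z + b_o)         # (VIII) output gate
    h_t = o_t * np.tanh(C_t)             # (IX)  output
    return h_t, C_t

h, C = np.zeros(hidden), np.zeros(hidden)
for x in rng.normal(size=(5, dim)):      # run over five sentence vectors
    h, C = lstm_step(x, h, C)
```

In the invention the output would additionally pass through a sigmoid layer and be compared against a threshold near 1 to flag segmentation points.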
Step 5: construct the input of the similarity model according to the type of the processed document;
Step 5.1: for the requirement document, assemble the PE sentence vectors of all sentences of one requirement into one matrix, and the sentence vectors of all sentences of one method into another matrix;
Step 5.2: for the service document, assemble the PE sentence vectors of all sentences of one method into one matrix, and the sentence vectors of all sentences of one requirement into another matrix;
Step 6: feed the two matrices into a trained Convolutional Neural Network (CNN) to compute their similarity; compute the similarity of every requirement crossed with every method, and take the maximum similarity obtained for each requirement as that requirement's final value;
the CNNs model in the present invention is shown in fig. 2.
CNNs networks are generally divided into an input layer, an output layer, a convolutional layer, and a fully-connected layer.
An input layer: the input layer directly acts on the input matrix, and the invention is a segmented text sentence matrix processed by a PE model.
An output layer: the output after the CNNs processing is the similarity of two sections of texts.
And (3) rolling layers: and performing feature extraction on the input. Consists of a convolution layer and a sampling layer. The convolutional layer has the function of extracting the characteristics of input data, and the characteristics extracted by different convolutional kernels are different. The sampling layer is used for reducing data and simultaneously keeping important information so as to accelerate the processing speed, and the sampling neurons of the same layer share the weight. The sampling layer adopts a sigmoid function as an activation function, so that the sampling layer has displacement invariance.
After the segmented texts are obtained, they are tokenized and only words with high TF-IDF values are kept; because requirements and services frequently contain numeric target information, all numbers in the text are retained. Each sentence of a segmented text is processed with the PE model, the resulting sentence vectors are assembled into a matrix, and the retained numbers are treated as a separate single dimension.
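The matrix construction above can be sketched as follows; `sentence_vector()` is a hypothetical stand-in for the trained PE model, since only the assembly step is being illustrated.

```python
import numpy as np

dim = 4  # illustrative embedding size

def sentence_vector(sentence):
    # Placeholder for the PE model: a deterministic pseudo-embedding
    # seeded from the sentence text (assumption for illustration only).
    rng = np.random.default_rng(abs(hash(sentence)) % (2**32))
    return rng.normal(size=dim)

def segment_matrix(sentences):
    """Stack the PE sentence vectors of one requirement/method into a matrix."""
    return np.stack([sentence_vector(s) for s in sentences])

req = segment_matrix(["reduce power use", "cut cost by 20%"])
srv = segment_matrix(["low-power chip design", "cost optimization", "testing"])
```

Each matrix (one per requirement or method) then becomes one input of the CNN similarity model of step 6.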
The matrices formed from the requirement document and the service document first pass through their own convolutional layers; after convolution they are connected by a similarity layer, and the similarity is finally output through a fully-connected layer.
To capture as many features of the text as possible, two kinds of convolution are used, as shown in FIG. 3: the left window has size 2 and spans the entire word vector; the right window also has size 2 but covers only one dimension of the word vector at a time. In practical experiments, three window sizes are adopted: 1, dim/2, and infinity (the whole matrix).
In the sampling layer, the results of the two kinds of convolution are each passed through max pooling, min pooling, and mean pooling; different pooling methods collect different information, which facilitates subsequent processing.
The similarity used by the similarity layer is cosine similarity. Since the three pooling methods (max, min, mean) are used, their results are compared with one another; and since each pooled result is a matrix, every row and every column of one matrix is compared with the other matrix, as shown in FIG. 4. For example, suppose the max-pooled result is an N × M matrix: the similarity is computed between the i-th row of this matrix and each of the N rows of the other matrix, and between the j-th column and each of the M columns of the other matrix. The resulting values form the similarity layer, and in addition the similarity between the two whole matrices is computed once.
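The row-wise cosine comparisons of the similarity layer can be sketched as below (columns are handled the same way on the transposed matrices); the toy matrices are illustrative.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def row_similarities(A, B):
    """All pairwise cosine similarities between the rows of A and the rows of B."""
    return np.array([[cosine(a, b) for b in B] for a in A])

A = np.array([[1.0, 0.0], [0.0, 1.0]])
B = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])
S = row_similarities(A, B)   # shape (2, 3): row i of A vs row j of B
```

Identical directions give similarity 1 regardless of magnitude, which is why the pooled matrices can be compared without normalizing their scales first.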
Fully-connected layer: a fully-connected layer, as in a conventional neural network, is used before the output.
Step 7: take a weighted average of the similarity values to obtain the final similarity;
Step 7.1: after the final value of every requirement is obtained, take their weighted average as the similarity value of the "problems to be solved" part of the requirement document;
Step 7.2: the steps above use the "problems to be solved" part and the "methods" part as an example; the requirement document also contains the "targets to be met" part, whose similarity is computed in the same way, and the weighted average of the two parts is taken as the final similarity between the requirement document and the service document;
the final similarity calculation is performed on the segmentation result of each part of the requirement document and the segmentation result of each part of the service document, as shown in fig. 5, because the requirement document has only two parts, namely, the problem to be solved and the index to be achieved when solving the problem, each part, after text segmentation, will cross with the result of each part of the service document after segmentation to obtain similarity, take the maximum value of the cross result as the matching value of the part, for example, the problem part of the requirement document that needs to be solved is segmented into N segments, the service document summarizes the method for solving the difficult problem technique to partially segment M results, after cross calculation, there are N × M matching results, the value with the maximum similarity is taken for each part of the requirement document as the final value of the part, and after obtaining the final values of all parts of the requirement document, the weighted average value is taken as the final similarity value of the problem of the requirement document that needs to be solved. Similarly, the problem part of the demand document to be solved and all parts of the service document find the best cross result.
The steps are taken as an example of a method for solving the problem of the requirement document and the summary of the service document to solve the difficult problem, the requirement document comprises the problem to be solved and an index part required to be achieved when the problem is solved, the similarity of the index part required to be achieved when the problem is solved by the requirement document is solved according to the method, and the weighted average of the two parts is worked out to serve as the final similarity of the requirement document and the service document.
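The cross-matching and aggregation of FIG. 5 can be sketched as follows; `segment_similarity()` substitutes a plain cosine for the trained CNN similarity model, and the 0.8 threshold is an assumed value, since the patent leaves the threshold as a preset parameter.

```python
import numpy as np

def segment_similarity(req_seg, srv_seg):
    # Placeholder for the CNN similarity model: cosine similarity.
    return float(req_seg @ srv_seg /
                 (np.linalg.norm(req_seg) * np.linalg.norm(srv_seg)))

def part_similarity(req_segments, srv_segments, weights=None):
    """Cross N requirement segments with M service segments, keep each
    requirement's best match, then take the weighted average."""
    cross = np.array([[segment_similarity(r, s) for s in srv_segments]
                      for r in req_segments])      # N x M matching results
    best_per_requirement = cross.max(axis=1)       # final value per requirement
    return float(np.average(best_per_requirement, weights=weights))

req_segments = np.array([[1.0, 0.0], [0.0, 1.0]])  # N = 2 requirements
srv_segments = np.array([[1.0, 0.0], [0.7, 0.7]])  # M = 2 methods
score = part_similarity(req_segments, srv_segments)

# Step 8: compare with a preset threshold (0.8 is an assumed value).
matched = score > 0.8
```

Taking the per-requirement maximum before averaging is what lets a service document match even when its methods are listed in a different order or granularity than the requirements.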
Step 8: compare the final similarity with a preset threshold: if it exceeds the threshold, the two documents match; otherwise they do not.
The segmentation point in step 4 means that the preceding and following sentences of the document have different meanings; the preceding sentence is then a segmentation point. The history information of the long short-term memory network is updated by:

C_t = 0   (when h_{t-1} → 1)

where C_t is the history information (cell state) of the LSTM at time t and h_{t-1} is the output of the previous state, used to judge whether a segmentation point was found. When updating the history information, if the output at the previous time step indicated a segmentation point, C_t is reset to 0; otherwise no special processing is applied.

Claims (2)

1. A demand document and service document matching method based on a neural network is characterized by comprising the following operation steps:
step 1: inputting a requirement document and a service document as documents to be matched, wherein the requirement document comprises problems to be solved by an enterprise and indexes to be achieved when the problems are solved, and the service document comprises a method for summarizing and solving the difficult problem technology, experience of solving similar projects, technical reserves of the project, obtained related patents, a research method to be adopted, mainly realized technical indexes and a project progress plan;
step 2: judging whether the input document is a demand document or a service document according to the document content;
step 2.1: the method comprises the steps that problems needing to be solved by an enterprise and index parts needing to be achieved when the problems are solved are required documents, and the problems needing to be solved by the enterprise and the index parts needing to be achieved when the problems are solved are extracted;
step 2.2: the method comprises the steps of summarizing a method for solving the difficult technology, experience of solving similar projects, taking over technical reserves of the project, obtained related patents, a research method to be adopted, a main realized technical index and project progress planning part which are service documents, extracting and summarizing the method for solving the difficult technology, experience of solving the similar projects, taking over the technical reserves of the project, obtained related patents, the research method to be adopted, the main realized technical index and the project progress planning part;
step 2.3: calculating the similarity of all the requirement document extraction parts and all the service document extraction parts according to the similarity of the final requirement document and the final service document, and taking the problems to be solved of the requirement document and a method for solving the difficult problem technology by summarizing the service document as an example;
step 3: performing paragraph embedding on the sentences of the problems-to-solve part of the requirement document and of the service document's method part summarizing how the difficult technical problem is solved, obtaining sentence vectors;
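The claim names a paragraph-embedding (PE) model but does not specify it. As a minimal sketch only, the stand-in below derives a deterministic pseudo-vector per word from a hash and averages them into one fixed-length sentence vector; the hash-based `word_vector` is purely hypothetical and would be a trained embedding lookup in practice.

```python
import hashlib

def word_vector(word, dim=8):
    # Hypothetical stand-in for a trained embedding lookup: derive a
    # deterministic pseudo-vector from the word's MD5 digest.
    digest = hashlib.md5(word.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dim]]

def sentence_vector(sentence, dim=8):
    # Average the word vectors so every sentence maps to one
    # fixed-length vector, as step 3 requires.
    vecs = [word_vector(w, dim) for w in sentence.split()]
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

vec = sentence_vector("the system must export reports as PDF")
print(len(vec))  # fixed dimension regardless of sentence length
```

A real implementation would replace `word_vector`/`sentence_vector` with a trained paragraph-embedding model; only the fixed-length-vector-per-sentence interface matters for the later steps.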
step 4: judging document segmentation points with a long short-term memory network;
step 4.1: inputting the obtained sentence vectors into a trained long short-term memory network and judging from its output whether the previous sentence is a segmentation point;
when judging whether the previous sentence is a segmentation point, the output of the long short-term memory network is passed through a sigmoid layer to map it into [0, 1]; when the output is close to 1, the previous sentence is a segmentation point, otherwise it is a continuation point;
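The sigmoid decision of step 4.1 can be sketched on its own, without the LSTM itself; the 0.5 cut-off below is an assumed threshold, since the claim only says the output should be "close to 1".

```python
import math

def sigmoid(x):
    # Squash any real-valued network output into (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def is_segmentation_point(lstm_output, threshold=0.5):
    # Values close to 1 mark the previous sentence as a segmentation
    # point; otherwise it is a continuation point.
    return sigmoid(lstm_output) > threshold

print(is_segmentation_point(3.2))   # strong signal -> segmentation point
print(is_segmentation_point(-2.5))  # weak signal -> continuation point
```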
step 4.2: according to the segmentation points, each part is divided into several texts with different meanings; each segment of the problems part of the requirement document is a requirement, and each segment of the solution part of the service document is a method;
step 5: constructing the input of the similarity model according to the type of the processed document;
step 5.1: for the requirement document, all sentences of one requirement are processed by the PE model to obtain sentence vectors that form one matrix, and all sentence vectors of one method form the other matrix;
step 5.2: for the service document, all sentences of one method are processed by the PE model to obtain sentence vectors that form one matrix, and all sentence vectors of one requirement form the other matrix;
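The matrix construction of steps 5.1 and 5.2 can be sketched as stacking one embedding per sentence; the `toy_embed` function below is a hypothetical two-feature stand-in for the PE model, used only so the example runs.

```python
def to_matrix(sentences, embed):
    # Embed every sentence of one requirement (or one method) and stack
    # the resulting vectors row by row into a matrix (step 5).
    return [embed(s) for s in sentences]

# Hypothetical toy embedding: character count and word count as features.
toy_embed = lambda s: [float(len(s)), float(len(s.split()))]

requirement_matrix = to_matrix(["export as PDF", "load in under 2s"], toy_embed)
method_matrix = to_matrix(["we render PDF server-side"], toy_embed)
print(len(requirement_matrix), len(method_matrix))  # rows = sentences
```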
step 6: computing similarity with a trained convolutional neural network that takes the two matrices as input; similarity is computed for every requirement crossed with every method, and the maximum similarity value of each requirement is taken as that requirement's final value;
two convolution operations are used, with three different window sizes: 1, dim/2 and infinity; when sampling, the results of the two convolutions are each processed with maximum pooling, minimum pooling and mean pooling, different pooling methods collecting different information; after sampling, each result is a matrix; for each matrix, the similarity between every row and every row of the other matrix is computed, as is the similarity between every column and every column of the other matrix; the similarity between the whole matrix and the other whole matrix is also computed once; because the row and column similarities produce far more results than the whole-matrix similarity, the whole-matrix similarity result is replicated so that the three carry equal weight; finally, a fully connected layer outputs the similarity result;
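The row-vs-row, column-vs-column, and whole-matrix comparisons, together with the replication trick that equalizes their weights, can be sketched as below. This is not the trained CNN of step 6: the convolutions, pooling, and fully connected layer are omitted, and cosine similarity is an assumed stand-in for the learned comparison; the sketch also assumes the two matrices have equal shape.

```python
def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def similarity_features(A, B):
    # Every row of A against every row of B, every column of A against
    # every column of B, plus one whole-matrix comparison.
    rows = [cosine(ra, rb) for ra in A for rb in B]
    cols = [cosine(ca, cb) for ca in zip(*A) for cb in zip(*B)]
    whole = cosine([x for r in A for x in r], [x for r in B for x in r])
    # Replicate the single whole-matrix score so the three groups carry
    # equal weight before a fully connected layer would combine them.
    return rows + cols + [whole] * max(len(rows), len(cols))

A = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0, 0.0], [1.0, 0.0]]
print(len(similarity_features(A, B)))  # 4 row + 4 column + 4 replicated
```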
step 7: computing a weighted average of the similarity values to obtain the final similarity;
step 7.1: after the final value of each requirement is obtained, their weighted average is computed as the final similarity value of the problems-to-solve part of the requirement document;
step 7.2: the above steps take the problems-to-solve part of the requirement document and the service document's method part summarizing how the difficult technical problem is solved as an example; since the requirement document also comprises the indexes to be reached when solving the problems, the similarity of that indexes part is computed by the same method, and the weighted average of the two parts is taken as the final similarity between the requirement document and the service document;
step 8: comparing the final similarity with a preset threshold: if the final similarity is larger than the threshold, the two documents match; if it is smaller, they do not match.
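The aggregation of steps 6 through 8 can be sketched end to end: keep the maximum similarity per requirement, take a weighted average, and compare against the preset threshold. Uniform weights are an assumption; the claim does not specify how the weights are chosen.

```python
def final_similarity(sim_matrix, weights=None):
    # sim_matrix[i][j] is the similarity between requirement i and
    # method j (step 6); each requirement keeps its maximum value.
    finals = [max(row) for row in sim_matrix]
    if weights is None:
        weights = [1.0] * len(finals)  # assumed uniform weighting
    # Weighted average of the per-requirement values (step 7).
    return sum(v * w for v, w in zip(finals, weights)) / sum(weights)

def documents_match(similarity, threshold):
    # Step 8: the pair matches only if the similarity exceeds the threshold.
    return similarity > threshold

score = final_similarity([[0.2, 0.9], [0.4, 0.6]])
print(round(score, 2), documents_match(score, 0.5))  # 0.75 True
```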
2. The neural network-based demand document and service document matching method according to claim 1, wherein:
the segmentation point in step 4 means that the meanings of a sentence and the following sentence in the document differ, the former sentence being a segmentation point; the history-information update formula of the long short-term memory network is:
C_t = 0 (when h_{t-1} → 1)
wherein C_t is the historical information kept by the long short-term memory network at time t, and h_{t-1} is the output of the previous state, which indicates whether the previous sentence is a segmentation point;
when the historical information is updated, if the output obtained at the previous time marks a segmentation point, C_t is reset to 0; if it is not a segmentation point, no processing is done.
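Claim 2's update rule amounts to a conditional reset of the accumulated history, sketched below; the 0.5 cut-off is an assumed threshold for h_{t-1} being "close to 1", which the claim does not quantify.

```python
def update_history(c_t, h_prev, threshold=0.5):
    # When the previous output h_{t-1} marks a segmentation point
    # (close to 1), reset the history C_t to 0; otherwise keep it.
    return 0.0 if h_prev > threshold else c_t

print(update_history(4.2, 0.97))  # segmentation point -> history cleared
print(update_history(4.2, 0.12))  # continuation -> history kept
```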
CN201810883232.4A 2018-03-12 2018-08-06 Neural network-based demand document and service document matching method Active CN109033413B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2018102006246 2018-03-12
CN201810200624 2018-03-12

Publications (2)

Publication Number Publication Date
CN109033413A CN109033413A (en) 2018-12-18
CN109033413B true CN109033413B (en) 2022-12-23

Family

ID=64649584

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810883232.4A Active CN109033413B (en) 2018-03-12 2018-08-06 Neural network-based demand document and service document matching method

Country Status (1)

Country Link
CN (1) CN109033413B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108595409A (en) * 2018-03-16 2018-09-28 上海大学 A kind of requirement documents based on neural network and service document matches method
CN116097237A (en) * 2020-09-27 2023-05-09 西门子股份公司 Text similarity determination method, device and industrial diagnosis method and system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106502985A (en) * 2016-10-20 2017-03-15 清华大学 A kind of neural network modeling approach and device for generating title
CN106528528A (en) * 2016-10-18 2017-03-22 哈尔滨工业大学深圳研究生院 A text emotion analysis method and device
CN107133202A (en) * 2017-06-01 2017-09-05 北京百度网讯科技有限公司 Text method of calibration and device based on artificial intelligence
CN107169035A (en) * 2017-04-19 2017-09-15 华南理工大学 A kind of file classification method for mixing shot and long term memory network and convolutional neural networks
CN107291871A (en) * 2017-06-15 2017-10-24 北京百度网讯科技有限公司 Matching degree appraisal procedure, equipment and the medium of many domain informations based on artificial intelligence
CN107679234A (en) * 2017-10-24 2018-02-09 上海携程国际旅行社有限公司 Customer service information providing method, device, electronic equipment, storage medium


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Require-documents and provide-documents matching algorithm based on topic model;Xiangwen Zou;《2016 International Conference on Audio, Language and Image Processing》;20170209;620-625 *
Topic segmentation of dialogue text based on long short-term memory recurrent neural networks;Yin Qingyu;《哈工大SCIR》(HIT SCIR);20170425;1-5 *

Also Published As

Publication number Publication date
CN109033413A (en) 2018-12-18

Similar Documents

Publication Publication Date Title
CN110704588B (en) Multi-round dialogue semantic analysis method and system based on long-short-term memory network
CN108010514B (en) Voice classification method based on deep neural network
CN108319666B (en) Power supply service assessment method based on multi-modal public opinion analysis
CN113688244B (en) Text classification method, system, equipment and storage medium based on neural network
CN109948149B (en) Text classification method and device
CN110110080A (en) Textual classification model training method, device, computer equipment and storage medium
CN110750965B (en) English text sequence labeling method, english text sequence labeling system and computer equipment
CN106709820A (en) Power system load prediction method and device based on deep belief network
CN111984791B (en) Attention mechanism-based long text classification method
CN112131890A (en) Method, device and equipment for constructing intelligent recognition model of conversation intention
CN108170848B (en) Chinese mobile intelligent customer service-oriented conversation scene classification method
CN111881677A (en) Address matching algorithm based on deep learning model
CN110555084A (en) remote supervision relation classification method based on PCNN and multi-layer attention
CN113255366B (en) Aspect-level text emotion analysis method based on heterogeneous graph neural network
CN111353534B (en) Graph data category prediction method based on adaptive fractional order gradient
CN113821635A (en) Text abstract generation method and system for financial field
CN116415177A (en) Classifier parameter identification method based on extreme learning machine
CN109033413B (en) Neural network-based demand document and service document matching method
CN113488196A (en) Drug specification text named entity recognition modeling method
CN110569355A (en) Viewpoint target extraction and target emotion classification combined method and system based on word blocks
CN112988970A (en) Text matching algorithm serving intelligent question-answering system
CN114036298B (en) Node classification method based on graph convolution neural network and word vector
CN115687609A (en) Zero sample relation extraction method based on Prompt multi-template fusion
CN115062123A (en) Knowledge base question-answer pair generation method of conversation generation system
CN111783688B (en) Remote sensing image scene classification method based on convolutional neural network

Legal Events

Date Code Title Description
SE01 Entry into force of request for substantive examination
GR01 Patent grant