CN112988921A - Method and device for identifying map information change - Google Patents

Method and device for identifying map information change

Info

Publication number
CN112988921A
CN112988921A CN201911281305.3A
Authority
CN
China
Prior art keywords
text
result
layer
processed
feature matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911281305.3A
Other languages
Chinese (zh)
Inventor
冯博琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Navinfo Co Ltd
Original Assignee
Navinfo Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Navinfo Co Ltd filed Critical Navinfo Co Ltd
Priority to CN201911281305.3A priority Critical patent/CN112988921A/en
Publication of CN112988921A publication Critical patent/CN112988921A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3335 Syntactic pre-processing, e.g. stopword elimination, stemming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Remote Sensing (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and a device for identifying map information change. The method comprises the following steps: acquiring a term characteristic matrix and a document representation characteristic matrix of a text to be processed; acquiring a neural network characteristic matrix of the text to be processed by combining dynamic pooling operation; splicing the term feature matrix, the document representation feature matrix and the neural network feature matrix to obtain a target feature matrix; and determining whether the text to be processed is a text with changed map information or not according to the target feature matrix. Compared with the prior art, the accuracy of the identification result is improved.

Description

Method and device for identifying map information change
Technical Field
The invention relates to the field of electronic maps, in particular to a method and a device for identifying map information change.
Background
With the development of internet technology, electronic maps are now widely used applications. In real scenes, map elements such as roads and POIs change frequently, and how to mine texts reflecting these changes from massive text information and update the electronic map accordingly is a major current challenge.
In the prior art, target texts are mainly mined by feature engineering or by neural network methods. However, feature engineering needs to be combined with an unsupervised algorithm when extracting document features, and errors of the unsupervised algorithm inevitably affect the accuracy of feature extraction, so the mining effect is poor. On the neural network side, the traditional maximum, minimum and mean pooling methods in the convolutional neural network (CNN) model retain only a single value per kernel, which easily causes loss of features, low feature-extraction accuracy and a poor mining effect. It can be seen that neither of the two prior-art methods achieves high accuracy in target text mining.
Disclosure of Invention
The invention provides a method and a device for identifying map information change, which are used for improving the identification accuracy of the map information change.
In a first aspect, the present invention provides a method for identifying a map information change, including:
acquiring a term characteristic matrix and a document representation characteristic matrix of a text to be processed;
acquiring a neural network characteristic matrix of the text to be processed by combining dynamic pooling operation;
splicing the term feature matrix, the document representation feature matrix and the neural network feature matrix to obtain a target feature matrix;
and determining whether the text to be processed is a text with changed map information or not according to the target feature matrix.
Optionally, the obtaining the term feature matrix of the text to be processed includes:
performing word segmentation processing, stop word processing, low-frequency lexical item filtering processing and lexical item identification conversion processing on the text to be processed to obtain a preprocessed text;
extracting the character length and the number of terms contained in the preprocessed text from the preprocessed text;
extracting the length of the non-repeated characters contained in the preprocessed text and the number of the contained non-repeated terms from the preprocessed text;
calculating keyword similarity according to each term contained in the preprocessed text and each keyword in a keyword library;
and inputting the character length, the number of terms, the length of the non-repeated characters, the number of the non-repeated terms and the similarity of the keywords into a multi-layer perceptron MLP to obtain the term feature matrix.
Optionally, the obtaining of the document representation feature matrix of the text to be processed includes:
extracting the document representation characteristics of the text to be processed through a pre-trained document theme generation model LDA;
and inputting the document representation features into a multi-layer perceptron MLP to obtain a document representation feature matrix.
Optionally, the calculating keyword similarity according to each term contained in the preprocessed text and each keyword in the keyword library includes:
converting word2vec models according to terms contained in the preprocessed text and pre-trained word vectors to obtain word vectors of the terms contained in the preprocessed text;
obtaining a word vector of each keyword in the keyword library according to each keyword in the keyword library and the word vector conversion word2vec model;
and calculating the similarity of the keywords through a similarity formula according to the word vectors of the terms contained in the preprocessed text and the word vectors of the keywords in the keyword library.
Optionally, the obtaining the neural network feature matrix of the text to be processed in combination with the dynamic pooling operation includes:
generating corresponding static word vectors and dynamic word vectors according to the text to be processed;
splicing the static word vector and the dynamic word vector to obtain a splicing result;
performing convolution operation on the splicing result by adopting one-dimensional wide convolution to obtain a convolution result;
adding the characteristic numerical values of the convolution results on adjacent dimensions to obtain a folding result;
weighting the folding result by adopting an attention mechanism to obtain a weighting result;
performing dynamic pooling operation on the weighting result to obtain a pooling result;
and transforming the pooling result through MLP to determine the neural network characteristic matrix.
Optionally, the weighting the folding result by using an attention mechanism to obtain a weighted result includes:
weighting the folding result by adopting the following formula to obtain a weighting result:
f(K^T, K) = K^T · W_a · K
wherein f(K^T, K) represents the weighting result, K represents the folding result, and W_a represents the parameter that currently needs to be learned.
Optionally, the determining, according to the target feature matrix, whether the text to be processed is a text with changed map information includes:
inputting the target characteristic matrix into a classification model trained in advance;
and determining whether the text to be processed is a text with changed map information or not according to the output result of the classification model.
Optionally, the method is applied to a first model structure, and the first model structure includes: an embedding layer and a neural network model structure, the neural network model structure comprising: a convolution layer, a folding layer, a self-attention mechanism layer, a pooling layer, an activation layer and a batch normalization layer;
the acquiring the neural network feature matrix of the text to be processed in combination with the dynamic pooling operation comprises:
the embedding layer generates corresponding static word vectors and dynamic word vectors according to the text to be processed, splices the static word vectors and the dynamic word vectors to obtain a splicing result, and transmits the splicing result to the convolution layer;
the convolution layer performs convolution operation on the splicing result by adopting one-dimensional wide convolution to obtain a convolution result, and transmits the convolution result to the folding layer;
the folding layer adds the characteristic numerical values of the convolution results on the adjacent dimensionalities to obtain folding results, and transmits the folding results to the self-attention mechanism layer;
the self-attention mechanism layer weights the folding result by adopting an attention mechanism to obtain a weighted result, and transmits the weighted result to the pooling layer;
the pooling layer performs dynamic pooling operation on the weighting result to obtain a pooling result, and outputs the pooling result to the activation layer;
and the activation layer and the batch normalization layer transform the pooling result through MLP to determine the neural network characteristic matrix.
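The folding, attention-weighting and pooling layers described above can be sketched roughly in NumPy as follows. The patent excerpt gives only the scoring formula f(K^T, K) = K^T · W_a · K and does not spell out how the scores are applied or what the exact dynamic pooling rule is, so the softmax application step and the k-max pooling variant below are assumptions; the function names are illustrative.

```python
import numpy as np

def fold(conv_out):
    """Folding layer sketch: add feature values on adjacent feature rows,
    halving the number of feature dimensions (assumes an even row count)."""
    return conv_out[0::2, :] + conv_out[1::2, :]

def attention_weight(K, W_a):
    """Self-attention sketch: scores f(K^T, K) = K^T . W_a . K, then
    (assumption) a softmax over positions used to re-weight K."""
    scores = K.T @ W_a @ K                                    # (seq, seq)
    weights = np.exp(scores) / np.exp(scores).sum(axis=0, keepdims=True)
    return K @ weights                                        # (d, seq)

def dynamic_kmax_pool(K, k):
    """Dynamic pooling sketch: keep the k largest values per feature row,
    preserving their original order (a common k-max pooling formulation)."""
    idx = np.sort(np.argsort(K, axis=1)[:, -k:], axis=1)
    return np.take_along_axis(K, idx, axis=1)
```

Chaining fold, attention_weight and dynamic_kmax_pool over a convolution output mirrors the layer order stated above (convolution layer, folding layer, self-attention mechanism layer, pooling layer).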
In a second aspect, the present invention provides an apparatus for recognizing a change in map information, comprising:
the acquisition module is used for acquiring a term feature matrix and a document representation feature matrix of the text to be processed;
the acquisition module is further used for acquiring the neural network characteristic matrix of the text to be processed in combination with dynamic pooling operation;
the splicing module is used for splicing the term feature matrix, the document representation feature matrix and the neural network feature matrix to obtain a target feature matrix;
and the classification module is used for determining whether the text to be processed is a text with changed map information or not according to the target feature matrix.
Optionally, the obtaining module is specifically configured to:
performing word segmentation processing, stop word processing, low-frequency lexical item filtering processing and lexical item identification conversion processing on the text to be processed to obtain a preprocessed text;
extracting the character length and the number of terms contained in the preprocessed text from the preprocessed text;
extracting the length of the non-repeated characters contained in the preprocessed text and the number of the contained non-repeated terms from the preprocessed text;
calculating keyword similarity according to each term contained in the preprocessed text and each keyword in a keyword library;
and inputting the character length, the number of terms, the length of the non-repeated characters, the number of the non-repeated terms and the similarity of the keywords into a multi-layer perceptron MLP to obtain the term feature matrix.
The acquisition module is specifically configured to:
extracting the document representation characteristics of the text to be processed through a pre-trained document theme generation model LDA;
and inputting the document representation features into a multi-layer perceptron MLP to obtain a document representation feature matrix.
Optionally, the obtaining module is specifically configured to:
converting word2vec models according to terms contained in the preprocessed text and pre-trained word vectors to obtain word vectors of the terms contained in the preprocessed text;
obtaining a word vector of each keyword in the keyword library according to each keyword in the keyword library and the word vector conversion word2vec model;
and calculating the similarity of the keywords through a similarity formula according to the word vectors of the terms contained in the preprocessed text and the word vectors of the keywords in the keyword library.
Optionally, the obtaining module is specifically configured to:
generating corresponding static word vectors and dynamic word vectors according to the text to be processed;
splicing the static word vector and the dynamic word vector to obtain a splicing result;
performing convolution operation on the splicing result by adopting one-dimensional wide convolution to obtain a convolution result;
adding the characteristic numerical values of the convolution results on adjacent dimensions to obtain a folding result;
weighting the folding result by adopting an attention mechanism to obtain a weighting result;
performing dynamic pooling operation on the weighting result to obtain a pooling result;
and transforming the pooling result through MLP to determine the neural network characteristic matrix.
Optionally, the obtaining module is specifically configured to:
weighting the folding result by adopting the following formula to obtain a weighting result:
f(K^T, K) = K^T · W_a · K
wherein f(K^T, K) represents the weighting result, K represents the folding result, and W_a represents the parameter that currently needs to be learned.
Optionally, the apparatus has a first model structure, the first model structure comprising: an embedding layer and a neural network model structure, the neural network model structure comprising: a convolutional layer, a folding layer, a self-attention mechanism layer, a pooling layer, an activation layer, and a batch normalization layer.
The classification module is specifically configured to:
inputting the target characteristic matrix into a classification model trained in advance;
and determining whether the text to be processed is a text with changed map information or not according to the output result of the classification model.
In a third aspect, the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described method of identifying a change in map information.
In a fourth aspect, the present invention provides an electronic device comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to implement the above-described method of identifying a change in map information via execution of the executable instructions.
The invention provides a method and equipment for identifying map information change, which comprises the steps of firstly, acquiring a term feature matrix and a document representation feature matrix of a text to be processed; then, combining with dynamic pooling operation, obtaining a neural network characteristic matrix of the text to be processed; then, splicing the term feature matrix, the document representation feature matrix and the neural network feature matrix to obtain a target feature matrix; and finally, determining whether the text to be processed is a text with changed map information or not according to the target feature matrix. The accuracy of the extracted features is greatly improved, and the accuracy of identifying the target text based on the extracted features is also improved.
Drawings
Fig. 1 is a schematic flowchart of a first embodiment of a method for identifying a change in map information according to the present invention;
fig. 2 is a flowchart illustrating a second embodiment of a method for identifying a change in map information according to the present invention;
FIG. 2a is a schematic structural diagram of a first model according to the present invention;
FIG. 3 is a schematic structural diagram of an embodiment of an apparatus for identifying a change in map information according to the present invention;
fig. 4 is a schematic diagram of a hardware structure of the electronic device provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein.
Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The goal is to mine texts related to map element changes from massive text information and update the electronic map according to those texts; in the prior art, mining of target texts is mainly achieved through feature engineering or neural network methods. However, feature engineering needs to be combined with an unsupervised algorithm when extracting document features, and errors of the unsupervised algorithm inevitably affect the accuracy of feature extraction, so the mining effect is poor. On the neural network side, the traditional maximum, minimum and mean pooling methods in the convolutional neural network (CNN) model retain only a single value per kernel, which easily causes loss of features, low feature-extraction accuracy and a poor mining effect. It can be seen that neither of the two prior-art methods achieves high accuracy in target text mining.
Based on the technical problem, the invention provides a method and a device for identifying map information change, on one hand, a pooling method in a neural network is improved into a dynamic pooling method; on the other hand, the neural network and the feature engineering are combined to extract the document features, so that the accuracy of the extracted features is greatly improved, and the accuracy of identifying the target text based on the extracted features is also improved.
The following describes the technical solutions of the present invention and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present invention will be described below with reference to the accompanying drawings.
Fig. 1 is a flowchart illustrating a first embodiment of a method for identifying a change in map information according to the present invention. As shown in fig. 1, the method for identifying a change in map information provided by this embodiment includes:
s101, acquiring a lexical item feature matrix of the text to be processed.
In one possible implementation, the obtaining a term feature matrix of a text to be processed includes:
and step A, performing word segmentation processing, stop word processing, low-frequency lexical item filtering processing and lexical item identification conversion processing on the text to be processed to obtain a preprocessed text.
The word segmentation processing of the text to be processed refers to: and splitting sentences in the text to be processed into terms, thereby obtaining a term set after word segmentation processing.
The method for processing stop words of the text to be processed includes the following steps: deleting the terms which are irrelevant to the map elements in the term set after the word segmentation processing. Optionally, a stop word bank may be pre-established, terms in the set and terms in the stop word bank are matched, terms successfully matched may be regarded as stop words, and the terms may be deleted, so as to obtain a term set after the stop words are removed.
The low-frequency lexical item filtering processing on the text to be processed refers to the following steps: and deleting the terms with lower frequency in the term set after the stop word processing. Optionally, a frequency threshold may be set first, the frequency of occurrence of each term in the term set after the stop word processing is counted, the frequency of occurrence of each term is compared with the frequency threshold, terms smaller than the frequency threshold may be regarded as low-frequency terms, and then the terms may be deleted, so as to obtain the term set after the low-frequency terms are filtered.
The term identification conversion processing of the text to be processed refers to: and converting the words in the lexical item set after the low-frequency lexical item filtering processing into corresponding lexical item identifications, thereby obtaining the lexical item set after the lexical item identification conversion processing.
Optionally, in order to eliminate the influence of place names on subsequent processing, when the text to be processed undergoes the term identification conversion processing, the terms representing place names in the term set may be recognized first and converted into < POS >, and terms appearing in the term set but absent from the dictionary may be converted into < UNK >.
Optionally, the process of preprocessing the text to be processed may further include: punctuation mark removing processing, special symbol removing processing and the like.
It should be noted that the order of the several processes included in the above preprocessing is not limited to the order described above; those skilled in the art can adjust the order of the processes according to the actual situation, and schemes performing the above processes in any order fall within the protection scope of the present invention.
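The preprocessing chain of step A (stop-word removal, low-frequency term filtering, term-identifier conversion) can be sketched minimally in Python on an already-segmented token list. The stop-word set, frequency threshold, place-name set and dictionary below are illustrative assumptions; a real system would also use a dedicated Chinese word segmenter for the segmentation step.

```python
from collections import Counter

def preprocess(tokens, stopwords, place_names, dictionary, min_freq=2):
    """Sketch of the patent's preprocessing chain on segmented tokens."""
    # Stop-word removal: drop terms matched against the stop-word bank.
    kept = [t for t in tokens if t not in stopwords]
    # Low-frequency term filtering against a frequency threshold.
    freq = Counter(kept)
    kept = [t for t in kept if freq[t] >= min_freq]
    # Term-identifier conversion: place names -> <POS>, OOV terms -> <UNK>.
    ids = []
    for t in kept:
        if t in place_names:
            ids.append("<POS>")
        elif t not in dictionary:
            ids.append("<UNK>")
        else:
            ids.append(dictionary[t])
    return ids
```

As the note above says, the order of the stages can be adjusted; this sketch fixes one plausible order for concreteness.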
And B, extracting the length of the characters contained in the preprocessed text and the number of the contained terms from the preprocessed text.
And C, extracting the length of the non-repeated characters contained in the preprocessed text and the number of the contained non-repeated terms from the preprocessed text.
For example, suppose the text to be processed contains the sentence "Wuhan is building the 5th tunnel, and Nanjing is also building a tunnel", and that the word segmentation result is "Wuhan / now / build / 5th / one / cross / tunnel / , / Nanjing / also / build / cross / tunnel / .". The statistics involved in step B and step C are then: character length: 25; number of terms: 16; length of non-repeated characters: 18; number of non-repeated terms: 13.
It should be noted that the above example only illustrates the statistical concepts involved in step B and step C; preprocessing steps such as stop-word removal and punctuation removal are omitted, and schemes obtained by those skilled in the art by combining those preprocessing steps with step B and step C are still within the protection scope of the present invention.
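The four statistics of steps B and C can be computed directly; a minimal sketch, where the characters are the raw text symbols and the terms are the segmented units:

```python
def term_statistics(text, terms):
    """Character length, term count, non-repeated character length and
    non-repeated term count of a preprocessed text (steps B and C)."""
    return {
        "char_length": len(text),
        "term_count": len(terms),
        "unique_char_length": len(set(text)),
        "unique_term_count": len(set(terms)),
    }
```

These four scalars, together with the keyword similarity of step D, are what step E feeds into the MLP.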
And D, calculating keyword similarity according to each term contained in the preprocessed text and each keyword in a keyword library.
In an implementation manner, calculating the similarity of keywords can be implemented as follows:
firstly, word vectors of terms contained in the preprocessed text are obtained according to terms contained in the preprocessed text and a word vector conversion word2vec model trained in advance.
And then, obtaining a word vector of each keyword in the keyword library according to each keyword in the keyword library and the word vector conversion word2vec model.
The word2vec model is a language model mainly used to train word vectors, i.e. to represent words as vectors and thereby quantify them. The word2vec model is simple and fast to train, and manual evaluation of its similarity calculations shows a good effect.
And finally, calculating the similarity of the keywords through a similarity formula according to the word vectors of the terms contained in the preprocessed text and the word vectors of the keywords in the keyword library.
For example, it is assumed that the keyword library contains keywords such as the words in Table 1:
TABLE 1
[Table 1 appears only as an image in the original document.]
Optionally, the keyword similarity may be calculated by using the following formula:
sim(doc_i, keywords) = max over t_k in doc_i and k_j in keywords of cos(t_k, k_j)
cos(t_k, k_j) = ( Σ_{m=1}^{n} t_{k,m} · k_{j,m} ) / ( sqrt(Σ_{m=1}^{n} t_{k,m}²) · sqrt(Σ_{m=1}^{n} k_{j,m}²) )
wherein sim(doc_i, keywords) represents the keyword similarity, t_k represents a term in the preprocessed text, k_j represents a keyword in the keyword library, n represents the dimension of the word vectors obtained by word2vec training, and m is the m-th dimension of the word vector. From the above formulas, the keyword similarity is the maximum value of the cosine similarity between the terms in the preprocessed text and the keywords in the keyword library.
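The maximum-cosine-similarity computation of step D can be sketched as follows, taking pre-computed word vectors (e.g. from a trained word2vec model) as plain lists of floats:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def keyword_similarity(term_vectors, keyword_vectors):
    """Keyword similarity per the formula above: the maximum cosine
    similarity between any document term and any library keyword."""
    return max(cosine(t, k) for t in term_vectors
               for k in keyword_vectors)
```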
And E, inputting the length of the characters, the number of the terms, the length of the non-repeated characters, the number of the non-repeated terms and the similarity of the keywords contained in the preprocessed text into a multi-layer perceptron MLP to obtain the term feature matrix.
Specifically, after the several features of step E are input into the multi-layer perceptron MLP, the MLP may map the features into a matrix of shape (batch_size, hidden_t), where batch_size is the batch size and hidden_t is the number of hidden nodes in the hidden layer of the MLP.
And S102, acquiring a document representation feature matrix of the text to be processed.
In an implementation manner, obtaining the document representation feature matrix can be implemented by:
firstly, extracting the document representation characteristics of the preprocessed text through a pre-trained document theme generation model LDA.
The LDA model is an unsupervised document topic model that represents documents quantitatively through their topics. The LDA model is used to enrich the vectorized representation of the document, which yields a better result in the final document-category judgment.
Then, the document representation features are input into a multi-layer perceptron MLP to obtain a document representation feature matrix.
Specifically, after the document representation features are input into the multi-layer perceptron MLP, the MLP may map them into a matrix of shape (batch_size, hidden_r), where batch_size is the batch size and hidden_r is the number of hidden nodes in the hidden layer of the MLP.
And S103, acquiring the neural network characteristic matrix of the text to be processed by combining with dynamic pooling operation.
And S104, splicing the term feature matrix, the document representation feature matrix and the neural network feature matrix to obtain a target feature matrix.
Specifically, after the term feature matrix, the document representation feature matrix and the neural network feature matrix of the text to be processed are obtained, the feature matrices are spliced; the result of the splicing is the target feature matrix, and its dimensionality is (batch_size, hidden_n + hidden_t + hidden_r),
where hidden_n represents the number of hidden nodes in the MLP hidden layer corresponding to the neural network feature matrix.
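The splicing of the three feature matrices and the resulting dimensionality can be illustrated with a small shape check; the sizes below are illustrative, not values from the patent:

```python
import numpy as np

batch_size, hidden_n, hidden_t, hidden_r = 4, 64, 16, 8  # illustrative sizes

nn_features   = np.zeros((batch_size, hidden_n))  # neural network feature matrix
term_features = np.zeros((batch_size, hidden_t))  # term feature matrix (MLP output)
doc_features  = np.zeros((batch_size, hidden_r))  # document representation features

# Splice along the feature axis to obtain the target feature matrix.
target = np.concatenate([nn_features, term_features, doc_features], axis=1)
# target.shape == (batch_size, hidden_n + hidden_t + hidden_r)
```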
S105, determining whether the text to be processed is a text with changed map information or not according to the target feature matrix.
Specifically, the target feature vector may be input into the classifier; the classifier may determine the category of the text to be processed according to the target feature vector, and then determine, according to the category, whether the text to be processed is a text related to a change of a map element.
In the method for identifying the change of the map information, when the feature of the text to be processed is extracted, on one hand, a term feature matrix and a document representation feature matrix of the text to be processed are obtained by a feature engineering method, on the other hand, a neural network feature matrix of the text to be processed is obtained by combining dynamic pooling operation, then the term feature matrix, the document representation feature matrix and the neural network feature matrix are spliced to obtain a target feature matrix, and finally, whether the text to be processed is the text with the change of the map information is determined according to the target feature matrix, so that the accuracy of the extracted feature is greatly improved, and meanwhile, the accuracy of identifying the target text based on the extracted feature is also improved.
An implementation manner of obtaining the neural network feature matrix of the text to be processed in the foregoing embodiment S103 by combining with the dynamic pooling operation is described below with reference to a specific embodiment. Fig. 2 is a schematic flowchart of a second embodiment of the method for identifying a change in map information according to the present invention, and as shown in fig. 2, the method for identifying a change in map information according to the present embodiment includes:
s201, acquiring a term feature matrix and a document representation feature matrix of the text to be processed.
For an implementation manner of S201, reference may be made to S101 in the foregoing embodiment; details are not described herein again.
S202, generating corresponding static word vectors and dynamic word vectors according to the text to be processed.
Specifically, an implementation manner of generating the static word vector corresponding to the text to be processed is as follows: each term in the preprocessed text obtained in the foregoing embodiment is input into the word2vec model to obtain a word vector corresponding to each term, and the word vectors corresponding to all the terms are combined to obtain the static word vector corresponding to the text to be processed.
An implementation manner of generating the dynamic word vector corresponding to the text to be processed is as follows: the word vectors corresponding to all the terms are combined into one vector serving as an initial value, and the initial value is dynamically adjusted in a manner provided in the prior art to obtain the dynamic word vector corresponding to the text to be processed.
S203, splicing the static word vectors and the dynamic word vectors to obtain a splicing result.
For methods of splicing vectors, reference may be made to the prior art; details are not described herein again.
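A minimal sketch of S202-S203 follows; the toy vocabulary, the embedding dimension, and the use of two lookup tables (one frozen word2vec-style table, one trainable table playing the role of the dynamic vectors) are our assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"道路": 0, "新增": 1, "路口": 2}          # toy vocabulary (assumed)
dim = 8

static_table = rng.normal(size=(len(vocab), dim))   # frozen word2vec-style vectors
dynamic_table = rng.normal(size=(len(vocab), dim))  # same shape, but updated during training

tokens = ["道路", "新增", "路口"]
ids = [vocab[t] for t in tokens]

# Splice the static and dynamic vectors of each term into one embedding row.
spliced = np.concatenate([static_table[ids], dynamic_table[ids]], axis=1)
print(spliced.shape)  # (3, 16): sequence length x (2 * dim)
```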
And S204, performing convolution operation on the splicing result by adopting one-dimensional wide convolution to obtain a convolution result.
Because the length of the dynamic word vector is not fixed, the length of the splicing result is also not fixed. Processing the variable-length splicing result with a one-dimensional wide convolution avoids losing text information at the edges of the sequence while improving processing efficiency.
The principle of the convolution operation is similar to that in the prior art; details are not described herein again.
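The defining property of a one-dimensional wide convolution is that the input is padded so that every partial overlap between kernel and sequence contributes an output, giving an output of length len(x) + len(kernel) − 1 and dropping no edge tokens. A minimal sketch (the kernel values are arbitrary illustration):

```python
import numpy as np

def wide_conv1d(x, kernel):
    """One-dimensional *wide* convolution: pad with k-1 zeros on each side so
    every partial overlap contributes, yielding len(x) + len(kernel) - 1 outputs."""
    k = len(kernel)
    padded = np.pad(x, (k - 1, k - 1))
    return np.array([np.dot(padded[i:i + k], kernel)
                     for i in range(len(x) + k - 1)])

x = np.array([1.0, 2.0, 3.0, 4.0])
out = wide_conv1d(x, np.array([1.0, 1.0, 1.0]))
print(out)        # [1. 3. 6. 9. 7. 4.] - edge positions see fewer inputs but survive
print(out.shape)  # (6,) = 4 + 3 - 1
```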
And S205, adding the characteristic numerical values of the convolution results on the adjacent dimensions to obtain a folding result.
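The folding in S205 can be sketched as follows; summing each pair of adjacent feature columns is an assumed reading of "adding the feature numerical values on the adjacent dimensions":

```python
import numpy as np

def fold(feature_map):
    """Folding layer: sum every pair of adjacent feature columns, halving the
    feature dimension while mixing neighbouring channels."""
    rows, cols = feature_map.shape
    assert cols % 2 == 0, "feature dimension must be even to fold"
    return feature_map[:, 0::2] + feature_map[:, 1::2]

conv_result = np.arange(12, dtype=float).reshape(3, 4)
folded = fold(conv_result)
print(folded)        # columns 0+1 and 2+3 summed per row
print(folded.shape)  # (3, 2)
```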
And S206, weighting the folding result by adopting an attention mechanism to obtain a weighting result.
Specifically, a self-attention mechanism is adopted here, and the formula is as follows:
f(K^T, K) = K^T · W_a · K
wherein K represents the folding result obtained in S205, f(K^T, K) represents the weighting result, and W_a represents the parameter that currently needs to be learned.
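The shapes in the formula can be checked with a small sketch. Here K is taken as an s × d matrix (s rows from the folding result) and W_a as an s × s learnable matrix, so that K^T · W_a · K is well-formed; these dimension choices, the random initialisation of W_a, and how the resulting scores are applied back to the folding result are our assumptions, not fixed by the embodiment:

```python
import numpy as np

rng = np.random.default_rng(1)
s, d = 5, 4                      # s rows (sequence positions), d feature dims (assumed)
K = rng.normal(size=(s, d))      # folding result
W_a = rng.normal(size=(s, s))    # learnable parameter W_a (randomly initialised here)

# f(K^T, K) = K^T · W_a · K : a d x d score matrix relating feature dimensions.
scores = K.T @ W_a @ K
print(scores.shape)  # (4, 4)
```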
And S207, performing dynamic pooling operation on the weighting result to obtain a pooling result.
The dynamic pooling here is K-max Pooling, with a pooling window of size (s, 1), where s is the number of rows of the above weighting result. Compared with max pooling, which takes only a single maximum within a pooling window, K-max Pooling dynamically takes the Top-k maxima within a single pooling window and preserves the relative order between them. Except for the last K-max Pooling in the model, whose k is a manually assigned hyper-parameter, the k of each remaining K-max Pooling is calculated dynamically from the number of rows (that is, the sequence length) of the input feature map, as shown in the following formula, where k_top is the value of k in the last K-max Pooling, N denotes the number of K-max Pooling operations in the whole model, and L is the order of the current pooling operation among the N:

k_L = max(k_top, ⌈((N − L) / N) · s⌉)
Compared with a general pooling operation, the dynamic pooling operation of this embodiment reduces the loss of valid features and retains position information while reducing dimensionality.
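A minimal sketch of dynamic K-max Pooling as described above. The dynamic k is written here as k_l = max(k_top, ⌈((N − l) / N) · s⌉), the form used in the dynamic-CNN literature, which is our assumed reading of the embodiment's calculation:

```python
import math
import numpy as np

def kmax_pooling(column, k):
    """Keep the Top-k maxima of a column while preserving their relative order."""
    idx = np.sort(np.argsort(column)[-k:])  # top-k indices, re-sorted by position
    return column[idx]

def dynamic_k(s, l, N, k_top):
    """Dynamic k for the l-th of N pooling layers over a sequence of s rows:
    k_l = max(k_top, ceil((N - l) / N * s))  (assumed reading of the formula)."""
    return max(k_top, math.ceil((N - l) / N * s))

col = np.array([0.1, 0.9, 0.3, 0.7, 0.5])
k = dynamic_k(s=len(col), l=1, N=2, k_top=2)  # first of two pooling layers
print(k)                     # 3
print(kmax_pooling(col, k))  # [0.9 0.7 0.5], in original order
```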
Optionally, the process of S204-S207 may be performed twice, with the pooling result after the second pass used as the final pooling result; the two dynamic pooling operations correspond to two dynamic pooling layers in the neural network.
And S208, transforming the pooling result through MLP to determine the neural network characteristic matrix.
In an implementation manner, the steps of S202-S208 described above can be performed by a first model structure shown in fig. 2a, where the first model structure shown in fig. 2a includes: an embedding layer and a neural network model structure, the neural network model structure further comprising: a convolutional layer, a folding layer, a self-attention mechanism layer, a pooling layer, an activation layer, and a batch normalization layer.
Specifically, the embedding layer firstly generates a corresponding static word vector and a corresponding dynamic word vector according to the text to be processed, splices the static word vector and the dynamic word vector to obtain a splicing result, and further transmits the splicing result to the convolution layer; the convolution layer performs convolution operation on the splicing result by adopting one-dimensional wide convolution to obtain a convolution result, and transmits the convolution result to the folding layer; the folding layer adds the characteristic values of the convolution results on the adjacent dimensionalities to obtain folding results, and transmits the folding results to the self-attention mechanism layer; the self-attention mechanism layer adopts an attention mechanism to weight the folding result to obtain a weighting result, and transmits the weighting result to the pooling layer; the pooling layer executes dynamic pooling operation on the weighting result to obtain a pooling result, and the pooling result is input to the activation layer; and the activation layer and the batch normalization layer transform the pooling result through MLP, so that the neural network characteristic matrix is determined.
S209, splicing the term feature matrix, the document representation feature matrix and the neural network feature matrix to obtain a target feature matrix.
And S2010, determining whether the text to be processed is a text with changed map information or not according to the target feature matrix.
Specifically, for implementation manners of S209-S2010, reference may be made to the foregoing embodiments; details are not described herein again.
The method for identifying a change in map information provided by this embodiment describes an implementation manner of obtaining the neural network feature matrix of the text to be processed by a neural network method. Combining the feature engineering method and the neural network method allows the two to reinforce each other, overcoming the deficiencies of either method alone and making the identification result more accurate.
Fig. 3 is a schematic structural diagram of an embodiment of an apparatus for identifying a change in map information according to the present invention. As shown in fig. 3, the apparatus for identifying a change in map information provided in this embodiment includes:
an obtaining module 301, configured to obtain a term feature matrix and a document representation feature matrix of a text to be processed;
the obtaining module 301 is further configured to obtain a neural network feature matrix of the text to be processed in combination with dynamic pooling operation;
a splicing module 302, configured to splice the term feature matrix, the document representation feature matrix, and the neural network feature matrix to obtain a target feature matrix;
and the classification module 303 is configured to determine whether the text to be processed is a text with changed map information according to the target feature matrix.
Optionally, the obtaining module 301 is specifically configured to:
performing word segmentation processing, stop word processing, low-frequency lexical item filtering processing and lexical item identification conversion processing on the text to be processed to obtain a preprocessed text;
extracting the character length and the number of terms contained in the preprocessed text from the preprocessed text;
extracting the length of the non-repeated characters contained in the preprocessed text and the number of the contained non-repeated terms from the preprocessed text;
calculating keyword similarity according to each term contained in the preprocessed text and each keyword in a keyword library;
and inputting the character length, the number of terms, the length of the non-repeated characters, the number of the non-repeated terms and the similarity of the keywords into a multi-layer perceptron MLP to obtain the term feature matrix.
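The four count features fed to the MLP alongside the keyword similarity can be sketched as follows (the terms are a toy example; preprocessing is as described above):

```python
import numpy as np

def term_features(terms):
    """Hypothetical scalar features from a preprocessed term list, per the four
    statistics named above (character/term counts and their deduplicated
    counterparts); keyword similarity would be appended before the MLP."""
    chars = "".join(terms)
    return np.array([
        len(chars),       # character length
        len(terms),       # number of terms
        len(set(chars)),  # length of non-repeated characters
        len(set(terms)),  # number of non-repeated terms
    ], dtype=float)

feats = term_features(["道路", "施工", "道路", "封闭"])
print(feats)  # [8. 4. 6. 3.]
```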
The obtaining module 301 is specifically configured to:
extracting the document representation characteristics of the text to be processed through a pre-trained document theme generation model LDA;
and inputting the document representation features into a multi-layer perceptron MLP to obtain a document representation feature matrix.
Optionally, the obtaining module 301 is specifically configured to:
obtaining a word vector of each term contained in the preprocessed text according to the terms contained in the preprocessed text and a pre-trained word2vec word vector conversion model;
obtaining a word vector of each keyword in the keyword library according to each keyword in the keyword library and the word2vec word vector conversion model;
and calculating the similarity of the keywords through a similarity formula according to the word vectors of the terms contained in the preprocessed text and the word vectors of the keywords in the keyword library.
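One plausible instantiation of the "similarity formula" is a best-match cosine similarity averaged over the terms of the text; the patent does not fix the exact formula, so the aggregation below is an assumption:

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def keyword_similarity(term_vecs, keyword_vecs):
    """Assumed 'similarity formula': for each term, the best cosine similarity
    against any keyword in the library, averaged over the text's terms."""
    return float(np.mean([max(cosine(t, k) for k in keyword_vecs)
                          for t in term_vecs]))

rng = np.random.default_rng(2)
terms = rng.normal(size=(3, 8))     # word2vec-style vectors for the text's terms
keywords = rng.normal(size=(5, 8))  # word vectors for the keyword library
sim = keyword_similarity(terms, keywords)
print(-1.0 <= sim <= 1.0)  # True: cosine values lie in [-1, 1]
```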
Optionally, the obtaining module 301 is specifically configured to:
generating corresponding static word vectors and dynamic word vectors according to the text to be processed;
splicing the static word vector and the dynamic word vector to obtain a splicing result;
performing convolution operation on the splicing result by adopting one-dimensional wide convolution to obtain a convolution result;
adding the characteristic numerical values of the convolution results on adjacent dimensions to obtain a folding result;
weighting the folding result by adopting an attention mechanism to obtain a weighting result;
performing dynamic pooling operation on the weighting result to obtain a pooling result;
and transforming the pooling result through MLP to determine the neural network characteristic matrix.
Optionally, the obtaining module 301 is specifically configured to:
weighting the folding result by adopting the following formula to obtain a weighting result:
f(K^T, K) = K^T · W_a · K
wherein, f (K)TK) represents the weighting result, K represents the folding result, WaThe representation is the parameter which needs to be learned currently.
Optionally, the apparatus of this embodiment has a first model structure, as shown in fig. 3, where the first model structure includes: an embedding layer and a neural network model structure, the neural network model structure comprising: a convolutional layer, a folding layer, a self-attention mechanism layer, a pooling layer, an activation layer, and a batch normalization layer.
The embedding layer generates corresponding static word vectors and dynamic word vectors according to the text to be processed, splices the static word vectors and the dynamic word vectors to obtain a splicing result, and transmits the splicing result to the convolution layer;
the convolution layer performs convolution operation on the splicing result by adopting one-dimensional wide convolution to obtain a convolution result, and transmits the convolution result to the folding layer;
the folding layer adds the characteristic numerical values of the convolution results on the adjacent dimensionalities to obtain folding results, and transmits the folding results to the self-attention mechanism layer;
the self-attention mechanism layer weights the folding result by adopting an attention mechanism to obtain a weighted result, and transmits the weighted result to the pooling layer;
the pooling layer performs dynamic pooling operation on the weighting result to obtain a pooling result, and outputs the pooling result to the activation layer;
and the activation layer and the batch normalization layer transform the pooling result through MLP to determine the neural network characteristic matrix.
Optionally, the obtaining module 301 may include the first model structure.
The classification module 303 is specifically configured to:
inputting the target characteristic matrix into a classification model trained in advance;
and determining whether the text to be processed is a text with changed map information or not according to the output result of the classification model.
The apparatus for identifying a change in map information provided in this embodiment may be used to implement the method for identifying a change in map information described in any of the above embodiments, and the implementation principle and the technical effect are similar, which are not described herein again.
Fig. 4 is a schematic diagram of a hardware structure of the electronic device provided by the present invention. As shown in fig. 4, the electronic device of the present embodiment may include:
a memory 401 for storing program instructions.
The processor 402 is configured to implement the method for identifying a map information change described in any of the above embodiments when the program instructions are executed, and specific implementation principles may refer to the above embodiments, which are not described herein again.
The present invention provides a computer-readable storage medium on which a computer program is stored, which, when executed by a processor, implements the method of identifying a change in map information described in any of the above embodiments.
The present invention also provides a program product including a computer program stored in a readable storage medium, the computer program being readable from the readable storage medium by at least one processor, the computer program being executable by the at least one processor to cause an electronic device to implement the method for identifying a change in map information described in any of the above embodiments.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) or a processor (processor) to execute some steps of the methods according to the embodiments of the present invention. And the aforementioned storage medium includes: a U disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
It should be understood that the Processor may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present application may be embodied directly in a hardware processor, or in a combination of the hardware and software modules in the processor.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for recognizing map information change is characterized by comprising the following steps:
acquiring a term characteristic matrix and a document representation characteristic matrix of a text to be processed;
acquiring a neural network characteristic matrix of the text to be processed by combining dynamic pooling operation;
splicing the term feature matrix, the document representation feature matrix and the neural network feature matrix to obtain a target feature matrix;
and determining whether the text to be processed is a text with changed map information or not according to the target feature matrix.
2. The method of claim 1, wherein the obtaining a term feature matrix of the text to be processed comprises:
obtaining a word vector of each term contained in the text to be processed according to the terms contained in the text to be processed and a pre-trained word2vec word vector conversion model;
obtaining a word vector of each keyword in a keyword library according to each keyword in the keyword library and the word2vec word vector conversion model;
calculating keyword similarity through a similarity formula according to the word vectors of all terms contained in the text to be processed and the word vectors of all keywords in the keyword library;
and inputting the keyword similarity into a multi-layer perceptron MLP to obtain the term feature matrix.
3. The method according to claim 1 or 2, wherein the obtaining the neural network feature matrix of the text to be processed in combination with the dynamic pooling operation comprises:
generating corresponding static word vectors and dynamic word vectors according to the text to be processed;
splicing the static word vector and the dynamic word vector to obtain a splicing result;
performing convolution operation on the splicing result by adopting one-dimensional wide convolution to obtain a convolution result;
adding the characteristic numerical values of the convolution results on adjacent dimensions to obtain a folding result;
weighting the folding result by adopting an attention mechanism to obtain a weighting result;
performing dynamic pooling operation on the weighting result to obtain a pooling result;
and transforming the pooling result through MLP to determine the neural network characteristic matrix.
4. A method according to claim 1 or 2, characterized in that the method is applied to a first model structure comprising: an embedding layer and a neural network model structure, the neural network model structure comprising: the device comprises a convolution layer, a folding layer, a self-attention mechanism layer, a pooling layer, an activation layer and a batch normalization layer;
the acquiring the neural network feature matrix of the text to be processed in combination with the dynamic pooling operation comprises:
the embedding layer generates corresponding static word vectors and dynamic word vectors according to the text to be processed, splices the static word vectors and the dynamic word vectors to obtain a splicing result, and transmits the splicing result to the convolution layer;
the convolution layer performs convolution operation on the splicing result by adopting one-dimensional wide convolution to obtain a convolution result, and transmits the convolution result to the folding layer;
the folding layer adds the characteristic numerical values of the convolution results on the adjacent dimensionalities to obtain folding results, and transmits the folding results to the self-attention mechanism layer;
the self-attention mechanism layer weights the folding result by adopting an attention mechanism to obtain a weighted result, and transmits the weighted result to the pooling layer;
the pooling layer performs dynamic pooling operation on the weighting result to obtain a pooling result, and outputs the pooling result to the activation layer;
and the activation layer and the batch normalization layer transform the pooling result through MLP to determine the neural network characteristic matrix.
5. An apparatus for recognizing a change in map information, comprising:
the acquisition module is used for acquiring a term feature matrix and a document representation feature matrix of the text to be processed;
the acquisition module is further used for acquiring the neural network characteristic matrix of the text to be processed in combination with dynamic pooling operation;
the splicing module is used for splicing the term feature matrix, the document representation feature matrix and the neural network feature matrix to obtain a target feature matrix;
and the determining module is used for determining whether the text to be processed is a text with changed map information according to the target feature matrix.
6. The apparatus of claim 5, wherein the obtaining module is specifically configured to:
obtaining a word vector of each term contained in the text to be processed according to the terms contained in the text to be processed and a pre-trained word2vec word vector conversion model;
obtaining a word vector of each keyword in a keyword library according to each keyword in the keyword library and the word2vec word vector conversion model;
calculating keyword similarity through a similarity formula according to the word vectors of all terms contained in the text to be processed and the word vectors of all keywords in the keyword library;
and inputting the keyword similarity into a multi-layer perceptron MLP to obtain the term feature matrix.
7. The apparatus according to claim 5 or 6, wherein the obtaining module is specifically configured to:
generating corresponding static word vectors and dynamic word vectors according to the text to be processed;
splicing the static word vector and the dynamic word vector to obtain a splicing result;
performing convolution operation on the splicing result by adopting one-dimensional wide convolution to obtain a convolution result;
adding the characteristic numerical values of the convolution results on adjacent dimensions to obtain a folding result;
weighting the folding result by adopting an attention mechanism to obtain a weighting result;
performing dynamic pooling operation on the weighting result to obtain a pooling result;
and transforming the pooling result through MLP to determine the neural network characteristic matrix.
8. The apparatus of claim 5 or 6, wherein the apparatus has a first model structure comprising: an embedding layer and a neural network model structure, the neural network model structure comprising: a convolutional layer, a folding layer, a self-attention mechanism layer, a pooling layer, an activation layer, and a batch normalization layer.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of any one of claims 1-4.
10. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to implement the method of any of claims 1-4 via execution of the executable instructions.
CN201911281305.3A 2019-12-13 2019-12-13 Method and device for identifying map information change Pending CN112988921A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911281305.3A CN112988921A (en) 2019-12-13 2019-12-13 Method and device for identifying map information change

Publications (1)

Publication Number Publication Date
CN112988921A true CN112988921A (en) 2021-06-18

Family

ID=76332425



Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106445919A (en) * 2016-09-28 2017-02-22 上海智臻智能网络科技股份有限公司 Sentiment classifying method and device
CN107967318A (en) * 2017-11-23 2018-04-27 北京师范大学 A kind of Chinese short text subjective item automatic scoring method and system using LSTM neutral nets
CN108984523A (en) * 2018-06-29 2018-12-11 重庆邮电大学 A kind of comment on commodity sentiment analysis method based on deep learning model
KR20180137168A (en) * 2017-06-16 2018-12-27 (주)이스트소프트 Apparatus for classifying category of a text based on neural network, method thereof and computer recordable medium storing program to perform the method
WO2019105134A1 (en) * 2017-11-30 2019-06-06 阿里巴巴集团控股有限公司 Word vector processing method, apparatus and device
CN109977413A (en) * 2019-03-29 2019-07-05 南京邮电大学 A kind of sentiment analysis method based on improvement CNN-LDA
CN110046344A (en) * 2019-03-12 2019-07-23 平安科技(深圳)有限公司 Add the method and terminal device of separator
CN110059191A (en) * 2019-05-07 2019-07-26 山东师范大学 A kind of text sentiment classification method and device
US10387531B1 (en) * 2015-08-18 2019-08-20 Google Llc Processing structured documents using convolutional neural networks
CN110209823A (en) * 2019-06-12 2019-09-06 齐鲁工业大学 A kind of multi-tag file classification method and system
CN110245685A (en) * 2019-05-15 2019-09-17 清华大学 Genome unit point makes a variation pathogenic prediction technique, system and storage medium
CN110298037A (en) * 2019-06-13 2019-10-01 同济大学 The matched text recognition method of convolutional neural networks based on enhancing attention mechanism
WO2019210820A1 (en) * 2018-05-03 2019-11-07 华为技术有限公司 Information output method and apparatus
WO2019223379A1 (en) * 2018-05-22 2019-11-28 阿里巴巴集团控股有限公司 Product recommendation method and device
CN110555208A (en) * 2018-06-04 2019-12-10 北京三快在线科技有限公司 ambiguity elimination method and device in information query and electronic equipment

Non-Patent Citations (2)

王学锋,杨若鹏,李雯: "基于深度学习的作战文书事件抽取方法", 信息工程大学学报,第05期, 15 October 2019 (2019-10-15), pages 1 - 6 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination