CN111209362A - Address data analysis method based on deep learning - Google Patents

Publication number
CN111209362A
Authority
CN
China
Prior art keywords
address
data
model
training
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010011871.9A
Other languages
Chinese (zh)
Inventor
张磊
陶虹
张旭方
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Chengfang Information Technology Co ltd
Original Assignee
Suzhou Chengfang Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Chengfang Information Technology Co ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Remote Sensing (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to an address data analysis method based on deep learning. According to the address resolution requirements, address data are mapped to the corresponding key parcel information and labeled along multiple dimensions, so that the labeled address text carries labels of different categories; the multi-dimensionally labeled address text is segmented into words to generate address training data; and a BiLSTM-CNN-CRF model is constructed and trained. Starting from problems encountered in real place-name address resolution services, the invention builds an abstract model of address resolution together with multi-dimensional data labeling, replaces the separate, cumbersome steps of word segmentation, matching, and recognition, and achieves an end-to-end, fused processing mode.

Description

Address data analysis method based on deep learning
Technical Field
The invention belongs to the technical field of geographical name address resolution, and particularly relates to an address data resolution method based on deep learning.
Background
In today's information age, every department in a city stores a large amount of address-related geographical location information. Most of this data is non-spatial, so it cannot be shared between industries through a geographic information system. The spatialization of urban address information is therefore an important component of digital city construction.
Geocoding is a method for spatializing urban address information. It converts address information described in text into geographic coordinates and, through coding and address matching, determines the position on an electronic map of the geographic entity that the address refers to. Through geocoding, large amounts of social and economic data become spatial information with coordinates, which allows faster and more effective spatial analysis and provides support for government decision-making.
Natural Language Processing (NLP) is a set of techniques that enable computers to understand human language, and word segmentation is one of its basic tasks. In commonly used NLP algorithms, deep syntactic and semantic analysis usually takes the word as its basic unit, so word segmentation is usually the first step. Building a model in the NLP domain has traditionally required modelers to have some linguistic knowledge in order to extract appropriate features. Deep learning generalizes well and can extract features from the data automatically: contextual features are learned from the training data, and the experimenter's job is to design the structure of the neural network and to provide high-quality training data. Geocoding enables fast address query and matching and the spatialization of social and economic data; with unified database management, data can be shared across all departments and industries in a city. Existing address word segmentation models need to be improved so that segmentation accuracy rises substantially. The invention constructs a deep-learning-based address resolution algorithm to improve the resolution success rate for two types of fuzzy addresses: incomplete addresses and ambiguous addresses.
Disclosure of Invention
The technical problem is as follows: traditional place-name address resolution relies on full database retrieval and matching (word segmentation, matching, recognition), which is slow and has a low success rate. To address this, the invention provides an address data analysis method based on deep learning. Starting from problems encountered in real place-name address resolution services, the invention builds an abstract model of address resolution together with multi-dimensional data labeling, replaces the separate, cumbersome steps of word segmentation, matching, and recognition, and achieves an end-to-end, fused processing mode.
The invention models address resolution as the extraction of key parcel information from address data, and further abstracts that extraction as a multi-class classification problem over parcel information. When the deep learning model is built, address data are labeled along multiple dimensions according to the address resolution requirements, so that the labeled data carry different label contents. Specifically, the administrative division, road, plot, doorplate, building, room, and interference information in each address are labeled as separate categories. Importantly, incomplete and ambiguous addresses are labeled in the same way. The trained model can identify the corresponding parcel information in an address and automatically discard interference and useless information, which greatly improves the accuracy and speed of resolution.
The technical scheme is as follows: the invention discloses an address data analysis method based on deep learning, which comprises the following steps:
mapping the address data to corresponding key parcel information according to address resolution requirements for carrying out multi-dimensional data marking, wherein the marked key parcel information data have different types of label address name content texts; and performing word segmentation processing on the multi-dimension labeled address name content text to generate address training data.
The address information is split and labeled to obtain a sequence of word segments, which serves as training data. Through word embedding, each word is assigned a word vector as the representation of the address text, so that the computer can read in the training data. A threshold is set on the length of Chinese addresses, and any address exceeding the threshold is filtered out. Labeling the address information is the most time-consuming work in building the deep learning model. First, the labeled training data are expressed through word embedding so that the computer can read and process the input. Second, the embedded data are learned by a model composed of BiLSTM, CNN, and CRF layers. Finally, the model outputs its predictions, and the key information in the address is extracted according to the predicted labels.
For example, the address 'the region rocchy yi fenchy 1 house 109' is labeled 'O O A1 A2 C1 C2 F1 F2 E1 E2 E2', where O denotes useless information, C1 to C2 delimit xx information, F1 to F2 delimit xx information, and E1 to E2 delimit xx information; the key information is then extracted from the labeling result to resolve the address.
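The extraction step in this example can be sketched in Python. This is only an illustration under stated assumptions: it reads a tag ending in 1 as the start of a field and a tag ending in 2 as its continuation, which the text suggests but does not define, and the function name extract_fields and the placeholder tokens are hypothetical.

```python
def extract_fields(tokens, tags):
    """Group tokens into (category, text) fields from per-token tags.

    Assumed scheme (not defined in the source): 'O' is useless information,
    'X1' starts a field of category X, and 'X2' continues that field.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    fields = []
    current_cat, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        cat = tag[0]
        starts_new = tag == "O" or tag.endswith("1") or cat != current_cat
        if starts_new and current_tokens:
            # Close the field that was being collected.
            fields.append((current_cat, "".join(current_tokens)))
            current_tokens = []
        if tag == "O":
            current_cat = None  # useless information is dropped
            continue
        current_cat = cat
        current_tokens.append(token)
    if current_tokens:
        fields.append((current_cat, "".join(current_tokens)))
    return fields


# Placeholder characters stand in for the address text of the example.
example_tags = "O O A1 A2 C1 C2 F1 F2 E1 E2 E2".split()
example_tokens = list("xyabcdefghi")
example_fields = extract_fields(example_tokens, example_tags)
```

With the eleven tags of the example, the two O positions are discarded and four fields (categories A, C, F, and E) are returned.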
A BiLSTM-CNN-CRF model is constructed for training. The training data are arranged in sequence, the structural relevance of the word segments is determined through word vectors and part-of-speech features, and the tensor formed by concatenating the word vectors and part-of-speech features is output. Word embedding is mainly used to overcome two difficulties: text of uneven length, and incorporating word-to-word relations into the model. In short, each word is assigned a word vector that represents a point in space, and words with similar meanings have word vectors that are close together. Operations on words thus become operations on vectors; in deep learning, the arranged vectors are called a tensor. The tensor of a text implies the combined meaning of its words, so building it can be regarded as feature engineering (preprocessing) for the text, and it lays the foundation for text analysis with machine learning and deep learning.
The address training data are arranged in sequence, the structural relevance of the word segments is determined through word embedding, and the corresponding word vectors are output.
Through the BiLSTM model and the CNN model, the word vectors are combined with contextual information in both the forward and the reverse order to obtain state vectors; the state vectors are fed into the BiLSTM model again for training and then passed to the CRF model, which automatically learns the sequence rules and, after correction, outputs the key address sequence information. For sequence labeling tasks (Chinese word segmentation, CWS; part-of-speech tagging, POS; named entity recognition, NER; and so on), the mainstream deep learning framework at present is BiLSTM + CRF. A BiLSTM combines two LSTMs that read in opposite directions (one in sentence order, the other in reverse order), so in theory it can capture both the front-to-back and the back-to-front dependencies in the current address. Put simply, key information is easier to identify once the context on both sides is known, which makes the BiLSTM better suited to labeling the current word.
During model training, the influence of model complexity on the loss function is adjusted to prevent overfitting. In addition, the learning rate is halved every 5 rounds (epochs), so that the model trains better and the optimal model for extracting key address information is obtained. For example, the Dropout layer and the EarlyStopping callback in Keras are used to prevent overfitting, and the LearningRateScheduler callback in Keras is used to reduce the learning rate by half every 5 epochs during training.
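The learning-rate rule can be written as a small schedule function. The sketch below is plain Python and assumes that "half of the original learning rate every 5 rounds" means the current rate is halved at epochs 5, 10, 15, and so on (0-based); the function name and the starting rate of 0.01 are illustrative, not taken from the source.

```python
def halve_every_five(epoch, lr):
    """Return the learning rate to use for `epoch` (0-based).

    The rate passed in is the one currently in use; it is halved at
    epochs 5, 10, 15, ... and left unchanged otherwise.
    """
    if epoch > 0 and epoch % 5 == 0:
        return lr * 0.5
    return lr


# Simulate 12 epochs starting from an assumed initial rate of 0.01.
lr = 0.01
lr_history = []
for epoch in range(12):
    lr = halve_every_five(epoch, lr)
    lr_history.append(lr)

# With Keras installed, a function with this (epoch, lr) signature can be
# passed to the callback named in the text, for example:
#   tf.keras.callbacks.LearningRateScheduler(halve_every_five)
```

The function takes the epoch index and the current rate because that is the form the Keras LearningRateScheduler callback accepts, so the same function can be used for the simulation above and for real training.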
Representing words as tensors solves the problem of uneven text length. If every word has a corresponding word vector of the same dimensionality, then for a text of length N the input tensor is obtained simply by taking the vectors of the N words and arranging them in the order in which the words appear in the text. Moreover, words by themselves cannot serve as features, whereas a tensor is a quantified abstraction that a multi-layer neural network can process layer by layer. Just as text is composed of words, the features of a text can be composed from the tensors of its words.
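As a minimal illustration of how a text of length N becomes an N-by-d tensor, the following Python sketch looks up one vector per word and stacks the vectors in word order. The random embedding table, the dimension of 4, and the function names are assumptions made for the example; a trained model would learn the vectors.

```python
import random


def build_embedding_table(vocab, dim, seed=0):
    """Assign every word a vector of the same dimensionality (random here)."""
    rng = random.Random(seed)
    return {word: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for word in vocab}


def text_to_tensor(words, table):
    """Stack the word vectors in the order the words appear: shape (N, dim)."""
    return [table[word] for word in words]


table = build_embedding_table(["suzhou", "industrial", "park", "road"], dim=4)
short_text = text_to_tensor(["suzhou", "road"], table)                # 2 x 4
longer_text = text_to_tensor(["suzhou", "industrial", "park"], table)  # 3 x 4
```

Texts of different lengths give tensors with different numbers of rows but the same row width, which is what lets one model consume all of them.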
Advantageous effects: the invention provides an address data analysis method based on deep learning which, through abstract modeling of address resolution, multi-dimensional data labeling, and word embedding, solves the problem of uneven text length. Moreover, words by themselves cannot serve as features, whereas a tensor is a quantified abstraction that a multi-layer neural network can process layer by layer.
Experiments show that, given sufficient training samples, the method reaches an accuracy of 0.9997 on the test set; the accuracy is high because threshold screening and repeated training are applied and the rules governing address segmentation data are simple. Even though input addresses may be incomplete or ambiguous, the model can still extract information from them effectively. For example, when the model processes 'Suzhou industrial park' and 'Suzhou public park', each is extracted as a whole, which ensures that information is extracted from addresses accurately.
To improve the matching success rate for the two types of fuzzy addresses (incomplete and ambiguous), the invention constructs a Chinese word segmentation model based on word embedding, a bidirectional long short-term memory network (BiLSTM), a one-dimensional convolutional neural network (CNN), and a conditional random field (CRF). The address information is first labeled, and a length threshold is used to filter the address data. Words are then represented as tensors; the state tensors are fed through the BiLSTM model a second time for further training and passed to the CRF model for automatic correction, after which the key address sequence information is output, improving word segmentation accuracy.
Drawings
FIG. 1 is a block diagram of the overall process of the present invention.
Detailed Description
In order that the technical objects and features of the present invention can be more clearly understood, the present invention will be described in detail with reference to specific embodiments.
As shown in fig. 1, the present invention discloses an address data parsing method based on deep learning, which includes:
mapping the address data to corresponding key parcel information according to address resolution requirements for carrying out multi-dimensional data marking, wherein the marked key parcel information data have different types of label address name content texts;
performing word segmentation processing on the multi-dimensional labeled address name content text to generate address training data; the address information is split and labeled to obtain a sequence of word segments, which serves as training data; each word is assigned a word vector through word embedding to represent the address text, so that the computer can read the training data; a threshold is set on the length of Chinese addresses, and any address exceeding the threshold is filtered out.
Constructing a BiLSTM-CNN-CRF model for training; address resolution is modeled as the extraction of key parcel information from address data, and that extraction is further abstracted as a multi-class classification problem over parcel information. When the deep learning model is built, address data are labeled along multiple dimensions according to the address resolution requirements, so that the labeled data carry different label contents. Specifically, the administrative division, road, plot, doorplate, building, room, and interference information in each address are labeled as separate categories. Importantly, incomplete and ambiguous addresses are labeled in the same way. The trained model can identify the corresponding parcel information in an address and automatically discard interference and useless information, which greatly improves the accuracy and speed of resolution.
Arranging the address training data in sequence, determining word segment structure relevance through word embedding, and outputting corresponding word vectors; word embedding is mainly used to overcome two difficulties: text of uneven length, and incorporating word-to-word relations into the model. In short, each word is given a reasonable vector representation that corresponds to a point in space, and words with similar meanings have word vectors that are close together. Operations on words thus become operations on vectors; in deep learning, the arranged vectors are called a tensor. The tensor of a text implies the combined meaning of its words, so building it can be regarded as a preprocessing step for the text, and it provides a basis for text analysis with machine learning and deep learning.
Through the BiLSTM model and the CNN model, the word vectors are combined with contextual information in both the forward and the reverse order to obtain state vectors; the state vectors are fed into the BiLSTM model again for training and then passed to the CRF model, which automatically learns the sequence rules and, after correction, outputs the key address sequence information.
During model training, the influence of model complexity on the loss function is adjusted to prevent overfitting; in addition, the learning rate is halved every 5 rounds during training, so that the model trains better and the optimal model for extracting key address information is obtained.
Suppose the input sentence consists of 32 words and each word is represented by a 128-dimensional word vector, so that the model input has shape (32, 128). After the BiLSTM, the hidden representation is T1 with shape (32, 128), where 128 is the output dimension of the BiLSTM in the model. If no CRF layer is used, a fully connected layer can be added at the end of the model for 13-way classification, and the label with the highest probability is taken as the prediction. With a large amount of labeled data and continued iterative optimization, a good model for extracting key address information can be learned.
In theory, the powerful nonlinear fitting capability of neural networks allows a good model to be learned in this way. However, the model above only considers the context of the input words, not the dependencies between labels. In a sequence labeling task, the label L_t at the current position is potentially related to the label L_{t-1} at the previous position and the label L_{t+1} at the next position. For example, suppose "clock/B1 garden/B2 way/B21/D1/D2" is labeled as "clock/B1 garden/E2 way/B21/D1/D2". According to the labeling rules for information extraction, B1 can only be followed by B2, so the second labeling is invalid, and this kind of context between labels is worth exploiting. Researchers in natural language processing have therefore proposed placing a CRF layer after the model to learn the optimal label sequence over the entire sequence. Adding the CRF layer reduces unnecessary labeling errors such as: 1. B1 being followed by a label other than B2; 2. E2 appearing at the first position of a sequence. In short, these are errors that can never occur in correctly labeled data and that make no practical sense. Adding a CRF layer to the BiLSTM model avoids such unrealistic results and effectively improves the accuracy of the model.
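The effect of the label-to-label rules can be illustrated with a small Viterbi decoder in Python. This is a simplified stand-in rather than a CRF layer: a trained CRF learns transition scores, whereas the sketch hard-codes two assumed rules (a tag ending in 1 must be followed by the matching tag ending in 2, and a tag ending in 2 may not start a sequence or follow a different category). The tag set, the scores, and all function names are illustrative.

```python
NEG_INF = float("-inf")
TAGS = ["O", "B1", "B2", "D1", "D2"]


def allowed(prev, cur):
    """Assumed labeling rules; prev is None at the first position."""
    if prev is not None and prev.endswith("1"):
        return cur == prev[0] + "2"   # e.g. B1 must be followed by B2
    if cur.endswith("2"):
        return prev == cur            # X2 may only continue X2 (X1 handled above)
    return True


def viterbi(emissions, tags=TAGS):
    """Return the highest-scoring tag path that respects `allowed`."""
    best = [{t: (emissions[0][t] if allowed(None, t) else NEG_INF, None) for t in tags}]
    for i in range(1, len(emissions)):
        row = {}
        for cur in tags:
            candidates = [
                (best[i - 1][prev][0] + emissions[i][cur], prev)
                for prev in tags
                if allowed(prev, cur) and best[i - 1][prev][0] > NEG_INF
            ]
            row[cur] = max(candidates, key=lambda c: c[0]) if candidates else (NEG_INF, None)
        best.append(row)
    last = max(tags, key=lambda t: best[-1][t][0])
    if best[-1][last][0] == NEG_INF:
        raise ValueError("no valid tag sequence")
    path = [last]
    for i in range(len(emissions) - 1, 0, -1):
        path.append(best[i][path[-1]][1])   # follow back-pointers
    return path[::-1]


def scores(**given):
    """Per-position scores; tags not mentioned get 0.0."""
    return {t: given.get(t, 0.0) for t in TAGS}


# Position 1 slightly prefers D2, which is invalid directly after B1.
emissions = [scores(B1=2.0), scores(D2=1.5, B2=1.0), scores(B2=2.0)]
greedy_path = [max(TAGS, key=lambda t: e[t]) for e in emissions]
constrained_path = viterbi(emissions)
```

Picking the best tag independently at each position yields B1, D2, B2, which breaks the rule, while decoding over the whole sequence yields B1, B2, B2.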
This completes the overview of the BiLSTM-CRF model. In the task of extracting key address information, the label of the current word mostly depends only on the few words before and after it. Because a BiLSTM may discard some important information when learning longer sentences, owing to limited model capacity, a CNN layer is added to the model to extract the local features of the current word.
Let the sentence input have shape (32, 100). An equal-length convolution (one that preserves the sequence length) gives T2 with shape (32, 50), where 50 is the number of convolution kernels. The 50-dimensional vector corresponding to the current word contains its local context information. Concatenating T1 and T2 along the feature dimension gives T3 with shape (32, 178), since 128 + 50 = 178; T3 passes through the fully connected layer to give T4 with shape (32, 13), and T4 is input to the CRF layer, which computes the final optimal sequence.
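The shape bookkeeping for T1 to T4 can be checked with a few lines of Python. The sketch tracks shapes only and performs no actual computation; the helper names are hypothetical, and the feature size of the input does not affect any of the output shapes shown.

```python
SEQ_LEN = 32        # words per sentence
BILSTM_OUT = 128    # output dimension of the BiLSTM
NUM_KERNELS = 50    # number of convolution kernels
NUM_CLASSES = 13    # label categories


def bilstm_shape(input_shape, output_dim):
    """A BiLSTM that returns every time step keeps the sequence length."""
    return (input_shape[0], output_dim)


def same_conv1d_shape(input_shape, filters):
    """An equal-length 1-D convolution keeps the sequence length."""
    return (input_shape[0], filters)


def concat_features(a, b):
    """Concatenate two sequences along the feature axis."""
    if a[0] != b[0]:
        raise ValueError("sequence lengths differ")
    return (a[0], a[1] + b[1])


def dense_shape(input_shape, units):
    """A fully connected layer applied at every position."""
    return (input_shape[0], units)


t1 = bilstm_shape((SEQ_LEN, 128), BILSTM_OUT)        # (32, 128)
t2 = same_conv1d_shape((SEQ_LEN, 100), NUM_KERNELS)  # (32, 50)
t3 = concat_features(t1, t2)                         # (32, 178)
t4 = dense_shape(t3, NUM_CLASSES)                    # (32, 13), input to the CRF
```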
In machine learning and deep learning, data processing takes a substantial and unavoidable share of the time, because the quality of data preparation directly affects the result of the model; this preprocessing is often referred to as feature engineering. The data processing for this model is described below.
Almost every Chinese address is shorter than 32 characters, so addresses longer than 32 characters are deleted; among the 1.75 million (175W) labeled addresses, only 8 exceed 32 characters. Addresses shorter than 32 characters are padded at the end with a label category; note that useless information inside an address is represented by the same category. In total, the address information is labeled with 13 categories, numbered 0 to 12, and the category numbers are converted by one-hot encoding so that the label data take the form required by the model. The original address data are processed with a bag-of-words model. To handle characters in new addresses that were never seen before, every character that does not appear in the bag of words is mapped to one number that is not used in the bag of words, which avoids errors when test data are represented with it. This completes the data preprocessing.
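The preprocessing steps described here (length filtering, padding, one-hot labels, and a shared number for unseen characters) can be sketched in plain Python. Several details are assumptions made for the example: index 0 is reserved for padded character positions, class 0 is taken as the category shared by padding and useless information, and the function names are hypothetical.

```python
MAX_LEN = 32      # addresses longer than this are dropped
NUM_CLASSES = 13  # label categories numbered 0-12
PAD_CLASS = 0     # assumed category for padding and useless information


def build_vocab(addresses):
    """Map each character seen in training to an integer; 0 is kept for padding."""
    vocab = {}
    for address in addresses:
        for ch in address:
            vocab.setdefault(ch, len(vocab) + 1)
    return vocab


def encode_address(address, vocab):
    """Convert characters to ids, using one shared id for unseen characters."""
    unknown_id = len(vocab) + 1   # a number not used by any known character
    ids = [vocab.get(ch, unknown_id) for ch in address]
    return ids + [0] * (MAX_LEN - len(ids))


def one_hot(label):
    row = [0] * NUM_CLASSES
    row[label] = 1
    return row


def encode_labels(labels):
    """Pad the label sequence to MAX_LEN and one-hot encode each label."""
    padded = labels + [PAD_CLASS] * (MAX_LEN - len(labels))
    return [one_hot(label) for label in padded]


def prepare(samples):
    """samples: list of (address_text, label_list) pairs of equal length."""
    kept = [(a, l) for a, l in samples if len(a) <= MAX_LEN]
    vocab = build_vocab(a for a, _ in kept)
    inputs = [encode_address(a, vocab) for a, _ in kept]
    targets = [encode_labels(l) for _, l in kept]
    return inputs, targets, vocab


samples = [("abc", [1, 2, 3]), ("x" * 33, [0] * 33)]
inputs, targets, vocab = prepare(samples)
```

In this toy run the 33-character sample is discarded, and the remaining address becomes a length-32 id sequence with a 32 x 13 one-hot label matrix.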
The biggest problem in training a deep learning model is overfitting. To let the network train stably on the provided data, several measures are used during training to prevent overfitting, and the learning rate is halved every 5 rounds so that the model trains better and the optimal model for extracting key address information is obtained.
The data in the training set and the test set are processed to obtain the model accuracy. The accuracy on the test set reaches 0.9997, probably because the rules governing address data are simple. Even though input addresses may be incomplete or ambiguous, the model can still extract information from them effectively. For example, when the model processes 'Suzhou industrial park' and 'Suzhou public park', each is extracted as a whole, which ensures that information is extracted from addresses accurately.
In complex address data, the trained model may still predict incorrectly; wrong word segmentation results can be corrected by retraining, which improves the practicality and accuracy of the model.

Claims (1)

1. An address data parsing method based on deep learning is characterized by comprising the following steps:
mapping the address data to corresponding key parcel information according to address resolution requirements for carrying out multi-dimensional data marking, wherein the marked key parcel information data have different types of label address name content texts;
performing word segmentation processing on the multi-dimensional labeled address name content text to generate address training data;
constructing a BiLSTM-CNN-CRF model for training;
arranging the address training data in sequence, determining word segment structure relevance through word embedding, and outputting corresponding word vectors;
the word vector is respectively combined with context associated information fusion learning according to a forward sequence and a reverse sequence through a BiLSTM model and a CNN model to obtain a state vector, the state vector is extracted again into the BiLSTM model and then is trained and then is conveyed into a CRF model, and the CRF model automatically extracts a sequence rule and outputs key address sequence information after finishing correction;
during model training, adjusting the influence of the complexity of the model on a loss function to prevent overfitting of the model; wherein, the learning rate of the training is adjusted to be half of the original learning rate every 5 rounds in the training process.
CN202010011871.9A 2020-01-07 2020-01-07 Address data analysis method based on deep learning Pending CN111209362A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010011871.9A CN111209362A (en) 2020-01-07 2020-01-07 Address data analysis method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010011871.9A CN111209362A (en) 2020-01-07 2020-01-07 Address data analysis method based on deep learning

Publications (1)

Publication Number Publication Date
CN111209362A true CN111209362A (en) 2020-05-29

Family

ID=70785598

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010011871.9A Pending CN111209362A (en) 2020-01-07 2020-01-07 Address data analysis method based on deep learning

Country Status (1)

Country Link
CN (1) CN111209362A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112527933A (en) * 2020-12-04 2021-03-19 重庆市地理信息和遥感应用中心 Chinese address association method based on space position and text training
CN113536794A (en) * 2021-06-22 2021-10-22 河北远东通信系统工程有限公司 Confidence-based Active-BilSTM-CRF Chinese level address word segmentation method
WO2022134592A1 (en) * 2020-12-23 2022-06-30 深圳壹账通智能科技有限公司 Address information resolution method, apparatus and device, and storage medium

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107491548A (en) * 2017-08-28 2017-12-19 武汉烽火普天信息技术有限公司 A kind of network public-opinion text message recommends and method for visualizing
CN108763201A (en) * 2018-05-17 2018-11-06 南京大学 A kind of open field Chinese text name entity recognition method based on semi-supervised learning
CN108959252A (en) * 2018-06-28 2018-12-07 中国人民解放军国防科技大学 Semi-supervised Chinese named entity recognition method based on deep learning
CN109388749A (en) * 2018-09-29 2019-02-26 武汉烽火普天信息技术有限公司 The detection of accurate high-efficiency network public sentiment and method for early warning based on multi-layer geography
CN109614456A (en) * 2018-11-28 2019-04-12 武汉大学 A kind of the positioning partition method and device of the geography information based on deep learning
CN109857327A (en) * 2017-03-27 2019-06-07 三角兽(北京)科技有限公司 Information processing unit, information processing method and storage medium
CN110096713A (en) * 2019-03-21 2019-08-06 昆明理工大学 A kind of Laotian organization names recognition methods based on SVM-BiLSTM-CRF
CN110110335A (en) * 2019-05-09 2019-08-09 南京大学 A kind of name entity recognition method based on Overlay model
CN110334339A (en) * 2019-04-30 2019-10-15 华中科技大学 It is a kind of based on location aware from the sequence labelling model and mask method of attention mechanism
CN110377686A (en) * 2019-07-04 2019-10-25 浙江大学 A kind of address information Feature Extraction Method based on deep neural network model
CN110459282A (en) * 2019-07-11 2019-11-15 新华三大数据技术有限公司 Sequence labelling model training method, electronic health record processing method and relevant apparatus
CN110457682A (en) * 2019-07-11 2019-11-15 新华三大数据技术有限公司 Electronic health record part-of-speech tagging method, model training method and relevant apparatus

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MA X Z et al.: "End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF" *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112527933A (en) * 2020-12-04 2021-03-19 重庆市地理信息和遥感应用中心 Chinese address association method based on space position and text training
WO2022134592A1 (en) * 2020-12-23 2022-06-30 深圳壹账通智能科技有限公司 Address information resolution method, apparatus and device, and storage medium
CN113536794A (en) * 2021-06-22 2021-10-22 河北远东通信系统工程有限公司 Confidence-based Active-BiLSTM-CRF Chinese level address word segmentation method

Similar Documents

Publication Publication Date Title
CN112199511B (en) Cross-language multi-source vertical domain knowledge graph construction method
CN110633409B (en) Automobile news event extraction method integrating rules and deep learning
CN111753024B (en) Multi-source heterogeneous data entity alignment method oriented to public safety field
CN108595708A (en) Abnormal information text classification method based on knowledge graph
CN113312501A (en) Construction method and device of safety knowledge self-service query system based on knowledge graph
CN113177124A (en) Vertical domain knowledge graph construction method and system
CN107871158A (en) Knowledge graph representation learning method and device combining sequential text information
CN111209362A (en) Address data analysis method based on deep learning
CN110263325A (en) Chinese automatic word segmentation
CN114896388A (en) Hierarchical multi-label text classification method based on mixed attention
CN113051914A (en) Enterprise hidden label extraction method and device based on multi-feature dynamic portrait
CN111274804A (en) Case information extraction method based on named entity recognition
CN114444507A (en) Context parameter Chinese entity prediction method based on water environment knowledge map enhancement relationship
CN114911945A (en) Knowledge graph-based multi-value chain data management auxiliary decision model construction method
CN113821635A (en) Text abstract generation method and system for financial field
CN113515632A (en) Text classification method based on graph path knowledge extraction
CN114564563A (en) End-to-end entity relationship joint extraction method and system based on relationship decomposition
CN114077673A (en) Knowledge graph construction method based on BTBC model
CN116383352A (en) Knowledge graph-based method for constructing field intelligent question-answering system by using zero samples
CN114238524B (en) Satellite frequency-orbit data information extraction method based on enhanced sample model
CN111428502A (en) Named entity labeling method for military corpus
CN114117000A (en) Response method, device, equipment and storage medium
CN114239828A (en) Supply chain affair map construction method based on causal relationship
CN112699685A (en) Named entity recognition method based on label-guided word fusion
CN115270774B (en) Big data keyword dictionary construction method for semi-supervised learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200529