CN114003812A - Address matching method, system, device and storage medium - Google Patents

Publication number
CN114003812A
Authority
CN
China
Prior art keywords
address
matching
labeled
preset
addresses
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111274139.1A
Other languages
Chinese (zh)
Inventor
李洁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Financial Technology Co Ltd Shanghai
Original Assignee
OneConnect Financial Technology Co Ltd Shanghai

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9538 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an address matching method, system, device, and storage medium. The method comprises: acquiring a preprocessed target address; inputting the preprocessed target address into a trained CRF splitting model to obtain an optimal labeling address sequence, wherein the trained CRF splitting model is obtained by training based on a preset feature template and training data; and acquiring alternative matching addresses according to the current search index of the optimal labeling address sequence and a preset ElasticSearch search engine. Embodiments of the invention reduce the amount of address information that must be processed manually, locate and match complete address information quickly, substantially increase address processing speed, and shorten customer waiting time. Because a customer can be located to a specific cell (residential community) from the address information, the system can respond quickly and serve the customer better.

Description

Address matching method, system, device and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to an address matching method, system, device, and storage medium.
Background
At present, address splitting and matching tasks mostly rely on regular expressions to split well-formed addresses. Because address data differ across the country, regular expressions can hardly cover every case. Addresses are often highly irregular, and inconsistent address labeling and arbitrary ways of writing addresses make parsing difficult, so irregular address information is hard to split accurately.
Because such addresses cannot be split accurately, the split address information cannot be matched automatically to complete and correct address information, and manual intervention is required. When irregular addresses arrive in batches, a large amount of manual matching is needed to obtain the correct detailed addresses.
Disclosure of Invention
The invention provides an address matching method, system, device, and storage medium whose main purpose is to split irregular or insufficiently detailed address information accurately, thereby improving the precision and accuracy of address splitting and, in turn, the precision of subsequent address matching.
In a first aspect, an embodiment of the present invention provides an address matching method, including:
acquiring a preprocessed target address;
inputting the preprocessed target address into a trained CRF splitting model to obtain an optimal labeling address sequence, wherein the trained CRF splitting model is obtained by training based on a preset feature template and training data;
and acquiring an alternative matching address according to the current search index of the optimal labeling address sequence and a preset ElasticSearch search engine.
Preferably, the method further comprises the following steps:
acquiring a reference matching address according to a preset address element of the optimal labeling address sequence and the preset ElasticSearch search engine;
and acquiring the best matching address according to the confidence degree between the reference matching address and the alternative matching address.
Preferably, the training data is obtained by:
S211, acquiring a labeled address library and a preprocessed unlabeled address library from an original corpus, wherein the labeled address library is obtained by labeling according to a preset classification labeling system;
S212, training the initial CRF splitting model according to the labeled address library to obtain a target CRF splitting model;
S213, labeling part of the unlabeled addresses in the preprocessed unlabeled address library with the target CRF splitting model to obtain the labeled address sequences corresponding to that part of the unlabeled addresses;
S214, updating the labeled address library with that part of the unlabeled addresses and their corresponding labeled address sequences, taking the updated labeled address library as the labeled address library again, and taking the target CRF splitting model as the initial CRF splitting model again;
S215, repeating steps S212 to S214 until the number of unlabeled addresses remaining in the unlabeled address library is less than a preset number threshold, and using the addresses in the labeled address library as the training data.
Preferably, updating the labeled address library with the part of unlabeled addresses and the labeled address sequences corresponding to them includes:
deleting from the unlabeled address library, according to the confidence between each of those unlabeled addresses and its corresponding labeled address sequence, the unlabeled addresses whose confidence is greater than a preset confidence threshold;
and adding the corresponding labeled address sequences to the labeled address library to obtain the updated labeled address library.
Preferably, the confidence between the part of the unlabeled addresses and the corresponding labeled address sequence is obtained as follows:
[Formula image: Figure BDA0003328853510000031]
where $C_x$ denotes the confidence between the unlabeled address and its corresponding labeled address sequence, $i$ denotes the current position, $X = (x_1, x_2, \ldots, x_n)$ is the unlabeled address, and $Y = (y_1, y_2, \ldots, y_n)$ is the predicted labeled address sequence, with the input variable $X$ taking the value $x$ and the output variable $Y$ taking the value $y$.
Preferably, the trained CRF split model is obtained by training based on a preset feature template and training data, and is obtained through the following steps:
acquiring a characteristic function according to the preset characteristic template;
and extracting features of the training data according to the feature function, training the initial CRF splitting model by combining the weight of each feature, and obtaining the trained CRF splitting model.
Preferably, the obtaining a best matching address according to the confidence between the reference matching address and the candidate matching address includes:
for any alternative matching address, if the confidence between the reference matching address and that alternative matching address is greater than a first preset matching threshold, matching the cell information of the reference matching address against the cell information of that alternative matching address, and the route number information of the reference matching address against the route number information of that alternative matching address, and, if the cell matching result and the route number matching result are both greater than a second preset matching threshold, taking that alternative matching address as the best matching address;
if the confidence between the reference matching address and that alternative matching address is less than the first preset matching threshold, combining the route number information and the cell information of the reference matching address, combining the route number information and the cell information of that alternative matching address, and, if the matching degree between the two combined strings is greater than the first preset matching threshold, taking that alternative matching address as the best matching address.
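The two-branch decision rule above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Address` fields, the threshold values `t1` and `t2`, and the use of `difflib.SequenceMatcher` as the similarity score are all hypothetical stand-ins, not the scoring used by the embodiment.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Address:
    full: str       # complete address string (hypothetical field)
    cell: str       # cell information (hypothetical field)
    route_no: str   # route number information (hypothetical field)


def score(a: str, b: str) -> float:
    """Stand-in similarity in [0, 100]; not the embodiment's own scoring."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def is_best_match(ref: Address, cand: Address,
                  t1: float = 80.0, t2: float = 90.0) -> bool:
    """Apply the two-branch rule; t1 and t2 are illustrative thresholds."""
    if score(ref.full, cand.full) > t1:
        # High overall confidence: cell and route number must each match well.
        return (score(ref.cell, cand.cell) > t2
                and score(ref.route_no, cand.route_no) > t2)
    # Otherwise compare the combined route number + cell strings.
    # The text does not say what happens when the confidence equals t1;
    # this sketch treats that case like the low-confidence branch.
    return score(ref.route_no + ref.cell, cand.route_no + cand.cell) > t1
```

Calling `is_best_match` once per alternative matching address and keeping those for which it returns `True` would reproduce the selection described above.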
In a second aspect, an embodiment of the present invention provides an address matching system, including:
the acquisition module is used for acquiring the preprocessed target address;
the sequence module is used for inputting the preprocessed target address into a trained CRF splitting model to obtain an optimal labeling address sequence, and the trained CRF splitting model is obtained by training based on a preset feature template and training data;
and the matching module is used for acquiring the alternative matching address according to the current search index of the optimal labeling address sequence and a preset ElasticSearch search engine.
In a third aspect, an embodiment of the present invention provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the address matching method when executing the computer program.
In a fourth aspect, an embodiment of the present invention provides a computer storage medium, where a computer program is stored, and the computer program, when executed by a processor, implements the steps of the address matching method.
According to the address matching method, system, device, and storage medium provided by the invention, the target address is first preprocessed to remove irregular information and is then input into the trained CRF splitting model. Because the CRF splitting model can accurately split irregular or insufficiently detailed address information, the precision and accuracy of splitting the target address improve, and so does the precision of subsequent address matching. The optimal labeling address sequence is then matched in the preset ElasticSearch search engine, whose built-in search capability is used to quickly retrieve regular, detailed, and accurate address information.
Embodiments of the invention reduce the amount of address information that must be processed manually, locate and match complete address information quickly, substantially increase address processing speed, and shorten customer waiting time. Because a customer can be located to a specific cell from the address information, the system can respond quickly and serve the customer better.
Drawings
Fig. 1 is an application scenario diagram of an address matching method according to an embodiment of the present invention;
Fig. 2 is a flowchart of an address matching method according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an address matching system according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a computer device provided in an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Fig. 1 is an application scenario diagram of an address matching method according to an embodiment of the present invention. As shown in Fig. 1, a user enters a target address in a client, the client extracts the target address and sends it to a server, and the server, after receiving the target address, executes the address matching method to match it.
It should be noted that the server may be implemented by an independent server or a server cluster composed of a plurality of servers. The client may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, and the like. The client and the server may be connected through Bluetooth, USB (Universal Serial Bus), or other communication connection manners, which is not limited in this embodiment of the present invention.
Fig. 2 is a flowchart of an address matching method according to an embodiment of the present invention. As shown in Fig. 2, the method includes:
S210, acquiring a preprocessed target address;
First, a preprocessed target address is acquired. The target address is the address to be matched, such as a mailing address or an everyday handwritten address. Such addresses usually contain many non-standard expressions, and the redundant, meaningless components in them need to be identified. Common non-standard cases include inconsistent number formats, special symbols, address data that are too short, and missing key route number or cell information.
For example, in the address "the small golden village and small thunder road in the thunder field, near the north of 2 kilometers of the intersection with the ai tai road", connecting words such as "and" are meaningless components and need to be labeled as such. Spatial relationships are the topological relationships among address elements and mainly include adjacency, association, and inclusion. The corresponding descriptions in this example include a distance relationship ("2 kilometers"), a direction relationship ("north"), an intersection relationship ("intersection"), and a fuzzy description ("near"). Identifying these components makes it possible to add spatial constraints during geographic labeling and thereby improve positioning accuracy. Geographic named entities such as place names and organization names are the main components of an address and are also the hardest to identify. A structured address is easy to split by place-name suffixes such as "county" and "town", whereas a non-standard address often drops the administrative suffix, for example writing "thunder town" as "thunder field", and such omissions need to be labeled. An organization name that appears in an address, such as "east garden", is attributed to the cell in embodiments of the present invention.
Preprocessing the target address includes unifying the number format, removing special symbols, and filtering out invalid addresses so that noise does not cause errors. Preprocessing roughly filters out some irregular expressions in the address but cannot fully convert an irregular address into a regular one, so subsequent processing is required.
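The three preprocessing operations named above (unifying number formats, removing special symbols, filtering invalid addresses) might look like the following minimal sketch. The character whitelist and the `MIN_ADDRESS_LENGTH` cutoff are assumptions chosen for illustration; the source does not specify them.

```python
import re
import unicodedata
from typing import Optional

# Hypothetical cutoff: shorter strings are treated as invalid addresses.
MIN_ADDRESS_LENGTH = 4

# Hypothetical whitelist: CJK ideographs, ASCII letters, digits, and hyphen.
_DISALLOWED = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9\-]")


def preprocess_address(raw: str) -> Optional[str]:
    """Return a cleaned address, or None if it is too short to be usable."""
    # NFKC folds full-width digits and letters into their ASCII forms.
    text = unicodedata.normalize("NFKC", raw).strip()
    # Drop special symbols and whitespace outside the whitelist.
    text = _DISALLOWED.sub("", text)
    if len(text) < MIN_ADDRESS_LENGTH:
        return None
    return text
```

A caller would skip any address for which the function returns `None` and pass the rest on to the splitting model. Whether punctuation should be stripped here or left for the model to label is a design choice this sketch does not settle.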
S220, inputting the preprocessed target address into a trained CRF splitting model to obtain an optimal labeling address sequence, wherein the trained CRF splitting model is obtained by training based on a preset feature template and training data;
The preprocessed target address is input into the trained CRF splitting model to obtain the optimal labeling address sequence, which consists of the administrative divisions at each level and the point of interest split out of the target address. The administrative divisions are the province, city, district, town, and so on corresponding to the target address, and the point of interest is the final residential building or room.
For example, for the target address "the mountainous area of Wuhan City and the acute creation center of level one 1267" in Hubei Province, the administrative divisions "province: Hubei Province, city: Wuhan City, district/county: mountainous area, road/street office: level one 1267" and the point of interest "house number/unit number: acute creation center 2107" may be labeled.
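To illustrate how a per-character labeling result can be turned into address elements like those in the example, here is a small sketch. It assumes tags of the hypothetical form `B-PROV`, `M-PROV`, `E-PROV` (position letter, hyphen, element type); the exact tag strings produced by the model are not given in the source.

```python
from typing import List, Optional, Tuple


def group_elements(chars: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge per-character tags into (element_type, text) pairs."""
    elements: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    buffer: List[str] = []
    for ch, tag in zip(chars, tags):
        position, _, element_type = tag.partition("-")
        # A 'B' tag or a change of element type starts a new element.
        if position == "B" or element_type != current_type:
            if buffer:
                elements.append((current_type, "".join(buffer)))
            current_type, buffer = element_type, []
        buffer.append(ch)
    if buffer:
        elements.append((current_type, "".join(buffer)))
    return elements
```

The resulting pairs can then be used directly as field values when building a search request.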
In the embodiment of the invention, the trained CRF splitting model is obtained by training with a preset feature template and training data. The preset feature template determines the feature functions applied to the training data; the corresponding features are extracted from the training data and used to train the CRF splitting model.
The CRF splitting model is based on a Conditional Random Field (CRF). A conditional random field considers spatial context features jointly and computes statistics globally over the sequence, which, according to this embodiment, yields a better labeling result than other sequence labeling models.
In addition, in the embodiment of the invention, the training data are obtained by an iterative method that mixes self-training with manual labeling: some addresses are labeled manually, the labeled addresses are used to train the CRF splitting model, and the trained model is used to label the remaining unlabeled addresses, so that all labeled address corpora serve as training data. The CRF splitting model is then retrained on this training data.
S230, acquiring alternative matching addresses according to the current search index of the optimal labeling address sequence and a preset ElasticSearch search engine.
As described above, the optimal labeling address sequence includes a plurality of address elements, such as province, city, district, and house number or unit number. These elements are searched in a pre-built ElasticSearch search engine. The search can use keywords such as cell ID, city, administrative district, street, cell name, address name, cell alias, and address alias; that is, the current search index may be the cell ID, city, administrative district, street, cell name, address name, and so on, as determined by the actual situation. Both fuzzy search and exact search are supported, and the response time can reach the millisecond level.
The optimal labeling address sequence can be used as parameters in a search statement: the city field is matched exactly, while the corresponding street name and community building fields are matched fuzzily. The search statement is submitted to the ElasticSearch search engine, which returns one or more matching results, that is, one or more alternative matching addresses, depending on the actual situation. When different search conditions return multiple result sets, the results are merged by cell ID and duplicates are removed; the merged results serve as the input to the confidence scoring algorithm.
It should be noted that the preset ElasticSearch search engine is a Lucene-based search server. It is a distributed, highly scalable, highly real-time search and data analysis engine that makes large volumes of data searchable, analyzable, and explorable. Once the massive cell information has been consolidated, the cell information data are first stored in ElasticSearch in batches, and inverted indexes are built with the cells and the addresses as separate index libraries.
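A search statement of this kind, together with the merge by cell ID, could be sketched as below. The request body uses the Elasticsearch Query DSL (`bool`, `filter`, `term`, `match`, `fuzziness`), but the field names `city`, `street`, `community_name`, and `cell_id` are hypothetical, and the sketch assumes `city` is indexed as a keyword field so that a `term` query matches it exactly.

```python
from typing import Dict, Iterable, List


def build_search_body(city: str, street: str, community: str) -> Dict:
    """Exact match on city; fuzzy full-text match on street and community."""
    return {
        "query": {
            "bool": {
                # filter + term: exact, non-scoring match on the city field.
                "filter": [{"term": {"city": city}}],
                # should + match with fuzziness: tolerant, scored matches.
                "should": [
                    {"match": {"street": {"query": street, "fuzziness": "AUTO"}}},
                    {"match": {"community_name": {"query": community, "fuzziness": "AUTO"}}},
                ],
                "minimum_should_match": 1,
            }
        }
    }


def merge_by_cell_id(result_sets: Iterable[List[Dict]]) -> List[Dict]:
    """Merge hits from several searches, keeping the first hit per cell ID."""
    merged: Dict[str, Dict] = {}
    for hits in result_sets:
        for hit in hits:
            merged.setdefault(hit["cell_id"], hit)
    return list(merged.values())
```

The dictionary returned by `build_search_body` would be sent as the body of a search request against the cell or address index; the hits from each request are then passed to `merge_by_cell_id` before scoring.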
According to the address matching method provided by the invention, the target address is first preprocessed to remove irregular information and is then input into the trained CRF splitting model. Because the CRF splitting model can accurately split irregular or insufficiently detailed address information, the precision and accuracy of splitting the target address improve, and so does the precision of subsequent address matching. The optimal labeling address sequence is then matched in the preset ElasticSearch search engine, whose built-in search capability is used to quickly retrieve regular, detailed, and accurate address information.
Embodiments of the invention reduce the amount of address information that must be processed manually, locate and match complete address information quickly, substantially increase address processing speed, and shorten customer waiting time. Because a customer can be located to a specific cell from the address information, the system can respond quickly and serve the customer better.
On the basis of the above embodiment, it is preferable to further include:
acquiring a reference matching address according to a preset address element of the optimal labeling address sequence and the preset ElasticSearch search engine;
and acquiring the best matching address according to the confidence degree between the reference matching address and the alternative matching address.
When there are multiple alternative matching addresses, the best matching address must be selected from among them. Specifically, a preset address element of the optimal labeling address sequence is used as the search parameter for a search in the preset ElasticSearch search engine. The preset address element may be the cell, the house number, or the route number, as determined by the actual situation.
Generally, searching ElasticSearch by cell information and route number information determines a unique address, so in the embodiment of the present invention the reference matching address obtained with the cell as the search parameter serves as the reference for the confidence calculation.
With the reference matching address as the reference, the confidence between the reference matching address and each alternative matching address is calculated. The higher the confidence, the more accurate the alternative matching address; the lower the confidence, the less accurate. The best matching address is selected from all alternative matching addresses according to the confidence.
It should be noted that the confidence scoring algorithm is implemented by wrapping and adapting the FuzzyWuzzy string matching tool. Its principle is edit distance: the minimum number of editing operations required to convert one string into another. The editing operations are replacing a character, inserting a character, and deleting a character; in general, the smaller the edit distance, the more similar the two strings.
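The edit-distance principle can be made concrete with a short, dependency-free sketch. It is not the embodiment's wrapped scoring tool; the mapping from distance to a 0 to 100 score in `similarity` is one simple normalization chosen here for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Map edit distance to a 0-100 score; 100 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)
```

For instance, `levenshtein("kitten", "sitting")` is 3 (two substitutions and one insertion), so the two strings score well below 100, while identical strings score exactly 100.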
According to the address matching method provided by the embodiment of the invention, the best matching address is screened out from a plurality of candidate matching addresses by taking the confidence coefficient as an index, so that the accuracy of address matching is further improved.
On the basis of the above embodiment, preferably, the training data is obtained by:
S211, acquiring a labeled address library and a preprocessed unlabeled address library from an original corpus, wherein the labeled address library is obtained by labeling according to a preset classification labeling system;
S212, training the initial CRF splitting model according to the labeled address library to obtain a target CRF splitting model;
S213, labeling part of the unlabeled addresses in the preprocessed unlabeled address library with the target CRF splitting model to obtain the labeled address sequences corresponding to that part of the unlabeled addresses;
S214, updating the labeled address library with that part of the unlabeled addresses and their corresponding labeled address sequences, taking the updated labeled address library as the labeled address library again, and taking the target CRF splitting model as the initial CRF splitting model again;
S215, repeating steps S212 to S214 until the number of unlabeled addresses remaining in the unlabeled address library is less than a preset number threshold, and using the addresses in the labeled address library as the training data.
Before the CRF splitting model is trained, the training data must be determined. To obtain the training data, an address element classification and labeling system is designed first, defining how addresses are labeled and how parsing results are expressed. To accommodate both standard and non-standard addresses, the system covers the components and meanings of all addresses in the training data; Table 1 shows the preset classification and labeling system for address elements. With reference to related address models and existing systems, it builds on the multi-level administrative divisions and adds classes for the other components of an address, including spatial relationship descriptions (south side, north side, nearby, and so on) and address elements such as redundant punctuation marks and conjunctions.
To make the processed target address conform to the CRF input format, the tagged corpus is converted into a standard format in which each line contains exactly one character and its label, separated by a tab character. A three-tag set is used, in which the letters B, M, and E mark the first, middle, and last character of an element, respectively.
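The one-character-per-line format can be sketched as follows. Two details are assumptions of this sketch rather than statements from the source: the label is written as position letter plus element type (for example `B-CITY`), and a single-character element receives the `B` tag.

```python
from typing import List, Tuple


def to_crf_lines(segments: List[Tuple[str, str]]) -> List[str]:
    """Turn (text, element_type) segments into 'char<TAB>label' lines."""
    lines: List[str] = []
    for text, element_type in segments:
        for index, ch in enumerate(text):
            if index == 0:
                position = "B"   # first character of the element
            elif index == len(text) - 1:
                position = "E"   # last character of the element
            else:
                position = "M"   # any character in between
            lines.append(f"{ch}\t{position}-{element_type}")
    return lines
```

Joining the returned lines with newlines, and separating addresses with a blank line, gives a file in the layout commonly consumed by CRF training tools.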
TABLE 1
Tag     Address element type                Examples
PROV    Province                            Province, municipality directly under the central government, autonomous region, etc.
CITY    City                                City, autonomous prefecture, etc.
DIST    District/county                     County, county-level city, etc.
TOWN    Town                                Town, village, etc.
VILL    Village/community                   Village, community, etc.
ROAD    Road/street office                  Road, street, residents' committee, etc.
DOOR    House number/unit number            Number, floor, building, block, etc.
POI     Point of interest                   Building, square, company, etc.
SCENE   Natural feature                     Canal, river, lake, mountain, etc.
CONJ    Conjunction                         "and" and similar connecting words
PUNC    Punctuation mark                    Comma, brackets, etc.
NOR     Spatial relationship description    South, north, near, beside, etc.
The method comprises the steps of firstly obtaining an original corpus, wherein the original corpus comprises various corpus addresses, all the corpus addresses are not labeled at the beginning, in order to obtain training data, the training data need to be labeled according to a preset classification labeling system shown in table 1, a specific labeling method can be that labeling is carried out according to the corpus through a relevant machine learning model, or labeling is carried out manually according to the preset classification labeling system, and the specific labeling method is determined according to actual conditions. In the embodiment of the invention, a part of the corpus addresses are labeled manually according to the address element classification labeling system, after a part of the corpus addresses are labeled, all labeled addresses are used as labeled address libraries, and all unlabeled addresses are used as unlabeled address libraries.
Then the initial CRF splitting model is trained with the labeled address library, and the model parameters are iterated continuously during training to obtain a target CRF splitting model. Training the CRF splitting model mainly means training the weight parameters of the feature functions, where each feature template corresponds to a plurality of feature functions. The value of a feature function is 0 or 1, and its weight may be positive, zero or negative: a positive weight increases the contribution of the feature function, a zero weight means the feature function makes no contribution, and a negative weight reduces its contribution. Finally, the optimal solution is found by maximizing the likelihood function.
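The role of the 0/1 feature functions and their signed weights can be sketched with a tiny linear-chain CRF that is normalized by brute force. The feature functions, weights and label set below are invented for illustration and are not taken from the patent; real training would learn the weights by maximizing the likelihood rather than fixing them by hand.

```python
import itertools
import math

LABELS = ["B", "M", "E"]


# Each feature function returns 0 or 1 for (previous tag, current tag, characters, position).
def f_city_suffix(prev, cur, chars, i):
    return 1 if chars[i] == "市" and cur == "E" else 0


def f_begin_after_end(prev, cur, chars, i):
    return 1 if prev == "E" and cur == "B" else 0


def f_middle_at_start(prev, cur, chars, i):
    return 1 if i == 0 and cur == "M" else 0


def f_unused(prev, cur, chars, i):
    return 1 if cur == "B" else 0


# Hypothetical weights: positive raises the score, zero contributes nothing, negative lowers it.
FEATURES = [
    (f_city_suffix, 2.0),
    (f_begin_after_end, 1.0),
    (f_middle_at_start, -3.0),
    (f_unused, 0.0),
]


def score(tags, chars):
    """Weighted sum of all feature functions over all positions."""
    total = 0.0
    for i, cur in enumerate(tags):
        prev = tags[i - 1] if i > 0 else None
        total += sum(w * f(prev, cur, chars, i) for f, w in FEATURES)
    return total


def probability(tags, chars):
    """P(tags | chars), normalized by enumerating every tag sequence (only viable for tiny inputs)."""
    z = sum(math.exp(score(c, chars)) for c in itertools.product(LABELS, repeat=len(chars)))
    return math.exp(score(tuple(tags), chars)) / z


chars = list("上海市")
print(probability(("B", "M", "E"), chars), probability(("M", "M", "M"), chars))
```

The sequence that triggers the positively weighted feature receives a higher probability than the one that triggers the negatively weighted feature, which is the behavior the weights are meant to encode.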
Next, the target CRF splitting model is used to predict labels for the address corpora in the unlabeled address library, yielding a labeled address sequence for each of them. The labeled address library is then updated with the newly labeled addresses, and the target CRF splitting model is retrained with the updated labeled address library. Steps S212 to S214 are repeated until the number of unlabeled addresses in the unlabeled address library is less than a preset number threshold. The addresses in the finally updated labeled address library, together with their labeling sequences, are taken as the training data, and the model obtained in the last iteration is taken as the final target CRF splitting model.
It should be noted that the preset number threshold may be determined according to actual situations, and the embodiment of the present invention is not specifically limited herein.
On the basis of the foregoing embodiment, preferably, the updating the labeled address library by using the part of unlabeled addresses and the labeled address sequence corresponding to the part of unlabeled addresses includes:
deleting the part of the un-labeled addresses with the confidence degrees larger than a preset confidence degree threshold value from the un-labeled address library according to the confidence degrees between the part of the un-labeled addresses and the corresponding labeled address sequences;
and adding the corresponding labeled address sequence into the labeled address library to obtain an updated labeled address library.
For any address in the unlabeled address library, the confidence between the address and the labeled address sequence predicted by the target CRF splitting model is calculated. If the confidence is greater than the preset confidence threshold, the predicted labeled address sequence is considered sufficiently accurate, and the address is moved from the unlabeled address library to the labeled address library. If the confidence is smaller than the preset confidence threshold, the predicted labeled address sequence is not accurate enough, and the address remains in the unlabeled address library.
It should be noted that the preset confidence threshold may be determined according to actual situations, and the embodiment of the present invention is not specifically limited herein.
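The self-training loop of steps S212 to S214 together with the confidence-based update can be sketched as follows. This is a minimal illustration under stated assumptions: `train` stands for any routine that fits a CRF on the labeled library and returns a predictor, every remaining address is predicted in each round (the source speaks of "part of" the unlabeled addresses), and the `max_rounds` cap and the stop-when-nothing-moves rule are safeguards added here, not features of the patent. The toy trainer exists only to make the sketch runnable.

```python
from typing import Callable, Dict, List, Tuple

Tags = List[str]
Predictor = Callable[[str], Tuple[Tags, float]]   # address -> (tags, confidence)
Trainer = Callable[[Dict[str, Tags]], Predictor]  # labeled library -> model


def self_train(labeled: Dict[str, Tags], unlabeled: List[str], train: Trainer,
               conf_threshold: float, min_unlabeled: int, max_rounds: int = 50):
    labeled = dict(labeled)
    unlabeled = list(unlabeled)
    model = train(labeled)                      # S212: train on the labeled library
    for _ in range(max_rounds):
        if len(unlabeled) < min_unlabeled:      # stop condition of S215
            break
        remaining, moved = [], 0
        for address in unlabeled:               # S213: predict labels
            tags, confidence = model(address)
            if confidence > conf_threshold:     # S214: move confident predictions
                labeled[address] = tags
                moved += 1
            else:
                remaining.append(address)       # low confidence: stays unlabeled
        unlabeled = remaining
        if moved == 0:                          # assumption: give up if no progress
            break
        model = train(labeled)                  # retrain on the updated library
    return model, labeled, unlabeled


def toy_train(library: Dict[str, Tags]) -> Predictor:
    """Stand-in trainer: confident only on addresses up to a length that grows with the library."""
    reach = len(library) + 2

    def predict(address: str) -> Tuple[Tags, float]:
        tags = ["B"] + ["M"] * (len(address) - 2) + ["E"] if len(address) > 1 else ["B"]
        return tags, (0.95 if len(address) <= reach else 0.5)

    return predict


_, final_labeled, remaining = self_train(
    {"ab": ["B", "E"]}, ["abc", "abcd", "abcde"], toy_train,
    conf_threshold=0.9, min_unlabeled=1)
print(len(final_labeled), remaining)
```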
Specifically, the confidence between the address corpus and the tagged address sequence predicted by the target CRF splitting model is calculated by the following formula:
Figure BDA0003328853510000121
where $C_x$ represents the confidence between the address corpus and the labeled address sequence predicted by the target CRF splitting model, $i$ represents the current position, $x = (x_1, x_2, \ldots, x_n)$ is the unlabeled address corpus, $y = (y_1, y_2, \ldots, y_n)$ represents the predicted labeled address sequence, and the input variable and the output variable take the values $X = x$ and $Y = y$.
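The formula itself is only available as an image and is not reproduced here, so the following Python sketch is not the patent's formula. It shows one plausible way to turn per-position information into a sequence-level confidence: averaging the marginal probability that the model assigns to the predicted tag at each position $i$. The use of marginals and the choice of the mean as the aggregate are both assumptions.

```python
from typing import Dict, List


def sequence_confidence(marginals: List[Dict[str, float]], predicted: List[str]) -> float:
    """Mean over positions i of P(Y_i = y_i | X = x) for the predicted tags (assumed aggregate)."""
    if not predicted or len(marginals) != len(predicted):
        raise ValueError("marginals and predicted tags must be non-empty and equally long")
    return sum(m[tag] for m, tag in zip(marginals, predicted)) / len(predicted)


# Hypothetical marginals for a two-character address.
example_marginals = [{"B": 0.9, "M": 0.1}, {"B": 0.2, "M": 0.8}]
confidence = sequence_confidence(example_marginals, ["B", "M"])
print(round(confidence, 2))
```

The resulting value can be compared against the preset confidence threshold to decide whether an address moves to the labeled address library.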
In the embodiment of the invention, a conditional random field is used to parse the target address. Comprehensive, accurate and large-scale labeled corpora are obtained quickly by combining self-training semi-supervised learning with manual labeling; the corpora are used to train the model, forming a feature set and a feature template; and the conditional random field model is applied to parse Chinese addresses. This improves the precision and accuracy of address splitting and, in turn, the precision of subsequent address matching.
On the basis of the above embodiment, preferably, the trained CRF split model is obtained by training based on a preset feature template and training data, and is obtained through the following steps:
acquiring a characteristic function according to the preset characteristic template;
and extracting features of the training data according to the feature function, training the initial CRF splitting model by combining the weight of each feature, and obtaining the trained CRF splitting model.
Specifically, the feature template configures the positional relationships of features: the model selects features within a context window around the current item. In general, a context window of 2 to 3 is chosen for Chinese named entity recognition. If the window is too large, the number of features grows and running efficiency suffers; if the window is too small, context information of the address elements is lost and parsing precision suffers.
In the embodiment of the invention, taking into account the windows commonly used in natural language processing and the analysis of a large amount of data, a context window of 2 is selected, unigram features are constructed, and features in the forward and backward directions are combined and compared.
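A context window of 2 with unigram features can be sketched in Python as below. The feature naming (`U[-2]=...`), the padding token `PAD`, and the single combined forward/backward feature are assumptions chosen for illustration; the patent does not give the concrete template.

```python
from typing import List


def window_features(chars: List[str], i: int, window: int = 2) -> List[str]:
    """Unigram features for position i from characters within +/- window positions."""

    def char_at(j: int) -> str:
        return chars[j] if 0 <= j < len(chars) else "PAD"  # pad beyond the address boundaries

    feats = [f"U[{offset}]={char_at(i + offset)}" for offset in range(-window, window + 1)]
    # One combined feature joining the preceding and following character (assumed form).
    feats.append(f"U[-1]|U[1]={char_at(i - 1)}|{char_at(i + 1)}")
    return feats


address_chars = list("上海市浦东")
print(window_features(address_chars, 2))
```

With a window of 2 each position yields five unigram features, so widening the window increases the feature count linearly, which is the efficiency trade-off noted above.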
In the implementation, the feature function is used to extract the corresponding feature.
The extracted features are determined according to the feature function and include features of the predicted address components and features representing context constraints. Each feature has a corresponding weight, namely the weight of the address prediction component or the weight of the context constraint in the target CRF splitting model, and the trained CRF splitting model is obtained by training the target CRF splitting model with these features and weights.
It should be noted that the trained CRF splitting model is obtained after training of the finally obtained target CRF splitting model is completed.
On the basis of the foregoing embodiment, preferably, the obtaining a best matching address according to the confidence between the reference matching address and the candidate matching address includes:
for any alternative matching address, if the confidence between the reference matching address and the alternative matching address is greater than a first preset matching threshold, matching the cell information of the reference matching address against the cell information of the alternative matching address, and the road number information of the reference matching address against the road number information of the alternative matching address, respectively, and taking the alternative matching address as the best matching address if the cell matching result and the road number matching result are both greater than a second preset matching threshold;
if the confidence between the reference matching address and the alternative matching address is smaller than the first preset matching threshold, merging the road number information and the cell information of the reference matching address, merging the road number information and the cell information of the alternative matching address, and taking the alternative matching address as the best matching address if the matching degree between the two merged results is greater than the first preset matching threshold.
Specifically, the embodiment of the present invention is described by taking any one alternative matching address as an example. The first preset matching threshold may be determined according to the actual situation; in the embodiment of the present invention its value is 90. When the confidence between the alternative matching address and the reference matching address is greater than 90 points, the road number information of the two addresses and the cell information of the two addresses are fuzzy-matched separately, and if both matching degrees are greater than 80 points, the alternative matching address is taken as the best matching address.
The second preset matching threshold may likewise be determined according to the actual situation; in the embodiment of the present invention its value is 80. When the confidence is less than 80 points, the road number information and the cell information of the reference matching address are merged, the road number information and the cell information of the alternative matching address are merged, and the two merged results are matched; the address data with the highest confidence is then fuzzy-matched, and if the score is greater than 90 points, the alternative matching address is taken as the best matching address.
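The two-branch decision can be sketched in Python as follows. The sketch follows the branch conditions as stated in the method steps (first threshold 90 for the branch test and the merged comparison, second threshold 80 for the separate comparisons). The dictionary keys `cell` and `road_no` are hypothetical, the fuzzy score based on `difflib.SequenceMatcher` is only a stand-in for whatever fuzzy matcher is actually used, and a confidence exactly equal to the first threshold is sent to the merged branch because the source does not specify that case.

```python
from difflib import SequenceMatcher
from typing import Dict


def fuzzy_score(a: str, b: str) -> float:
    """Similarity on a 0-100 scale (stand-in fuzzy matcher)."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def is_best_match(confidence: float, ref: Dict[str, str], cand: Dict[str, str],
                  first_threshold: float = 90.0, second_threshold: float = 80.0) -> bool:
    if confidence > first_threshold:
        # High confidence: compare cell and road number separately.
        cell = fuzzy_score(ref["cell"], cand["cell"])
        road = fuzzy_score(ref["road_no"], cand["road_no"])
        return cell > second_threshold and road > second_threshold
    # Otherwise: merge road number and cell on each side and compare the merged strings.
    merged = fuzzy_score(ref["road_no"] + ref["cell"], cand["road_no"] + cand["cell"])
    return merged > first_threshold


# Hypothetical addresses for illustration.
reference = {"road_no": "张江路100号", "cell": "汤臣豪园"}
same = {"road_no": "张江路100号", "cell": "汤臣豪园"}
other = {"road_no": "世纪大道1号", "cell": "东方明珠"}
print(is_best_match(95, reference, same), is_best_match(95, reference, other))
```

Keeping both thresholds as parameters reflects the statement that they are determined according to the actual situation.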
To sum up, the embodiment of the present invention provides an address matching method. The target address is first preprocessed to remove irregular information, and the preprocessed target address is then input into the trained CRF splitting model. The CRF splitting model can accurately split irregular or incomplete address information, which improves the precision and accuracy of target address splitting and thus of subsequent address matching. Matching is then performed in the preset ElasticSearch search engine according to the optimal labeling address sequence, making full use of the engine's built-in search function to quickly match regular, detailed and accurate address information.
The embodiment of the invention reduces the workload of processing all address information manually, can quickly locate and match complete address information, greatly increases the processing speed of address information, and reduces customer waiting time. A customer can be quickly located to a specific cell from the address information, enabling a fast response and better customer service.
In addition, in the embodiment of the invention, a conditional random field is used to parse the target address. Comprehensive, accurate and large-scale labeled corpora are obtained quickly by combining self-training semi-supervised learning with manual labeling; the corpora are used to train the model, forming a feature set and a feature template; and the conditional random field model is applied to parse Chinese addresses. This improves the precision and accuracy of address splitting and, in turn, the precision of subsequent address matching.
Fig. 3 is a schematic structural diagram of an address matching system according to an embodiment of the present invention, as shown in fig. 3, the system includes an obtaining module 310, a sequence module 320, and a matching module 330, where:
the obtaining module 310 is configured to obtain a preprocessed target address;
the sequence module 320 is configured to input the preprocessed target address into a trained CRF splitting model to obtain an optimal tagging address sequence, where the trained CRF splitting model is obtained by training based on a preset feature template and training data;
the matching module 330 is configured to obtain an alternative matching address according to the current search index of the best tagged address sequence and a preset ElasticSearch engine.
On the basis of the above embodiment, it is preferable to further include: a reference module and an optimization module, wherein:
the reference module is used for acquiring a reference matching address according to a preset address element of the optimal labeling address sequence and the preset ElasticSearch search engine;
the optimization module is used for obtaining the best matching address according to the confidence degree between the reference matching address and the alternative matching address.
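How the matching module and the reference module might query the search engine can be sketched by building Elasticsearch query bodies from the labeled address elements. This is only a sketch under assumptions: the index field names (the lowercased tags of Table 1), the use of a `bool` query with `match` clauses, and the choice of ROAD and DOOR as the preset address elements are all hypothetical, since the patent does not describe the index mapping or the query form.

```python
from typing import Dict, List, Optional, Set, Tuple


def build_query(elements: List[Tuple[str, str]], keep: Optional[Set[str]] = None,
                size: int = 10) -> Dict:
    """Build a query body with one match clause per selected address element."""
    clauses = [
        {"match": {label.lower(): text}}  # hypothetical field name per tag
        for label, text in elements
        if keep is None or label in keep
    ]
    return {"size": size, "query": {"bool": {"must": clauses}}}


# Hypothetical output of the CRF splitting model, grouped into elements.
elements = [("PROV", "上海市"), ("DIST", "浦东新区"), ("ROAD", "张江路"), ("DOOR", "100号")]

candidate_query = build_query(elements)                                # all elements
reference_query = build_query(elements, keep={"ROAD", "DOOR"}, size=1)  # preset elements only
print(reference_query)
```

Each body could then be submitted to the search endpoint of the preset index, with the first query returning alternative matching addresses and the second returning the reference matching address.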
On the basis of the foregoing embodiment, preferably, the sequence module includes a labeling unit, a training unit, a prediction unit, an updating unit and an iteration unit, and the training data is obtained through the labeling unit, the training unit, the prediction unit, the updating unit and the iteration unit, where:
the labeling unit is used for acquiring a labeled address library and a preprocessed unlabeled address library in an original corpus, and the labeled address library is obtained by labeling according to a preset classification labeling system;
the training unit is used for training an initial CRF splitting model according to the labeled address library to obtain a target CRF splitting model;
the prediction unit is used for labeling part of unmarked addresses in the preprocessed unmarked address library according to the target CRF splitting model to obtain a labeled address sequence corresponding to the part of unmarked addresses;
the updating unit is used for updating the labeled address library by using the part of unlabeled addresses and the labeled address sequence corresponding to the part of unlabeled addresses, using the updated labeled address library as the labeled address library again, and using the target CRF splitting model as the initial CRF splitting model again;
the iteration unit is used for repeating the steps until the number of the residual unmarked addresses in the unmarked address base is smaller than a preset number threshold value, and taking the final addresses in the marked address base as training data.
On the basis of the foregoing embodiment, preferably, the updating unit includes a confidence subunit and an updating subunit, where:
the confidence subunit is used for deleting the part of the un-labeled addresses with the confidence degrees larger than a preset confidence degree threshold from the un-labeled address library according to the confidence degrees between the part of the un-labeled addresses and the corresponding labeled address sequences;
and the updating subunit is used for adding the corresponding labeled address sequence into the labeled address library and acquiring the updated labeled address library.
On the basis of the foregoing embodiment, in the confidence subunit, preferably, the confidence between the part of unlabeled addresses and the corresponding labeled address sequence is obtained as follows:
Figure BDA0003328853510000161
where $C_x$ represents the confidence between the unlabeled address corpus and the corresponding labeled address sequence, $i$ represents the current position, $x = (x_1, x_2, \ldots, x_n)$ is the unlabeled address, $y = (y_1, y_2, \ldots, y_n)$ represents the predicted labeled address sequence, and the input variable and the output variable take the values $X = x$ and $Y = y$.
On the basis of the foregoing embodiment, preferably, the sequence module further includes a feature unit and a splitting unit, where:
the characteristic unit is used for acquiring a characteristic function according to the preset characteristic template;
and the splitting unit is used for extracting features of the training data according to the feature function, training a finally obtained target CRF splitting model by combining the weight of each feature, and obtaining the trained CRF splitting model.
On the basis of the foregoing embodiment, preferably, the optimization module includes a first optimization subunit and a second optimization subunit, where, for any alternative matching address:
the first optimization subunit is configured to, if the confidence between the reference matching address and the alternative matching address is greater than a first preset matching threshold, match the cell information of the reference matching address against the cell information of the alternative matching address and the road number information of the reference matching address against the road number information of the alternative matching address, and if both matching results are greater than a second preset matching threshold, take the alternative matching address as the best matching address;
the second optimization subunit is configured to, if the confidence between the reference matching address and the alternative matching address is smaller than the first preset matching threshold, merge the road number information and the cell information of the reference matching address, merge the road number information and the cell information of the alternative matching address, and if the matching degree between the two merged results is greater than the first preset matching threshold, take the alternative matching address as the best matching address.
The modules in the address matching system described above may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or be independent of, a processor of the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.
The present embodiment is a system embodiment corresponding to the method, the specific implementation process of the system embodiment is the same as the method embodiment, please refer to the method embodiment for details, and the system embodiment is not described herein again.
Fig. 4 is a schematic structural diagram of a computer device provided in an embodiment of the present invention, where the computer device may be a server, and an internal structural diagram of the computer device may be as shown in fig. 4. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a computer storage medium and an internal memory. The computer storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the computer storage media. The database of the computer device is used for storing data generated or acquired during the execution of the address matching method, such as a preprocessed target address, a trained CRF splitting model, training data, and the like. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement an address matching method.
In one embodiment, a computer device is provided, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and when the processor executes the computer program, the steps of the address matching method in the above embodiments are implemented. Alternatively, the processor implements the functions of the modules/units in this embodiment of the address matching system when executing the computer program.
In an embodiment, a computer storage medium is provided, on which a computer program is stored, which, when being executed by a processor, implements the steps of the address matching method in the above embodiments. Alternatively, the computer program realizes the functions of the modules/units in the embodiment of the address matching system described above when executed by the processor.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (10)

1. An address matching method, comprising:
acquiring a preprocessed target address;
inputting the preprocessed target address into a trained CRF splitting model to obtain an optimal labeling address sequence, wherein the trained CRF splitting model is obtained by training based on a preset characteristic template and training data;
and acquiring an alternative matching address according to the current search index of the optimal labeling address sequence and a preset ElasticSearch search engine.
2. The address matching method according to claim 1, further comprising:
acquiring a reference matching address according to a preset address element of the optimal labeling address sequence and the preset ElasticSearch search engine;
and acquiring the best matching address according to the confidence degree between the reference matching address and the alternative matching address.
3. The address matching method according to claim 1 or 2, wherein the training data is obtained by:
S211, acquiring a labeled address library and a preprocessed unlabeled address library in an original corpus, wherein the labeled address library is obtained by labeling according to a preset classification labeling system;
S212, training the initial CRF splitting model according to the labeled address library to obtain a target CRF splitting model;
S213, labeling part of the unlabeled addresses in the preprocessed unlabeled address library according to the target CRF splitting model to obtain a labeled address sequence corresponding to the part of unlabeled addresses;
S214, updating the labeled address library by using the part of unlabeled addresses and the labeled address sequence corresponding to the part of unlabeled addresses, using the updated labeled address library as the labeled address library again, and using the target CRF splitting model as the initial CRF splitting model again;
S215, repeating steps S212 to S214 until the number of remaining unlabeled addresses in the unlabeled address library is less than a preset number threshold, and using the addresses in the labeled address library as training data.
4. The address matching method of claim 3, wherein the updating the labeled address library by using the part of unlabeled addresses and the labeled address sequence corresponding to the part of unlabeled addresses comprises:
deleting the part of the un-labeled addresses with the confidence degrees larger than a preset confidence degree threshold value from the un-labeled address library according to the confidence degrees between the part of the un-labeled addresses and the corresponding labeled address sequences;
and adding the corresponding labeled address sequence into the labeled address library to obtain an updated labeled address library.
5. The address matching method of claim 4, wherein the confidence between the partially unlabeled address and the corresponding labeled address sequence is obtained by:
Figure FDA0003328853500000021
where $C_x$ represents the confidence between the unlabeled address corpus and the corresponding labeled address sequence, $i$ represents the current position, $x = (x_1, x_2, \ldots, x_n)$ is the unlabeled address, $y = (y_1, y_2, \ldots, y_n)$ represents the predicted labeled address sequence, and the input variable and the output variable take the values $X = x$ and $Y = y$.
6. The address matching method according to claim 3, wherein the trained CRF split model is obtained by training based on a preset feature template and training data, and is obtained by the following steps:
acquiring a characteristic function according to the preset characteristic template;
and extracting features of the training data according to the feature function, training the initial CRF splitting model by combining the weight of each feature, and obtaining the trained CRF splitting model.
7. The address matching method according to claim 2, wherein the obtaining a best matching address according to the confidence between the reference matching address and the candidate matching address comprises:
for any alternative matching address, if the confidence between the reference matching address and the alternative matching address is greater than a first preset matching threshold, matching the cell information of the reference matching address against the cell information of the alternative matching address, and the road number information of the reference matching address against the road number information of the alternative matching address, respectively, and taking the alternative matching address as the best matching address if the cell matching result and the road number matching result are both greater than a second preset matching threshold;
if the confidence between the reference matching address and the alternative matching address is smaller than the first preset matching threshold, merging the road number information and the cell information of the reference matching address, merging the road number information and the cell information of the alternative matching address, and taking the alternative matching address as the best matching address if the matching degree between the two merged results is greater than the first preset matching threshold.
8. An address matching system, comprising:
the acquisition module is used for acquiring the preprocessed target address;
the sequence module is used for inputting the preprocessed target address into a trained CRF splitting model to obtain an optimal labeling address sequence, and the trained CRF splitting model is obtained by training based on a preset characteristic template and training data;
and the matching module is used for acquiring the alternative matching address according to the current search index of the optimal labeling address sequence and a preset ElasticSearch search engine.
9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the address matching method according to any of claims 1 to 7 when executing the computer program.
10. A computer storage medium storing a computer program, the computer program implementing the steps of the address matching method according to any one of claims 1 to 7 when executed by a processor.
CN202111274139.1A 2021-10-29 2021-10-29 Address matching method, system, device and storage medium Pending CN114003812A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111274139.1A CN114003812A (en) 2021-10-29 2021-10-29 Address matching method, system, device and storage medium


Publications (1)

Publication Number Publication Date
CN114003812A true CN114003812A (en) 2022-02-01

Family

ID=79925540


Country Status (1)

Country Link
CN (1) CN114003812A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116304234A (en) * 2023-01-10 2023-06-23 奉加微电子(上海)有限公司 Data matching method, processing method, device, electronic equipment and storage medium

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116304234A (en) * 2023-01-10 2023-06-23 奉加微电子(上海)有限公司 Data matching method, processing method, device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111783419B (en) Address similarity calculation method, device, equipment and storage medium
CN113239210B (en) Water conservancy literature recommendation method and system based on automatic completion knowledge graph
CN106909611B (en) Hotel automatic matching method based on text information extraction
CN108388559A (en) Named entity recognition method and system for geographic space, and computer program
CN109933797A (en) Geocoding method and system based on Jieba word segmentation and address dictionary
CN111291099B (en) Address fuzzy matching method and system and computer equipment
CN112069276A (en) Address coding method and device, computer equipment and computer readable storage medium
CN112527933A (en) Chinese address association method based on space position and text training
CN112528174A (en) Address finishing and complementing method based on knowledge graph and multiple matching and application
CN116414823A (en) Address positioning method and device based on word segmentation model
CN114091454A (en) Method for extracting place name information and positioning space in internet text
CN115630648A (en) Address element analysis method and system for man-machine conversation and computer readable medium
CN114003812A (en) Address matching method, system, device and storage medium
CN112069824B (en) Region identification method, device and medium based on context probability and citation
CN116701734A (en) Address text processing method and device and computer readable storage medium
CN117010373A (en) Recommendation method for category and group to which asset management data of power equipment belong
CN115292962B (en) Path similarity matching method and device based on track rarefaction and storage medium
CN116431746A (en) Address mapping method and device based on coding library, electronic equipment and storage medium
CN113515677B (en) Address matching method, device and computer readable storage medium
CN111680122B (en) Space data active recommendation method and device, storage medium and computer equipment
CN116431625A (en) Positioning analysis method and device for geographic entity and computer equipment
CN116414808A (en) Method, device, computer equipment and storage medium for normalizing detailed address
CN114036414A (en) Method and device for processing interest points, electronic equipment, medium and program product
CN113536781A (en) Address identification method and device, readable storage medium and terminal
CN111209392B (en) Method, device and equipment for excavating polluted enterprises

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination