CN115577065B - Address resolution method and device - Google Patents


Info

Publication number
CN115577065B
Authority
CN
China
Prior art keywords
information
vector
company
word
business
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211578045.8A
Other languages
Chinese (zh)
Other versions
CN115577065A (en)
Inventor
陈成
梁文杰
徐崚峰
刘殿兴
岳丰
温晓聪
彭美玲
晏超
方兴
宋群力
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Citic Securities Co ltd
Original Assignee
Citic Securities Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Citic Securities Co ltd
Priority to CN202211578045.8A
Publication of CN115577065A
Application granted
Publication of CN115577065B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Remote Sensing (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an address resolution method and device. Business department information is divided into three parts so that irrelevant information can be removed, leaving company information and position words. The company information and the position words are each processed by a model to obtain a company feature vector and a geographic information vector, and the two vectors are then analyzed together to obtain the province and city of the business department that the information describes. As a result, the province and city can be resolved even when the business department information is incomplete or contains easily confused address names, which improves the accuracy of address resolution.

Description

Address resolution method and device
Technical Field
The present invention relates to the field of information processing technologies, and in particular, to a method and an apparatus for address resolution.
Background
To control risk in financial markets, the business departments involved in abnormal transactions can be identified from data in financial scenarios, and an address resolution method can be used to locate each transaction in a specific prefecture-level city. This converts global risk into regional risk and makes it easier to find associations between risk points, thereby supporting risk control. In addition, address resolution allows relevant transaction information to be pushed accurately to financial staff by region, helping to uncover potential business opportunities.
At present, address resolution schemes mainly rely on map tool interfaces and related address string parsing toolkits. In practice, however, the business department information obtained often has missing parts or contains easily confused address names, which leads to a high misrecognition rate.
Disclosure of Invention
The present application provides an address resolution method and device. The method is based on a neural network, relies less on character string matching, and resolves addresses by combining multiple kinds of information, which reduces the probability of incorrect resolution and improves the correct recognition rate.
In a first aspect, the present application provides a method of address resolution, the method comprising:
obtaining a company feature vector according to company information, wherein the company information is the information in the business department information that represents the company to which the business department belongs, and the company feature vector is used for representing the organization position of the company corresponding to the company information;
obtaining a geographic information vector corresponding to a position word, wherein the position word is a word, determined according to the business department information, that represents administrative division information of the business department, and the geographic information vector is used for representing the position corresponding to the position word;
and carrying out hierarchical classification according to the company feature vector and the geographic information vector to obtain an analysis result, wherein the analysis result indicates the province and city to which the business department belongs.
Optionally, performing hierarchical classification according to the company feature vector and the geographic information vector to obtain an analysis result, including:
combining the company feature vector and the geographic information vector to obtain a coding vector;
and carrying out hierarchical classification according to the coding vector to obtain an analysis result.
Optionally, obtaining the geographic information vector corresponding to the location word includes:
carrying out word vector representation on the position words to obtain position word vectors;
and obtaining a geographic information vector according to the position word vector.
Optionally, after the word vector representation is performed on the position word to obtain the position word vector, before the geographic information vector is obtained according to the position word vector, the method further includes:
position weighting is carried out according to the position word vector, and a weighted position word vector is obtained;
obtaining a geographic information vector according to the position word vector, including:
and obtaining a geographic information vector according to the weighted position word vector.
Optionally, obtaining the geographic information vector according to the position word vector includes:
and extracting information of the position word vector according to the neural network model to obtain a geographic information vector.
Optionally, the method further comprises:
removing business-department-related fields from the business department information to obtain address information;
and obtaining company information and position information according to the address information, wherein the position words are obtained by segmenting the position information according to a corpus, and the corpus comprises an administrative division corpus.
In a second aspect, the present application provides an apparatus for address resolution, the apparatus comprising:
the first obtaining unit is used for obtaining a company feature vector according to company information, wherein the company information is the information in the business department information that represents the company to which the business department belongs, and the company feature vector is used for representing the organization position of the company corresponding to the company information;
the second obtaining unit is used for obtaining a geographic information vector corresponding to a position word, wherein the position word is a word, determined according to the business department information, that represents administrative division information of the business department, and the geographic information vector is used for representing the position corresponding to the position word;
and the first processing unit is used for carrying out hierarchical classification according to the company feature vector and the geographic information vector to obtain an analysis result, wherein the analysis result indicates the province and city to which the business department belongs.
Optionally, the first processing unit is specifically configured to:
combining the company feature vector and the geographic information vector to obtain a coding vector;
and carrying out hierarchical classification according to the coding vector to obtain an analysis result.
Optionally, the second obtaining unit is specifically configured to:
carrying out word vector representation on the position words to obtain position word vectors;
and obtaining a geographic information vector according to the position word vector.
Optionally, the apparatus further comprises:
the second processing unit is used for carrying out position weighting according to the position word vector to obtain a weighted position word vector;
the second obtaining unit is specifically configured to:
and obtaining a geographic information vector according to the weighted position word vector.
Optionally, the second obtaining unit is specifically configured to:
and extracting information of the position word vector according to the neural network model to obtain a geographic information vector.
Optionally, the apparatus further comprises:
the third obtaining unit is used for removing business-department-related fields from the business department information to obtain address information;
the fourth obtaining unit is used for obtaining company information and position information according to the address information, wherein the position words are obtained by segmenting the position information according to a corpus, and the corpus comprises an administrative division corpus.
In a third aspect, the present application provides an apparatus for address resolution, the apparatus comprising a memory and a processor:
the memory is used for storing a computer program;
the processor is configured to perform the method provided in the first aspect above according to a computer program.
In a fourth aspect, the present application also provides a computer readable storage medium for storing a computer program for performing the method provided in the first aspect.
It follows that the present application has the following beneficial effects:
A company feature vector is obtained from the company information in the business department information, a geographic information vector is obtained from the position words derived from the business department information, and the analysis result is obtained from the company feature vector and the geographic information vector. In the address resolution method provided by the present application, the business department information is divided into three parts, so that irrelevant information (such as business-department-related fields) can be removed and the company information and position words obtained. The company information and the position words are each processed by a model to obtain the company feature vector and the geographic information vector, and the two vectors are finally combined and analyzed to obtain the province and city of the business department that the business department information describes. Even if the business department information is incomplete or contains easily confused address names, the province and city of the corresponding business department can still be resolved, which improves the accuracy of address resolution and the correct recognition rate.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the following description will briefly explain the drawings needed in the description of the embodiments, and it is obvious that the drawings in the following description are only some embodiments described in the present application, and other drawings may be obtained according to these drawings by those skilled in the art.
FIG. 1 is a flow chart of a method for address resolution according to an embodiment of the present application;
FIG. 2 is a flow chart of an embodiment of a method for address resolution according to the present disclosure;
FIG. 3 is a schematic structural diagram of an address resolution apparatus 300 according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an address resolution apparatus 400 according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The "first" in the names of the "first obtaining unit", "first processing unit", and the like in the embodiments of the present application is used for name identification, and does not represent the first in sequence. The rule applies equally to "second", "third", etc.
At present, technical solutions for address resolution mainly use a map tool interface and related address string parsing toolkits to resolve strings with address meaning into specific provinces and cities. However, in the work of finding the region to which a business department belongs, two problems were found. First, the positioning information returned after passing address information to the map tool interface contains many errors: for example, the address of a business department outside Beijing may be positioned in Beijing, and the prefecture-level city and road name contained in the business department information may still not be confirmable in the map tool. Second, the related address parsing tools obtain province and city information by word segmentation followed by matching. This approach depends on the accuracy of word segmentation and the completeness of the province, city and county dictionary, and is easily affected by the training corpus of the word segmentation model. For example, after word segmentation the address of the "Xingxia Chao road business part" is easily misresolved as "Ningxia autonomous region", whereas the correct address is "Qinghai Xingjingying city"; province and city names that are not specially handled therefore produce obvious errors.
In the embodiments of the present application, a company feature vector is obtained from the company information in the business department information, a geographic information vector is obtained from the position words derived from the business department information, and the analysis result is obtained from the company feature vector and the geographic information vector. Specifically, the method may include, for example: removing irrelevant information from the obtained business department information; dividing the remainder into company information and position information; obtaining a company feature vector from the company information; obtaining position word vectors from the position information; obtaining a geographic information vector from the position word vectors; and finally combining the company feature vector and the geographic information vector into a coding vector and obtaining the analysis result from the coding vector. The method provided by the embodiments of the present application can therefore resolve, from the business department information, the province and city to which the corresponding business department belongs, which improves the accuracy of address resolution and the correct recognition rate.
In order to facilitate understanding of the specific implementation of the address resolution method provided in the embodiments of the present application, the following description will be made with reference to the accompanying drawings.
It should be noted that the method of address resolution may be executed by the address resolution apparatus provided in the embodiments of the present application, and the address resolution apparatus may be carried in an electronic device or in a functional module of the electronic device. The electronic device in the embodiments of the present application may be any device capable of implementing the method of address resolution, for example an Internet of Things (IoT) device.
Fig. 1 is a flow chart of a method for address resolution according to an embodiment of the present application. The method may be applied to an address resolution device, which may be, for example, the address resolution device 300 shown in fig. 3, or may be a functional module integrated in the address resolution device 400 shown in fig. 4.
As shown in fig. 1, the method comprises the steps of:
S101: Obtain a company feature vector according to company information, wherein the company information is the information in the business department information that represents the company to which the business department belongs, and the company feature vector is used for representing the organization position of the company corresponding to the company information.
To analyze the region to which the business department involved in a transaction belongs, the business department information is obtained first. The business department information comprises business-department-related fields and address information indicating where the business department is located, and the company information and position information are obtained from the address information.
To obtain the province and city where the business department is located, address resolution must be performed on the business department information. First, a company feature vector is obtained from the company information in the business department information. Next, position words are obtained from the address information in the business department information, and the corresponding geographic information vector is obtained from the position words. Finally, hierarchical classification is carried out according to the company feature vector and the geographic information vector to obtain an analysis result indicating the province and city where the business department is located. S101 therefore obtains the company feature vector through an embedding learning operation, which is a prerequisite for obtaining the analysis result.
As one example, S101 may include: learning the company information, through embedding, into a matrix whose parameters encode the position information of securities companies, so as to obtain a company feature vector representing the organization position of the company corresponding to the company information.
S102: Obtain a geographic information vector corresponding to a position word, wherein the position word is a word, determined according to the business department information, that represents administrative division information of the business department, and the geographic information vector is used for representing the position corresponding to the position word.
As one example, S102 may include: S1021, dividing the position information into at least one administrative-division-related word using an administrative division corpus prepared in advance, so as to obtain the position words. S1022, obtaining the position word vector corresponding to each position word through a pre-trained word embedding model. S1023, extracting information from the position word vectors with a neural network model to obtain the geographic information vector.
Between S1022 and S1023, the method may further include: position-weighting the obtained position word vectors to obtain weighted position word vectors. Weighting gives each word a different degree of importance and filters out the confusing information introduced by some words, so that the analysis result obtained later is more accurate.
S103: Carry out hierarchical classification according to the company feature vector and the geographic information vector to obtain an analysis result, wherein the analysis result indicates the province and city to which the business department belongs.
As one example, S103 may include: S1031, combining the company feature vector and the geographic information vector to obtain a coding vector. S1032, carrying out hierarchical classification on the coding vector to obtain a classification result for the province-level unit and a classification result for the prefecture-level city within the corresponding province. S1033, combining the two classification results to obtain an analysis result indicating the province and city where the business department is located.
The analysis result obtained in S1033 may be a vector sequence composed of province and city labels. The province and city labels are related to each other: the classification result for the prefecture-level city is determined on the basis of the classification result for the province-level unit, which further improves the accuracy of address resolution.
It can be seen that, in the method of the embodiments of the present application, a company feature vector is obtained from the company information in the business department information, and this vector carries geographic information about the business departments belonging to the company. Position words corresponding to the address information in the business department information are then obtained, and the corresponding geographic information vector is obtained from the position words. Finally, the company feature vector and the geographic information vector are combined to obtain an analysis result for the province and city to which the business department belongs. Compared with using address information alone, address resolution that combines the two kinds of information draws on richer information about how a securities company's business departments are distributed, which improves the resolution result and the correct recognition rate.
In order to make the method provided by the embodiments of the present application clearer and easier to understand, a specific example of the method is described below with reference to fig. 2.
S201: Divide the business department information into company information and position information, wherein the company information is the information in the business department information that indicates the company to which the business department belongs, and the position information is the information in the business department information that indicates a geographic position.
As one example, S201 may include: acquiring business department information, wherein the business department information is the name of a securities business department; removing the business-department-related fields from the business department information, so that the remaining field information comprises company information and position information; and dividing the remaining field information into two parts, company information and position information, denoted $C_i$ and $X_i$ respectively. For example, for "A Securities Co., Ltd. Shanghai Century Avenue securities business department", the business-department-related field "business department" is removed, and the remaining content is divided into "A Securities Co., Ltd.", which is the company information, and "Shanghai Century Avenue securities", which is the position information.
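The division in S201 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the field list `DEPARTMENT_FIELDS`, the suffix list `COMPANY_SUFFIXES`, the function name `split_business_info` and the English rendering of the example name are all assumptions made here, since the text does not say how the boundary between company information and position information is found.

```python
from typing import Tuple

# Hypothetical lists; the patent does not enumerate the removed fields
# or say how the company name boundary is detected.
DEPARTMENT_FIELDS = ("business department",)
COMPANY_SUFFIXES = ("Co., Ltd.",)


def split_business_info(name: str) -> Tuple[str, str]:
    """Return (company information C_i, position information X_i)."""
    text = name
    # Step 1: remove the business-department-related fields.
    for field in DEPARTMENT_FIELDS:
        text = text.replace(field, "")
    text = text.strip()
    # Step 2: split the remainder right after the first company suffix found.
    for suffix in COMPANY_SUFFIXES:
        idx = text.find(suffix)
        if idx != -1:
            cut = idx + len(suffix)
            return text[:cut].strip(), text[cut:].strip()
    # No company suffix found: treat everything as position information.
    return "", text


company, position = split_business_info(
    "A Securities Co., Ltd. Shanghai Century Avenue securities business department"
)
print(company)
print(position)
```

A rule of this kind only works when company names end in a recognizable suffix; a production system would more plausibly match against a list of known securities companies.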
S202: Obtain a company feature vector according to the company information, wherein the company feature vector is used for representing the organization position of the company corresponding to the company information.
As one example, S202 may include: performing word embedding on $C_i$ based on a parameter matrix to obtain the company feature vector $CE_i$. The parameter matrix may be obtained by training a neural network model on the company features of securities companies, and each column of the parameter matrix represents the company feature vector of one securities company. For example, the company feature vector may be obtained as follows: first determine the number corresponding to the company information $C_i$, then take the column of the parameter matrix with that number as the company feature vector $CE_i$, so that the company information is characterized as a company feature vector.
[Figure GDA0004220834490000081: the parameter matrix, with one column vector marked by a black box]
For example, the company feature vector of A Securities Co., Ltd. is represented as [0.5, 0.4, ……, 0.3], where the company information corresponds to the 7th column vector of the parameter matrix, that is, the vector marked by the black box in the parameter matrix above.
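The column lookup described for S202 can be sketched with NumPy. The dictionary `COMPANY_INDEX`, the dimensions and the random matrix values are assumptions for illustration only; in the patent the parameter matrix is learned during training.

```python
import numpy as np

# Hypothetical numbering of securities companies: the value is the
# zero-based column index, so index 6 is the 7th column.
COMPANY_INDEX = {"A Securities Co., Ltd.": 6}

EMBED_DIM = 4       # assumed length of a company feature vector
NUM_COMPANIES = 10  # assumed number of securities companies

rng = np.random.default_rng(0)
# Each column is the feature vector of one company. Random values stand
# in for the learned parameters.
param_matrix = rng.normal(size=(EMBED_DIM, NUM_COMPANIES))


def company_feature_vector(company: str) -> np.ndarray:
    """Return CE_i, the column of the parameter matrix numbered for the company."""
    col = COMPANY_INDEX[company]
    return param_matrix[:, col]


ce = company_feature_vector("A Securities Co., Ltd.")

# The lookup equals multiplying the matrix by a one-hot vector, which is
# why the same matrix can be trained as a layer of a neural network.
one_hot = np.zeros(NUM_COMPANIES)
one_hot[COMPANY_INDEX["A Securities Co., Ltd."]] = 1.0
print(np.allclose(param_matrix @ one_hot, ce))
```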
S203: Obtain at least one position word according to the position information.
As one example, S203 may include: dividing the position information $X_i$ into at least one position word using a preset administrative division corpus, wherein the administrative division corpus comprises information on administrative divisions. For example, "Jiangsu Suzhou Wujiang Zhongshan South Road" can be divided into four position words: "Jiangsu", "Suzhou", "Wujiang" and "Zhongshan".
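One way to segment position information against a corpus is forward maximum matching, sketched below. The patent does not name the segmentation algorithm, so this technique is a substitution chosen for illustration. The Chinese strings are an assumed original-language form of the example above, and `ADMIN_CORPUS` and `segment_position_words` are hypothetical names.

```python
from typing import List, Set

# Hypothetical excerpt of an administrative division corpus:
# Jiangsu, Suzhou, Wujiang, Zhongshan.
ADMIN_CORPUS: Set[str] = {"江苏", "苏州", "吴江", "中山"}


def segment_position_words(position_info: str, corpus: Set[str]) -> List[str]:
    """Forward maximum matching: keep the longest corpus word at each position."""
    max_len = max(len(word) for word in corpus)
    words: List[str] = []
    i = 0
    while i < len(position_info):
        # Try the longest candidate first, then shorter ones.
        for size in range(min(max_len, len(position_info) - i), 0, -1):
            candidate = position_info[i:i + size]
            if candidate in corpus:
                words.append(candidate)
                i += size
                break
        else:
            # No corpus word starts here (e.g. a road name): skip one character.
            i += 1
    return words


print(segment_position_words("江苏苏州吴江中山南路", ADMIN_CORPUS))
```

Characters that are not part of any administrative division word, such as the road suffix, are dropped, so only position words reach the embedding step.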
S204: Obtain at least one position word vector according to the position words.
As one example, S204 may include: using Word2Vec to represent words as word vectors, that is, using a pre-trained word embedding model $W(\cdot)$ to obtain the word vector $e_j$ representing the corresponding position. For example, word vector characterization of the four position words above gives the position word vectors $\{e_j\}$ as follows:
$$W(\text{Jiangsu}, \text{Suzhou}, \text{Wujiang}, \text{Zhongshan}) = e_1, e_2, e_3, e_4$$
where $e_1$ is the position word vector of the position word "Jiangsu", $e_2$ is the position word vector of "Suzhou", $e_3$ is the position word vector of "Wujiang", and $e_4$ is the position word vector of "Zhongshan".
S205: Carry out position weighting according to the position word vectors to obtain weighted position word vectors.
As one example, S205 may include: position-weighting the position word vectors, for example by applying preset exponential weights, to obtain the weighted position word vectors. For example, "Zhongshan" in the position word vectors $\{e_j\}$ obtained above may cause ambiguity, so $\{e_j\}$ needs to be weighted. With the preset weights [0.5, 0.25, 0.125, 0.0625], the weighted position word vectors are $0.5 \cdot e_1, 0.25 \cdot e_2, 0.125 \cdot e_3, 0.0625 \cdot e_4$. To simplify notation, $\{e_j\}$ is still used below to denote the position word vectors after position weighting.
Different weights are set for the position words according to experience or actual requirements, and the weight of each position word is applied to its position word vector. Weighting the position word vectors filters out the confusing information in the position words, which improves the accuracy of the geographic information vector obtained later.
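The weighting in S205 can be sketched as below. The rule $w_j = 0.5^j$ is an assumption that reproduces the example weights [0.5, 0.25, 0.125, 0.0625]; the text only says the weights are preset. The names `position_weights` and `weight_position_vectors` are hypothetical.

```python
import numpy as np


def position_weights(n: int, base: float = 0.5) -> np.ndarray:
    """Exponentially decaying weights base**1, base**2, ..., base**n."""
    return base ** np.arange(1, n + 1)


def weight_position_vectors(vectors: np.ndarray) -> np.ndarray:
    """Scale the j-th position word vector (row j) by the j-th weight."""
    weights = position_weights(vectors.shape[0])
    return vectors * weights[:, None]


# Four dummy position word vectors e_1..e_4 of dimension 3, all ones,
# so that each weighted row shows its weight directly.
e = np.ones((4, 3))
weighted = weight_position_vectors(e)
print(weighted[:, 0])  # the weights 0.5, 0.25, 0.125, 0.0625
```

With these weights, a later and more ambiguous word such as "Zhongshan" contributes far less than the leading province-level word.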
S206: Obtain a geographic information vector according to the weighted position word vectors, wherein the geographic information vector is used for representing the position corresponding to the position words.
As one example, S206 may include: extracting information from the weighted position word vectors with a neural network model to obtain the geographic information vector. For example, a long short-term memory (LSTM) network is used to extract information from the position word vectors $\{e_j\}$ obtained above, giving the geographic information vector $EE_i$. The LSTM network is a variant of the ordinary recurrent neural network and is suited to sequence data.
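The information extraction in S206 can be sketched as a single-layer LSTM written in NumPy. Taking the final hidden state as $EE_i$ is an assumption, since the text does not say which LSTM output is used. The dimensions and the random, untrained parameters are placeholders, and `lstm_encode` is a hypothetical name.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def lstm_encode(seq: np.ndarray, W: np.ndarray, U: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Run a single-layer LSTM over seq (T, D) and return the final hidden state (H,).

    W has shape (4H, D), U has shape (4H, H) and b has shape (4H,); the four
    blocks are the input, forget and output gates and the candidate cell.
    """
    H = U.shape[1]
    h = np.zeros(H)
    c = np.zeros(H)
    for x in seq:
        z = W @ x + U @ h + b
        i = sigmoid(z[0:H])            # input gate
        f = sigmoid(z[H:2 * H])        # forget gate
        o = sigmoid(z[2 * H:3 * H])    # output gate
        g = np.tanh(z[3 * H:4 * H])    # candidate cell state
        c = f * c + i * g
        h = o * np.tanh(c)
    return h


rng = np.random.default_rng(0)
D, H, T = 3, 5, 4  # assumed word vector size, hidden size, number of position words
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)

weighted_vectors = rng.normal(size=(T, D))  # stands in for the weighted {e_j}
ee = lstm_encode(weighted_vectors, W, U, b)
print(ee.shape)  # (5,)
```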
In the embodiment of the present application, the execution order of S202 and S203 to S206 is not limited, and S202 may be executed first and then S203 to S206 may be executed, S203 to S206 may be executed first and then S202 may be executed, or S202 and S203 to S206 may be executed simultaneously.
S207: combining the company feature vector and the geographic position vector to obtain a coding vector.
As one example, S207 may include: splicing the obtained CE_i vector and EE_i vector to obtain the final coding vector XE_i. For example, XE_i may be obtained by the following splicing formula:
XE_i = [CE_i ; EE_i]
where CE_i is the company feature vector, EE_i is the geographic position vector, and [· ; ·] denotes splicing (concatenation) of the two vectors.
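A small sketch of this step follows, under stated assumptions. The claims describe the company feature vector as the column of a trained parameter matrix selected by the company's number; the matrix values, the company number, and the function names below are hypothetical and chosen only to make the lookup and the splice visible.

```python
def company_feature_vector(param_matrix, company_number):
    """Return the column of param_matrix (list of rows) selected by company_number."""
    return [row[company_number] for row in param_matrix]


def splice(ce, ee):
    """Concatenate the company feature vector and the geographic position vector."""
    return list(ce) + list(ee)


# Toy 2 x 3 parameter matrix: 3 companies, 2-dimensional features (made-up values).
param_matrix = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
]
ce_i = company_feature_vector(param_matrix, company_number=1)
ee_i_example = [0.7, 0.8, 0.9]  # stands in for the LSTM output EE_i
xe_i = splice(ce_i, ee_i_example)
```

The resulting XE_i simply has dimension equal to the sum of the dimensions of CE_i and EE_i.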
S208: performing hierarchical classification according to the coding vector to obtain a coding result.
As one example, S208 may include: hierarchically classifying the coding vector using a normalized exponential (Softmax) function, according to the following formulas:
h_1 = W_1 · XE_i
y_1 = Softmax(h_1)
h_2 = W_2 · [h_1 ; XE_i]
y_2 = Softmax(h_2)
where XE_i is the coding vector; W_1 is the parameter matrix that linearly transforms XE_i; h_1 is the result of applying W_1 to XE_i; y_1, obtained by applying the Softmax function to h_1, is a vector representing province information; W_2 is the parameter matrix that linearly transforms the splice of h_1 and XE_i; h_2 is the result of that transformation; and y_2, obtained by applying the Softmax function to h_2, is a vector representing prefecture-level city information within the province.
The province-level classification result y_1 and the classification result y_2 for the prefecture-level city within the corresponding province are thus obtained. Together they form the analysis result, which indicates the province and city to which the business department in the business department information belongs.
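The two-level classification can be sketched as follows. This is a minimal illustration under assumptions: the matrices W_1 and W_2 are tiny hand-picked values rather than trained parameters, the numbers of provinces and cities are arbitrary, and the second linear map is applied to the concatenation of h_1 and XE_i as the description of W_2 states.

```python
import math


def softmax(values):
    """Numerically stable normalized exponential function."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def hierarchical_classify(xe, w1, w2):
    """Return (y1, y2): province distribution, then city distribution."""
    h1 = matvec(w1, xe)        # h_1 = W_1 . XE_i
    y1 = softmax(h1)           # y_1 = Softmax(h_1)
    h2 = matvec(w2, h1 + xe)   # h_2 = W_2 . [h_1 ; XE_i]
    y2 = softmax(h2)           # y_2 = Softmax(h_2)
    return y1, y2


# Toy example: 2-dimensional coding vector, 3 provinces, 2 cities (made-up values).
xe_toy = [1.0, 0.0]
w1_toy = [[2.0, 0.0], [0.0, 1.0], [0.0, 0.0]]                  # 3 x 2
w2_toy = [[0.0, 0.0, 0.0, 1.0, 0.0], [1.0, 0.0, 0.0, 0.0, 0.0]]  # 2 x (3 + 2)
y1, y2 = hierarchical_classify(xe_toy, w1_toy, w2_toy)
province = y1.index(max(y1))
city = y2.index(max(y2))
```

Feeding h_1 into the second level lets the city prediction depend on the province scores, which is what makes the classification hierarchical rather than two independent classifiers.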
Training the address resolution model with hierarchical classification reduces the model dimension and improves the accuracy of the model.
The embodiment of the present application provides an address resolution method. After business department information is obtained, it is divided into company information and position information. A company feature vector is obtained according to the company information. A position word vector is then obtained according to the position information and is position-weighted to obtain a weighted position word vector, from which a geographic position vector is obtained. Finally, the company feature vector and the geographic position vector are combined into a coding vector, and a coding result is obtained according to the coding vector.
In the embodiment of the present application, the final basis for resolution is obtained from the company feature vector and the geographic information vector. Because address resolution draws on multiple kinds of information, a highly accurate address resolution result can be obtained. In addition, using a sequence-to-sequence (Seq2Seq) model effectively reduces the dimension of the predicted labels, keeps the overall model simple, and makes it convenient to operate.
Referring to fig. 3, an embodiment of the present application provides an apparatus 300 for address resolution, where the apparatus 300 includes:
a first obtaining unit 301, configured to obtain a company feature vector according to company information, where the company information is information indicating a company to which a business unit belongs in the business unit information, and the company feature vector is used to characterize a mechanism location of a company corresponding to the company information;
a second obtaining unit 302, configured to obtain a geographic information vector corresponding to a location word, where the location word is a word that represents administrative division information of the business unit and is determined according to the business unit information, and the geographic information vector is used to represent a location corresponding to the location word;
the first processing unit 303 is configured to perform hierarchical classification according to the company feature vector and the geographic information vector, so as to obtain an analysis result, where the analysis result indicates a province and a city to which the business department belongs.
Optionally, the first obtaining unit 301 is specifically configured to:
combining the company feature vector and the geographic information vector to obtain a coding vector;
and carrying out hierarchical classification according to the coding vector to obtain an analysis result.
Optionally, the second obtaining unit 302 is specifically configured to:
carrying out word vector representation on the position words to obtain position word vectors;
and obtaining a geographic information vector according to the position word vector.
Optionally, the apparatus 300 further comprises:
the second processing unit is used for carrying out position weighting according to the position word vector to obtain a weighted position word vector;
the second obtaining unit is specifically configured to:
and obtaining a geographic information vector according to the weighted position word vector.
Optionally, the second obtaining unit 302 is specifically configured to:
and extracting information of the position word vector according to the neural network model to obtain a geographic information vector.
Optionally, the apparatus 300 further comprises:
a third obtaining unit, configured to remove fields related to the business department from the business department information to obtain address information;
a fourth obtaining unit, configured to obtain company information and position information according to the address information, where the position words are obtained by segmenting the position information according to a corpus, and the corpus includes an administrative division corpus.
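The preprocessing performed by the third and fourth obtaining units can be illustrated with a short sketch. Everything concrete here is an assumption: the source does not name a segmentation algorithm, so forward maximum matching is used as a stand-in, and the field list, the sample address, and the tiny administrative division corpus are hypothetical.

```python
def strip_business_fields(text, fields=("证券营业部", "营业部")):
    """Remove business-department fields (hypothetical list) from the raw text."""
    for field in fields:
        text = text.replace(field, "")
    return text


def segment_position_words(text, corpus):
    """Forward maximum matching against an administrative division corpus.

    Characters that match no corpus entry are skipped.
    """
    max_len = max(len(word) for word in corpus)
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in corpus:
                words.append(piece)
                i += size
                break
        else:
            i += 1  # no corpus entry starts here; move on one character
    return words


# Hypothetical corpus: Guangdong Province, Zhongshan City, and their short forms.
corpus = {"广东省", "中山市", "广东", "中山"}
raw = "广东省中山市中山四路营业部"  # "... Zhongshan 4th Road business department"
address = strip_business_fields(raw)
position_words = segment_position_words(address, corpus)
```

In this toy input the street name also yields the word "中山" (Zhongshan), which is the kind of ambiguous position word that the position weighting step is meant to down-weight.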
The embodiment of the application further provides an address resolution device 400, as shown in fig. 4, where the device 400 includes a memory 401 and a processor 402:
the memory 401 is for storing a computer program;
the processor 402 is configured to perform the methods provided in fig. 1 or fig. 2 described above in accordance with a computer program.
From the above description of embodiments, it will be apparent to those skilled in the art that all or part of the steps of the above described example methods may be implemented in software plus general hardware platforms. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which may be stored in a storage medium, such as a read-only memory (ROM)/RAM, a magnetic disk, an optical disk, or the like, including several instructions for causing a computer device (which may be a personal computer, a server, or a network communication device such as a router) to perform the methods described in the embodiments or some parts of the embodiments of the present application.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points. The apparatus embodiments described above are merely illustrative, in which the modules illustrated as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the objective of the embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
The foregoing is merely exemplary embodiments of the present application and is not intended to limit the scope of the present application.

Claims (6)

1. A method of address resolution, comprising:
acquiring business department information;
removing a field containing the business department from the business department information, and determining address information;
determining company information and position information according to the address information;
confirming a number corresponding to the company information; determining a vector of a column of the number in the parameter matrix according to the number; taking the vector as a company characteristic vector; the parameter matrix is a neural network model obtained by training the company characteristics of an input company; the information of each column in the parameter matrix comprises a company characteristic vector of a company;
dividing the position information into at least one position word according to a preset administrative area corpus; the administrative district corpus comprises administrative district information;
converting the position word into a word vector according to Word2Vec; inputting the word vector into a pre-trained word embedding model, and determining a position word vector;
weighting the position word vector according to a preset index weight to obtain a weighted position word vector;
extracting the information of the weighted position word vector according to a neural network model, and determining a geographic position vector;
splicing the company feature vector and the geographic position vector to determine a coding vector;
determining a coding result according to the formulas h_1 = W_1 · XE_i, y_1 = Softmax(h_1), h_2 = W_2 · [h_1 ; XE_i], and y_2 = Softmax(h_2); wherein XE_i is the coding vector, W_1 is the parameter matrix for linear transformation of XE_i, h_1 is the matrix obtained after W_1 linearly transforms XE_i, h_1 is processed by the Softmax function to obtain y_1, and y_1 is a vector representing the province information; W_2 is the parameter matrix for linear transformation of the matrix spliced from h_1 and XE_i, h_2 is processed by the Softmax function to obtain y_2, and y_2 is a vector representing the prefecture-level city information within the province included in the province information.
2. The method of claim 1, wherein the extracting information of the weighted position word vector according to the neural network model, determining a geographic position vector, comprises:
extracting the information of the weighted position word vector according to a long short-term memory (LSTM) network, and determining the geographic position vector.
3. An apparatus for address resolution, comprising:
the first obtaining unit is used for obtaining business department information;
a third obtaining unit, configured to remove a field containing the business department from the business department information, and determine address information;
a fourth obtaining unit configured to determine company information and location information according to the address information;
the first obtaining unit is further used for confirming a number corresponding to the company information; determining a vector of a column of the number in the parameter matrix according to the number; taking the vector as a company characteristic vector; the parameter matrix is a neural network model obtained by training the company characteristics of an input company; the information of each column in the parameter matrix comprises a company characteristic vector of a company;
the fourth obtaining unit is further configured to segment the location information into at least one location word according to a preset administrative area corpus; the administrative district corpus comprises administrative district information;
the second obtaining unit is used for converting the position word into a word vector according to Word2Vec; inputting the word vector into a pre-trained word embedding model, and determining a position word vector;
the second processing unit is used for weighting the position word vector according to a preset index weight to obtain a weighted position word vector;
the second obtaining unit is further used for extracting the information of the weighted position word vector according to a neural network model and determining a geographic position vector;
the first processing unit is used for splicing the company feature vector and the geographic position vector and determining a coding vector; and determining a coding result according to the formulas h_1 = W_1 · XE_i, y_1 = Softmax(h_1), h_2 = W_2 · [h_1 ; XE_i], and y_2 = Softmax(h_2); wherein XE_i is the coding vector, W_1 is the parameter matrix for linear transformation of XE_i, h_1 is the matrix obtained after W_1 linearly transforms XE_i, h_1 is processed by the Softmax function to obtain y_1, and y_1 is a vector representing the province information; W_2 is the parameter matrix for linear transformation of the matrix spliced from h_1 and XE_i, h_2 is processed by the Softmax function to obtain y_2, and y_2 is a vector representing the prefecture-level city information within the province included in the province information.
4. A device according to claim 3, characterized in that the second obtaining unit is specifically configured to:
extracting the information of the weighted position word vector according to a long short-term memory (LSTM) network, and determining the geographic position vector.
5. An apparatus for address resolution, characterized in that the apparatus comprises a memory and a processor for executing a program stored in the memory, running the method according to any of claims 1-2.
6. A computer readable storage medium, characterized in that the computer readable storage medium is for storing a computer program for executing the method of any one of claims 1-2.
CN202211578045.8A 2022-12-09 2022-12-09 Address resolution method and device Active CN115577065B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211578045.8A CN115577065B (en) 2022-12-09 2022-12-09 Address resolution method and device

Publications (2)

Publication Number Publication Date
CN115577065A CN115577065A (en) 2023-01-06
CN115577065B true CN115577065B (en) 2023-06-09

Family

ID=84590031

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211578045.8A Active CN115577065B (en) 2022-12-09 2022-12-09 Address resolution method and device

Country Status (1)

Country Link
CN (1) CN115577065B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109033086A (en) * 2018-08-03 2018-12-18 银联数据服务有限公司 A kind of address resolution, matched method and device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8965693B2 (en) * 2012-06-05 2015-02-24 Apple Inc. Geocoded data detection and user interfaces for same
CN110569322A (en) * 2019-07-26 2019-12-13 苏宁云计算有限公司 Address information analysis method, device and system and data acquisition method
CN112257413B (en) * 2020-10-30 2022-05-17 深圳壹账通智能科技有限公司 Address parameter processing method and related equipment

Also Published As

Publication number Publication date
CN115577065A (en) 2023-01-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant