CN117708262A - Method and device for carrying out data association on multidimensional and multi-source data and electronic equipment - Google Patents


Publication number
CN117708262A
Authority
CN
China
Prior art keywords
data information
matched
standard
association
piece
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410150164.6A
Other languages
Chinese (zh)
Inventor
田越
汪跃辉
姚宏志
梁满
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING YOYO TIANYU SYSTEM TECHNOLOGY CO LTD
Original Assignee
BEIJING YOYO TIANYU SYSTEM TECHNOLOGY CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a method, a device and electronic equipment for carrying out data association on multidimensional and multi-source data. The method comprises the following steps: acquiring original data information; sorting the original data information; preprocessing the sequenced original data information to obtain a data information set to be matched; acquiring a standard data information set and an associated data information set; matching the data information to be matched in the data information set to be matched with the standard data information in the standard data information set and the associated data information in the associated data information set; when the data information to be matched is matched with a certain piece of standard data information, establishing an association relation between the data information to be matched and the standard data information; when the data information to be matched is matched with a certain piece of associated data information, determining standard data information associated with the piece of associated data information as target standard data information according to the association relation corresponding to the piece of associated data information, and establishing the association relation for the data information to be matched and the target standard data information.

Description

Method and device for carrying out data association on multidimensional and multi-source data and electronic equipment
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method and an apparatus for performing data association on multidimensional and multi-source data, and an electronic device.
Background
Addresses are the foundation of a geographic information system and are widely used in systems such as public security systems. However, a large number of the addresses used in daily life are not standardized, and abbreviated or ambiguous descriptions are common. For example, the standard address form prescribed by the national post office is: XX Province, XX City, XX District, XX Street, No. XX, Room XX, postal code XXXXXX; in daily life, however, one may write only XX Province, XX City, XX Street, No. XX. Aliases and colloquial names are also in common use. It is therefore very important to associate the various address information that appears or is used in daily life with standard addresses and to establish a standard address library. Referring to FIG. 1, the inventor found that a conventional data association method generally comprises the following steps:
S101: acquiring original address information;
S102: acquiring standard address information;
S103: performing data cleaning on the original address information to obtain address information to be matched;
S104: manually matching the address information to be matched against the standard address information; if the matching succeeds, associating the address information to be matched with the standard address information and manually entering the association information into a standard address library.
However, for large-scale address data, manually matching address information and entering association information consumes a great deal of time and manpower, and efficiency is low. In addition, manual operation is prone to fatigue and inattention, so matching errors and association-entry errors occur easily, which lowers data quality.
Moreover, manual matching of address information is subjective: different operators may understand and interpret the same address differently, so the matching results in the prior art lack objectivity and accuracy. In summary, the existing data association method is time-consuming and labor-intensive, inefficient, and inaccurate.
Disclosure of Invention
The embodiment of the invention provides a method, a device and electronic equipment for carrying out data association on multidimensional and multi-source data, which are used for solving the problems of time and labor consumption, low efficiency and low accuracy in the prior art.
The specific technical scheme is as follows:
In a first aspect, an embodiment of the present invention provides a method for performing data association on multidimensional multi-source data, including the following steps:
respectively acquiring original data information from different storage addresses; wherein the original data information is multidimensional information; the collection of the original data information forms an original data information set;
sorting the original data information in the original data information set;
preprocessing the sequenced original data information to obtain a data information set to be matched;
acquiring a standard data information set and an associated data information set from a standard database, wherein the associated data information set is a set of data information matched with standard data information in the standard data information set;
for each piece of data information to be matched in the data information set to be matched, respectively matching the data information to be matched with each piece of standard data information in the standard data information set and each piece of associated data information in the associated data information set by using a preset matching algorithm;
When the data information to be matched is matched with a certain piece of standard data information in the standard data information set, establishing an association relation between the data information to be matched and the standard data information, and storing the newly established association relation and the data information to be matched into an association data information set in the standard database;
when the data information to be matched is matched with a certain piece of associated data information in the associated data information set, acquiring an associated relation corresponding to the piece of associated data information; according to the association relation corresponding to the piece of association data information, determining standard data information associated with the piece of association data information as target standard data information, establishing association relation for the data information to be matched and the target standard data information, and storing the newly established association relation and the data information to be matched into an association data information set in the standard database.
Optionally, the step of sorting the original data information in the original data information set includes:
and sorting the original data information in the original data information set according to the character information contained in each piece of original data information in the original data information set.
Optionally, the step of matching each piece of to-be-matched data information in the to-be-matched data information set with each piece of standard data information in the standard data information set and each piece of associated data information in the associated data information set by using a preset matching algorithm includes:
for each piece of data information to be matched in the data information set to be matched, matching the data information to each piece of standard data information in the standard data information set by utilizing a preset matching algorithm;
when the data information to be matched is matched with a certain piece of standard data information in the standard data information set, establishing an association relation between the data information to be matched and the standard data information, and storing the newly established association relation and the data information to be matched into an association data information set in the standard database;
when the data information to be matched is not matched with all standard data information in the standard data information set, respectively matching the data information to be matched with each piece of associated data information in the associated data information set by using a preset matching algorithm;
when the data information to be matched is matched with a certain piece of associated data information in the associated data information set, acquiring an associated relation corresponding to the piece of associated data information; according to the association relation corresponding to the piece of association data information, determining standard data information associated with the piece of association data information as target standard data information, establishing association relation for the data information to be matched and the target standard data information, and storing the newly established association relation and the data information to be matched into an association data information set in the standard database.
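The two-stage matching described above (standard data first, then previously associated data, following the stored association to its target standard data) can be sketched as follows. This is only an illustrative sketch: the source does not disclose the preset matching algorithm, so `is_match` uses a plain string-similarity ratio from Python's `difflib` as a stand-in, and the function names, the 0.8 threshold, and the sample addresses are all hypothetical.

```python
from difflib import SequenceMatcher


def is_match(a: str, b: str, threshold: float = 0.8) -> bool:
    """Hypothetical stand-in for the 'preset matching algorithm'."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def associate(record, standard_set, associations):
    """Return the standard record that `record` is associated with, or None.

    `associations` maps previously matched records to their standard record
    and is updated in place whenever a new association is established.
    """
    # Stage 1: try to match the record against the standard records directly.
    for std in standard_set:
        if is_match(record, std):
            associations[record] = std
            return std
    # Stage 2: fall back to already-associated records and follow the stored
    # association to reach the target standard record.
    for known, target in list(associations.items()):
        if is_match(record, known):
            associations[record] = target
            return target
    return None


# Made-up sample data, for illustration only.
STD = "No. 1 Example Road, Example District, Example City"
associations = {"XX Building East Zone": STD}
direct = associate("No. 1 Example Road, Example District Example City", [STD], associations)
via_alias = associate("XX Building, East Zone", [STD], associations)
unmatched = associate("zzz", [STD], associations)
```

In this sketch the second lookup succeeds only because an earlier association for "XX Building East Zone" already exists, which is the role the associated data information set plays in the method.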
Optionally, the method further comprises:
each time a round of matching is completed for the data information to be matched in the data information set to be matched, judging whether a new association relation is established in the round of matching;
when a new association relation is established in the round of matching, the next round of matching is performed;
when no new association relation is established in the round of matching, the matching is ended.
Optionally, when a new association relationship is established in the present round of matching, the step of performing the next round of matching includes:
taking the data information in this round's data information set to be matched that was not successfully matched as a new data information set to be matched;
acquiring data information to be matched successfully matched in the round as a new associated data information set;
for each piece of data information to be matched in the new data information set to be matched, matching the data information to be matched with each piece of associated data information in the new associated data information set by utilizing a preset matching algorithm;
when the data information to be matched is matched with a certain piece of associated data information in the new associated data information set, obtaining an associated relation corresponding to the piece of associated data information; according to the association relation corresponding to the piece of association data information, determining standard data information associated with the piece of association data information as target standard data information, establishing association relation for the data information to be matched and the target standard data information, and storing the newly established association relation and the data information to be matched into an association data information set in the standard database.
Optionally, the method further comprises:
and dynamically adjusting parameters in the preset matching algorithm according to the data information to be matched successfully matched in the round and the newly established association relation when the round of matching is finished.
Optionally, the data information is address information;
the standard database is a standard address library.
In a second aspect, an embodiment of the present invention provides an apparatus for performing data association on multidimensional multi-source data, where the apparatus includes:
the first acquisition module is used for respectively acquiring the original data information from different storage addresses; wherein the original data information is multidimensional information; the collection of the original data information forms an original data information set;
the sorting module is used for sorting the original data information in the original data information set;
the preprocessing module is used for preprocessing the sequenced original data information to obtain a data information set to be matched;
the second acquisition module is used for acquiring a standard data information set and an associated data information set from a standard database, wherein the associated data information set is a set of data information matched with standard data information in the standard data information set;
The matching module is used for matching each piece of to-be-matched data information in the to-be-matched data information set with each piece of standard data information in the standard data information set and each piece of associated data information in the associated data information set by utilizing a preset matching algorithm;
the first association module is used for establishing an association relation between the data information to be matched and a piece of standard data information when the data information to be matched is matched with the standard data information in the standard data information set, and storing the newly established association relation and the data information to be matched into the association data information set in the standard database;
the second association module is used for acquiring an association relation corresponding to a certain piece of association data information when the data information to be matched is matched with the certain piece of association data information in the association data information set; according to the association relation corresponding to the piece of association data information, determining standard data information associated with the piece of association data information as target standard data information, establishing association relation for the data information to be matched and the target standard data information, and storing the newly established association relation and the data information to be matched into an association data information set in the standard database.
Optionally, the sorting module is specifically configured to:
and sorting the original data information in the original data information set according to the character information contained in each piece of original data information in the original data information set.
Optionally, the matching module is specifically configured to:
for each piece of data information to be matched in the data information set to be matched, matching the data information to each piece of standard data information in the standard data information set by utilizing a preset matching algorithm;
when the data information to be matched is matched with a certain piece of standard data information in the standard data information set, establishing an association relation between the data information to be matched and the standard data information, and storing the newly established association relation and the data information to be matched into an association data information set in the standard database;
when the data information to be matched is not matched with all standard data information in the standard data information set, respectively matching the data information to be matched with each piece of associated data information in the associated data information set by using a preset matching algorithm;
when the data information to be matched is matched with a certain piece of associated data information in the associated data information set, acquiring an associated relation corresponding to the piece of associated data information; according to the association relation corresponding to the piece of association data information, determining standard data information associated with the piece of association data information as target standard data information, establishing association relation for the data information to be matched and the target standard data information, and storing the newly established association relation and the data information to be matched into an association data information set in the standard database.
Optionally, the apparatus further includes:
the judging module is used for judging whether a new association relation is established in the round of matching every time one round of matching is completed for the data information to be matched in the data information set to be matched;
when a new association relation is established in the round of matching, starting an iteration module;
when no new association relation is established in the round of matching, the matching is ended.
Optionally, the iteration module is specifically configured to:
taking the data information in this round's data information set to be matched that was not successfully matched as a new data information set to be matched;
acquiring data information to be matched successfully matched in the round as a new associated data information set;
for each piece of data information to be matched in the new data information set to be matched, matching the data information to be matched with each piece of associated data information in the new associated data information set by utilizing a preset matching algorithm;
when the data information to be matched is matched with a certain piece of associated data information in the new associated data information set, obtaining an associated relation corresponding to the piece of associated data information; according to the association relation corresponding to the piece of association data information, determining standard data information associated with the piece of association data information as target standard data information, establishing association relation for the data information to be matched and the target standard data information, and storing the newly established association relation and the data information to be matched into an association data information set in the standard database.
Optionally, the apparatus further includes:
and the parameter adjusting module is used for dynamically adjusting parameters in the preset matching algorithm according to the to-be-matched data information successfully matched in the current round and the newly established association relation when the matching of each round is finished.
Optionally, the data information is address information; the standard database is a standard address library.
In a third aspect, an embodiment of the present invention further provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
the memory is used for storing a computer program;
the processor is configured to implement the method steps of performing data association on the multidimensional multi-source data according to the first aspect when executing the program stored in the memory.
In a fourth aspect, an embodiment of the present invention further provides a computer readable storage medium, where a computer program is stored, where the computer program, when executed by a processor, implements the method steps of performing data association on the multidimensional multi-source data according to the first aspect.
The method for carrying out data association on the multidimensional and multi-source data provided by the embodiment of the invention can realize that the original data information is acquired from different storage addresses to form an original data information set; sorting the original data information in the original data information set; preprocessing the sequenced original data information to obtain a data information set to be matched; acquiring a standard data information set and an associated data information set from a standard database; for each piece of data information to be matched in the data information set to be matched, respectively matching the data information to be matched with each piece of standard data information in the standard data information set and each piece of associated data information in the associated data information set by using a preset matching algorithm; when the data information to be matched is matched with a certain piece of standard data information in the standard data information set, establishing an association relation between the data information to be matched and the standard data information, and storing the association relation and the data information to be matched into an association data information set in the standard database; when the data information to be matched is matched with a certain piece of associated data information in the associated data information set, acquiring an associated relation corresponding to the piece of associated data information; and confirming standard data information associated with the piece of associated data information according to the association relation, establishing the association relation for the data information to be matched and the confirmed standard data information, and storing the association relation and the data information to be matched into an associated data information set in the standard database. 
For disordered multidimensional, multi-source data information, the method provided by the embodiment of the invention can automatically associate the data information with standard data information and automatically save the association relation, rather than requiring manual confirmation of matching data information and manual entry of association information as in the prior art. Therefore, the method provided by the embodiment of the invention solves the problems of time and labor consumption, low efficiency and low accuracy in the prior art.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:
FIG. 1 is a flow chart of data association in the prior art;
FIG. 2 is a schematic flow chart of a method for performing data association on multi-dimensional and multi-source data according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of another method for performing data association on multi-dimensional and multi-source data according to an embodiment of the present invention;
FIG. 4 is a schematic flow chart of the method provided by the embodiment of the invention applied to the field of address information matching;
FIG. 5 is a schematic structural diagram of a device for performing data association on multi-dimensional and multi-source data according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, it being understood that the preferred embodiments described herein are for illustration and explanation of the present invention only, and are not intended to limit the present invention.
FIG. 2 is a flow chart of a method for performing data association on multi-dimensional and multi-source data according to an embodiment of the present invention. As shown in FIG. 2, the method includes:
S201: respectively acquiring original data information from different storage addresses; wherein the original data information is multidimensional information; the collection of the original data information forms an original data information set;
The original data information may be stored in different databases, such as an SQL database, a MySQL database, or an Oracle database, or in CSV or Excel files. Specifically, each piece of original data information is obtained from the storage address at which it is stored.
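As a rough illustration of step S201, the sketch below pulls one column from a CSV source and one from a database table and concatenates them into a single original data set. It is an assumption-laden sketch: an in-memory SQLite database stands in for the MySQL or Oracle databases mentioned above, and the helper names, table name, column name, and sample rows are all hypothetical.

```python
import csv
import io
import sqlite3


def load_from_csv(text_stream, column):
    """Yield the values of one column from a CSV text stream."""
    for row in csv.DictReader(text_stream):
        yield row[column]


def load_from_sqlite(conn, table, column):
    """Yield the values of one column from a SQLite table."""
    # Table and column names are trusted here; this is only a sketch.
    for (value,) in conn.execute(f"SELECT {column} FROM {table}"):
        yield value


def build_raw_set(*sources):
    """Concatenate the records of all sources into one original data set."""
    raw = []
    for source in sources:
        raw.extend(source)
    return raw


# In-memory stand-ins for two different storage addresses.
csv_source = io.StringIO("address\nBeijing XX Company\n")
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (address TEXT)")
conn.execute("INSERT INTO customers VALUES ('XX Building East Zone')")

raw_set = build_raw_set(
    load_from_csv(csv_source, "address"),
    load_from_sqlite(conn, "customers", "address"),
)
```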
In a particular embodiment, the data information may be address information. One piece of address information may include information in a plurality of dimensions, such as country, province, city, district, street, residential community or company name, floor, and postal code; address information is therefore multidimensional information.
In addition, the obtained original data information generally includes aliases, colloquial names, abbreviations, and scenario-specific descriptions of the data; that is, the same data information is often described from different dimensions. For example, the address of one company, XX Company, may be described as "Beijing Yiqing Building, East Zone, Floor X", "Yiqing Building, Block A, Floor X", "Beijing XX Company", "No. 38 Guangqu Road, Zone A, Floor X", and so on. In the method provided by the embodiment of the invention, a plurality of calculation strategies and matching strategies are connected in series, different weights are set for different strategies, and multi-strategy serial calculation is performed on the data to be matched, so that the data information to which a multidimensional description actually refers can be determined by combining spatio-temporal information, semantics, context, and the like. For example, by performing multi-strategy serial calculation on "Beijing Yiqing Building, East Zone, Floor X", the method can match it with the standard address "East Zone, Floor X, No. 38 Guangqu Road, Chaoyang District, Beijing". After the matching standard address is determined, the method also establishes an association relation between the address to be matched, "Beijing Yiqing Building, East Zone, Floor X", and the standard address, "East Zone, Floor X, No. 38 Guangqu Road, Chaoyang District, Beijing", and stores the association relation together with "Beijing Yiqing Building, East Zone, Floor X" among the associated addresses in the standard address library.
The purpose of storing the data to be matched in the standard database is to expand the knowledge base and speed up matching. For example, after the address to be matched, "Beijing Yiqing Building, East Zone, Floor X", is stored in the standard address library, if a new address to be matched is "Yiqing Building, Block A, Floor X", which is similar to "Beijing Yiqing Building, East Zone, Floor X", the standard address matching the new address can be quickly determined to be "East Zone, Floor X, No. 38 Guangqu Road, Chaoyang District, Beijing". To further speed up matching, the method provided by the embodiment of the invention may also place newly acquired knowledge, that is, the to-be-matched data newly stored in the standard database, in a cache that is read preferentially, so that it is matched against first.
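One possible way to realize a preferentially read cache of newly acquired knowledge is a small bounded structure that returns the most recently learned associations first. The source does not describe the cache's design, so the class name, the capacity limit, and the eviction policy below are assumptions made purely for illustration.

```python
from collections import OrderedDict


class AssociationCache:
    """Bounded cache that returns the newest associations first (hypothetical)."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self._items = OrderedDict()  # matched record -> standard record

    def add(self, record, standard):
        """Store a newly learned association, evicting the oldest if full."""
        self._items[record] = standard
        self._items.move_to_end(record)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the oldest entry

    def candidates(self):
        """Return (record, standard) pairs, most recently added first."""
        return list(self._items.items())[::-1]


cache = AssociationCache(capacity=2)
cache.add("alias a", "STANDARD A")
cache.add("alias b", "STANDARD B")
cache.add("alias c", "STANDARD C")  # evicts "alias a"
newest_first = cache.candidates()
```

A matcher could iterate over `candidates()` before falling back to the full standard database, so that recently learned descriptions are tried first.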
S202: sorting the original data information in the original data information set;
Specifically, the original data information in the original data information set may be sorted according to the character information contained in each piece of original data information. For example, in one particular embodiment, the data information is address information. The character information in one piece of address information may include country, province, city, district, street, residential community or company name, floor, and postal code. The original data information can first be sorted by the country information so that addresses in the same country are grouped together, then sorted by the province information so that addresses in the same province are grouped together, and so on. The advantage of sorting is that similarities among pieces of data information are easier to find in sorted data, and these similarities help in summarizing rules, in calculation, and in the subsequent matching work.
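The level-by-level ordering described for step S202 amounts to a sort on a composite key. The sketch below assumes the addresses have already been parsed into fields; the `Address` type, its field names, and the sample values are hypothetical.

```python
from typing import NamedTuple


class Address(NamedTuple):
    country: str
    province: str
    city: str
    district: str
    street: str


def sort_addresses(addresses):
    """Sort by country, then province, then city, district, and street."""
    # Tuples compare element by element, so records that share a country are
    # grouped together, then ordered by province within it, and so on.
    return sorted(
        addresses,
        key=lambda a: (a.country, a.province, a.city, a.district, a.street),
    )


raw = [
    Address("CN", "Hebei", "Shijiazhuang", "Changan", "B Street"),
    Address("CN", "Beijing", "Beijing", "Chaoyang", "Guangqu Road"),
    Address("CN", "Beijing", "Beijing", "Chaoyang", "A Road"),
]
ordered = sort_addresses(raw)
```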
S203: preprocessing the sequenced original data information to obtain a data information set to be matched;
Before the original data in the original data information set is matched, it needs to be preprocessed, that is, cleaned. Cleaning specifically includes removing duplicate data, unifying naming conventions, and unifying formats in the original data information set. Preprocessing resolves the inconsistent and duplicate data present in the original data information set.
In a specific embodiment, different preprocessing modes can be adopted for different kinds of data information. For example, if the data information is address information, a series of preprocessing modes can be adopted, such as address information parsing, invalid-address detection, full-width/half-width character conversion, house-number range conversion, and unified conversion between Chinese numerals and Arabic numerals.
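A few of the preprocessing modes listed above can be sketched with the Python standard library. This is a minimal sketch under stated assumptions, not the patented preprocessing: Unicode NFKC normalization is used for full-width to half-width conversion, Chinese digits are converted only when they directly precede 号, 层, or 室 (an assumed rule that avoids rewriting digits inside place names), compound numerals such as 十 are not handled, and the sample addresses are made up.

```python
import re
import unicodedata

CN_DIGITS = {"零": "0", "〇": "0", "一": "1", "二": "2", "三": "3", "四": "4",
             "五": "5", "六": "6", "七": "7", "八": "8", "九": "9"}
# Runs of Chinese digits that directly precede a number suffix (assumed rule).
CN_NUMBER = re.compile(r"[零〇一二三四五六七八九]+(?=[号层室])")


def normalize(address: str) -> str:
    """Normalize width, digits, and whitespace of one address string."""
    # NFKC folds full-width letters, digits, and spaces to half-width forms.
    text = unicodedata.normalize("NFKC", address)
    # Digit-by-digit conversion; compound numerals (十, 百) are not handled.
    text = CN_NUMBER.sub(lambda m: "".join(CN_DIGITS[c] for c in m.group()), text)
    # Collapse runs of whitespace into single spaces.
    return " ".join(text.split())


def preprocess(records):
    """Normalize records and drop empty strings and duplicates, keeping order."""
    seen, result = set(), []
    for record in records:
        cleaned = normalize(record)
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result
```

Removing duplicates after normalization rather than before means that two records differing only in character width collapse into one entry.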
S204: acquiring a standard data information set and an associated data information set from a standard database, wherein the associated data information set is a set of data information matched with standard data information in the standard data information set;
If the data information is address information, the standard data information can be acquired from an address authority such as the national post office, and all acquired standard data information can be stored in the standard database.
S205: for each piece of data information to be matched in the data information set to be matched, respectively matching the data information to be matched with each piece of standard data information in the standard data information set and each piece of associated data information in the associated data information set by utilizing a preset matching algorithm;
The matching process uses a series of preset matching algorithms, applied in sequence, to calculate the matching degree, the credibility, the association degree and the probability between the data information to be matched and the standard data information. A matching probability value corresponding to the data information to be matched is then calculated from these four quantities. If the matching probability value is larger than a preset threshold value, the data information to be matched matches the standard data information; otherwise, it does not.
The method provided by the embodiment of the invention can continuously expand the knowledge base according to the matching result of each iteration, and then use the new knowledge to recalculate the matching degree, the credibility, the association degree and the probability between the data information to be matched and the standard data information. The matching probability value is then recalculated from these quantities and compared with the preset threshold value as before. The loop repeats until no new knowledge is generated, that is, until no new matches are produced, at which point the matching process is complete.
Specifically, the matching probability value of the data information to be matched in a single cycle may be calculated using the following formula: single matching probability value = matching degree × credibility × association degree × weight.
The total matching probability value of the data information to be matched over the whole loop-iteration matching process may be calculated using the following formula: total matching probability value = Σ (i-th matching probability value × i²) × weight.
The matching degree is the degree of matching between the data information to be matched and the standard data information. The higher the matching degree, the higher the probability of matching with the standard data information.
The credibility represents how trustworthy the data information to be matched is. Data information to be matched with high credibility is more likely to match an entry in the standard database.
The association degree is used for measuring the association degree between the data information to be matched and the standard data information. If there is obvious correlation between the data information to be matched and the standard data information, the matching probability is higher.
The probability represents the probability of a match, a value between 0 and 1, representing the likelihood of a match being successful.
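The threshold decision built on these quantities can be sketched as follows. This is an illustration under stated assumptions: the factors are combined as a plain product, the result is clamped to the range 0 to 1, and the threshold of 0.6 and the names `MatchScores`, `single_match_probability` and `is_match` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MatchScores:
    # Per-pair scores, each assumed to lie in [0, 1].
    matching_degree: float
    credibility: float
    association_degree: float


def single_match_probability(s: MatchScores, weight: float = 1.0) -> float:
    # Assumed combination rule: product of the factors times a weight,
    # clamped so the result stays a valid probability.
    p = s.matching_degree * s.credibility * s.association_degree * weight
    return max(0.0, min(1.0, p))


def is_match(s: MatchScores, threshold: float = 0.6, weight: float = 1.0) -> bool:
    # A pair counts as matched only if its value exceeds the preset threshold.
    return single_match_probability(s, weight) > threshold


strong = MatchScores(0.9, 0.9, 0.9)
weak = MatchScores(0.9, 0.5, 0.9)
```

With a product rule, one low factor (here a credibility of 0.5) is enough to pull the pair below the threshold, which is one reason a designer might prefer a product over a sum.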
In a specific embodiment, the matching result can be manually verified and the verification result fed back, so that the preset matching algorithm can adjust its own parameters accordingly, improving its future matching efficiency and accuracy. For example, if manual verification finds that a large amount of data information to be matched whose matching probability value against the standard data information is 0.6 does in fact match that standard data information, the method provided by the embodiment of the invention can receive this feedback information from the user and adjust the parameters of the matching algorithm accordingly, thereby improving the accuracy of matching and accelerating the matching process.
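One possible form of such feedback-driven adjustment is threshold calibration, sketched below. The source does not say which parameters are adjusted or how, so the rule used here (pick the lowest threshold whose accepted pairs reach a target precision) and the names `calibrate_threshold` and `target_precision` are assumptions for illustration only.

```python
def calibrate_threshold(
    feedback: list[tuple[float, bool]], target_precision: float = 0.95
) -> float:
    # feedback holds (matching probability value, manually verified as correct).
    # Try each observed value as a threshold, lowest first, and return the
    # first one at which the accepted pairs are precise enough.
    for t in sorted({p for p, _ in feedback}):
        accepted = [ok for p, ok in feedback if p >= t]
        if accepted and sum(accepted) / len(accepted) >= target_precision:
            return t
    # No threshold reaches the target: accept nothing automatically.
    return 1.0


verified = [(0.5, False), (0.6, True), (0.6, True), (0.7, True), (0.9, True)]
new_threshold = calibrate_threshold(verified)
```

In this sample, every verified pair scored at 0.6 or above was a true match, so the threshold can be lowered to 0.6 without admitting the false match at 0.5.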
S206: when the data information to be matched is matched with a certain piece of standard data information in the standard data information set, establishing an association relation between the data information to be matched and the standard data information, and storing the newly established association relation and the data information to be matched into an association data information set in a standard database;
S207: when the data information to be matched is matched with a certain piece of associated data information in the associated data information set, acquiring an association relation corresponding to the piece of associated data information; according to the association relation corresponding to the piece of associated data information, determining the standard data information associated with the piece of associated data information as target standard data information, establishing an association relation for the data information to be matched and the target standard data information, and storing the newly established association relation and the data information to be matched into the associated data information set in the standard database.
In a specific embodiment, for each piece of data information to be matched in the data information set to be matched, the step of respectively matching the data information to each piece of standard data information in the standard data information set and each piece of associated data information in the associated data information set by using a preset matching algorithm may include:
For each piece of data information to be matched in the data information set to be matched, matching the data information to each piece of standard data information in the standard data information set by utilizing a preset matching algorithm;
when the data information to be matched is matched with a certain piece of standard data information in the standard data information set, establishing an association relation between the data information to be matched and the standard data information, and storing the newly established association relation and the data information to be matched into an association data information set in a standard database;
when the data information to be matched is not matched with all standard data information in the standard data information set, respectively matching the data information to be matched with each piece of associated data information in the associated data information set by using a preset matching algorithm;
when the data information to be matched is matched with a certain piece of associated data information in the associated data information set, acquiring an associated relation corresponding to the piece of associated data information; according to the association relation corresponding to the piece of association data information, determining standard data information associated with the piece of association data information as target standard data information, establishing association relation for the data information to be matched and the target standard data information, and storing the newly established association relation and the data information to be matched into an association data information set in a standard database.
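The two-stage flow in this embodiment (standard set first, associated set as a fallback) can be sketched as below. The `associate` function, the dictionary used as the associated data information set, and the toy `same_text` matcher are hypothetical stand-ins; the preset matching algorithm itself is not reproduced here.

```python
from typing import Callable, Optional

Matcher = Callable[[str, str], bool]


def associate(
    candidate: str,
    standards: list[str],
    associations: dict[str, str],
    matches: Matcher,
) -> Optional[str]:
    # Stage 1: try every piece of standard data information.
    for std in standards:
        if matches(candidate, std):
            associations[candidate] = std
            return std
    # Stage 2: fall back to already associated records and inherit the
    # standard record they point to (the target standard data information).
    for known, std in list(associations.items()):
        if matches(candidate, known):
            associations[candidate] = std
            return std
    return None


def same_text(a: str, b: str) -> bool:
    # Toy matcher (assumption): equality ignoring case and spaces.
    return a.replace(" ", "").lower() == b.replace(" ", "").lower()


standards = ["2 Yabao Road"]
associations = {"Children's Hospital": "2 Yabao Road"}
hit = associate("children's hospital", standards, associations, same_text)
```

Here the candidate does not match the standard address text directly, but it matches an alias that is already associated, so it inherits that alias's standard address.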
In another specific embodiment, the method provided by the embodiment of the invention may further include the following steps:
each time a round of matching is completed for the data information to be matched in the data information set to be matched;
judging whether a new association relation is established in the round of matching;
when a new association relation is established in the round of matching, the next round of matching is performed;
when no new association relation is established in the round of matching, the matching is ended.
Optionally, when a new association relationship is established in the present round of matching, the step of performing the next round of matching may include:
acquiring the data information set to be matched of the round, and taking the data information set to be matched which is not successfully matched as a new data information set to be matched;
acquiring data information to be matched successfully matched in the round as a new associated data information set;
for each piece of data information to be matched in the new data information set to be matched, matching the data information to be matched with each piece of associated data information in the new associated data information set by utilizing a preset matching algorithm;
when the data information to be matched is matched with a certain piece of associated data information in the new associated data information set, obtaining an associated relation corresponding to the piece of associated data information; according to the association relation corresponding to the piece of association data information, determining standard data information associated with the piece of association data information as target standard data information, establishing association relation for the data information to be matched and the target standard data information, and storing the newly established association relation and the data information to be matched into an association data information set in a standard database.
That is to say, the method provided by the embodiment of the invention repeatedly matches the still-unmatched data in the data set to be matched against the newly matched data, round after round, and ends the iteration when no new match is produced.
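The round-by-round iteration can be sketched as a loop that stops at a fixed point. This is a simplified illustration: the first round compares only against the standard set (any pre-existing associated set is omitted), and the function name `iterate_matching` and the toy prefix matcher are assumptions.

```python
from typing import Callable


def iterate_matching(
    pending: list[str],
    standards: list[str],
    matches: Callable[[str, str], bool],
) -> tuple[dict[str, str], list[str], int]:
    associations: dict[str, str] = {}  # non-standard record -> standard record
    # Round 1 compares against the standard set; each standard maps to itself.
    reference = {s: s for s in standards}
    rounds = 0
    while pending:
        rounds += 1
        new_links: dict[str, str] = {}
        still_pending: list[str] = []
        for cand in pending:
            target = next(
                (std for ref, std in reference.items() if matches(cand, ref)),
                None,
            )
            if target is None:
                still_pending.append(cand)
            else:
                new_links[cand] = target
        if not new_links:
            break  # no new association relation: the matching ends
        associations.update(new_links)
        # Next round: only unmatched records, compared with this round's hits.
        pending, reference = still_pending, new_links
    return associations, pending, rounds


def one_char_extension(cand: str, ref: str) -> bool:
    # Toy matcher (assumption): cand extends ref by at most one character.
    return cand.startswith(ref) and len(cand) - len(ref) <= 1


links, unmatched, rounds = iterate_matching(
    ["abcde", "abcd", "xyz"], ["abc"], one_char_extension
)
```

In the sample, "abcde" cannot reach the standard record directly, but it becomes matchable in the second round through "abcd", which was associated in the first round.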
In another specific embodiment, the method for associating multidimensional and multi-source data provided by the invention further includes:
and dynamically adjusting parameters in a preset matching algorithm according to the data information to be matched successfully matched in the round and the newly established association relation when the round of matching is finished.
In the embodiment, the parameters in the matching algorithm can be dynamically adjusted according to the matching result, so that the accuracy of future matching of the matching algorithm can be improved.
As can be seen from fig. 2, the method for performing data association on multidimensional and multi-source data provided by the embodiment of the present invention acquires original data information from different storage addresses to form an original data information set; sorts the original data information in the set; preprocesses the sorted original data information to obtain a data information set to be matched; acquires a standard data information set and an associated data information set from a standard database; and, for each piece of data information to be matched, matches it against each piece of standard data information and each piece of associated data information by using a preset matching algorithm. When the data information to be matched matches a piece of standard data information, an association relation is established between them, and the newly established association relation and the data information to be matched are stored into the associated data information set in the standard database. When the data information to be matched matches a piece of associated data information, the association relation corresponding to that piece of associated data information is acquired, the standard data information associated with it is determined as the target standard data information, an association relation is established between the data information to be matched and the target standard data information, and the newly established association relation and the data information to be matched are stored into the associated data information set in the standard database. The method can therefore automatically associate disordered multidimensional, multi-source data information with standard data information and automatically save the association relation, rather than requiring manual confirmation of matching data information and manual entry of association information as in the prior art. It thus solves the problems of time and labor consumption, low efficiency and low accuracy in the prior art.
The method provided by the embodiment of the invention is described in more detail below using a practical example. In this embodiment, the data information is address information, and the standard database is a standard address library. The purpose of the embodiment is to associate the nonstandard address information used in daily life, such as nonstandard expressions, abbreviations that skip hierarchy levels, ambiguous descriptions, aliases, and conventional common names, with the standard addresses obtained from the address authority of the national post office, and thereby to build a standard address library that contains both those standard addresses and the nonstandard addresses associated with them. When a nonstandard address is received, the standard address associated with it can then be quickly determined from the established association relations. Referring to fig. 3 and 4, the specific steps of the method provided in this embodiment are as follows:
S301: respectively acquiring original address information from different storage addresses; wherein the original address information is multidimensional information; the collection of the original address information forms an original address information set;
Specifically, the original address information may be obtained from various sources, such as databases, Excel files, user input, external data suppliers, historical records, and so forth.
S302: sorting the original address information in the original address information set according to the character information contained in each piece of original address information in the set.
The character information in one piece of address information may include: country, province, city, district, street, residential community or company name, floor, and postal code. The original address information can be sorted by the country information in the character information, so that address information of the same country is grouped together, then by the province information, so that address information of the same province is grouped together, and so on, in order from the largest level to the smallest. Alternatively, the sorting can be performed by postal code, or first by company or residential community name, then by street, and then by district, city, province and country, in order from the smallest level to the largest. In short, users can configure the sorting according to their own needs.
S303: preprocessing the ordered original address information to obtain an address information set to be matched;
In a specific embodiment, a series of preprocessing steps can be adopted, such as address information parsing, bad address judgment, full-width/half-width character conversion, house number range conversion, and unified conversion between Chinese numerals and Arabic numerals. Address information parsing analyzes the address information, applies regular-expression and word segmentation processing, and converts the address information from free text into a structured address; through this step, the address entities and associated information entities (certificate number, associated dispatcher, and the like) in the text are extracted. For example, given two pieces of address information, address information parsing can identify that the two match at the district level.
Address information that is not a preset local address, or from which information is seriously missing, is judged by the bad address judgment algorithm to be a bad address. For example, if the preset local addresses are addresses within China and the address in a piece of address information is a foreign address, the bad address judgment algorithm judges that address information to be a bad address. No subsequent steps are performed on address information judged to be a bad address.
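A bad address check of this kind can be sketched as a simple filter. The source does not define what counts as seriously missing, so the rule used here (two or more of three required fields absent), the field names, and the treatment of a missing country as local are assumptions.

```python
# Fields assumed to be required for a usable address (hypothetical choice).
REQUIRED = ("province", "city", "street")


def is_bad_address(addr: dict[str, str], local_country: str = "CN") -> bool:
    # Rule 1: anything outside the preset local region is a bad address.
    # A record with no country field is assumed to be local.
    if addr.get("country", local_country) != local_country:
        return True
    # Rule 2 (assumed criterion): two or more required fields are missing.
    missing = [field for field in REQUIRED if not addr.get(field)]
    return len(missing) >= 2


candidates = [
    {"country": "CN", "province": "Beijing", "city": "Beijing", "street": "Yabao Rd"},
    {"country": "US", "province": "CA", "city": "LA", "street": "Main St"},
    {"country": "CN", "province": "Beijing"},
]
usable = [a for a in candidates if not is_bad_address(a)]
```

Filtering before matching keeps records that can never be associated out of the later, more expensive matching rounds.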
S304: acquiring a standard address information set and an associated address information set from a standard address library;
wherein the set of associated address information is a set of address information that matches the standard address information in the set of standard address information.
S305: selecting one piece of address information to be matched from the address information set to be matched, and matching the selected address information to be matched with each piece of standard address information in the standard address information set by using a preset matching algorithm; when the address information to be matched is matched with a piece of standard address information in the standard address information set, executing step S306; when the address information to be matched is not matched with all the standard address information in the standard address information set, executing step S307;
the selected address information to be matched is address information to be matched which is not matched in the round of matching.
In a specific embodiment, the preset matching algorithm may comprise the following algorithms, which are applied in sequence to match the address information to be matched against the standard address information, or against the associated address information, to obtain a matching result: a high-reliability matching algorithm (regular expression class), a low-reliability matching algorithm, a graph matching algorithm, an information association matching algorithm, a pattern recognition algorithm, a pattern application algorithm, a cell (residential community) class matching algorithm, a full-text search matching algorithm, and an algorithm result integration judgment algorithm. In addition, the method provided by the embodiment of the invention sets a weight for each algorithm according to its performance and characteristics, so that the matching result is more accurate. Specifically:
The high-reliability matching algorithm is to match the address information according to the text of the standard address information, and the reliability of the algorithm is higher.
The low-reliability matching algorithm matches the address information using the text of the address information combined with alias replacement. Because the quality of the aliases cannot be guaranteed, the reliability of this algorithm is uncontrollable; its results are therefore treated as low-confidence and are used mainly to add weight in combination with the results of other algorithms.
The graph in the graph matching algorithm is not a graph database but an in-memory graph. The in-memory graph can be searched by word segmentation or by bi-gram (a technique commonly used in text analysis that analyzes two adjacent words in a text as a group), and the most reasonable node is calculated according to the path. The data in the in-memory graph is formed by adding address information data carefully selected by the program to the basic data in the address information.
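The bi-gram retrieval step can be illustrated with a small in-memory inverted index. This sketch covers only candidate retrieval by shared bi-grams; it uses character bi-grams (an assumption suited to unsegmented Chinese text) and does not model the path-based scoring mentioned above. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Optional


def bigrams(text: str) -> set[str]:
    # Every pair of adjacent characters in the text.
    return {text[i:i + 2] for i in range(len(text) - 1)}


def build_index(nodes: list[str]) -> dict[str, set[str]]:
    # In-memory inverted index: bi-gram -> nodes whose text contains it.
    index: dict[str, set[str]] = defaultdict(set)
    for node in nodes:
        for gram in bigrams(node):
            index[gram].add(node)
    return index


def retrieve(query: str, index: dict[str, set[str]]) -> Optional[str]:
    # Rank nodes by how many bi-grams they share with the query.
    hits: Counter = Counter()
    for gram in bigrams(query):
        for node in index.get(gram, ()):
            hits[node] += 1
    return hits.most_common(1)[0][0] if hits else None


index = build_index(["朝阳区雅宝路", "海淀区中关村大街"])
best = retrieve("雅宝路2号", index)
```

The query shares the bi-grams "雅宝" and "宝路" with the first node and none with the second, so the first node is returned.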
The information association matching algorithm is used for acquiring associable information from various address information to be matched, such as: obtaining a certificate number, a bank card number or a telephone number, etc.; and providing matching information for other address matching with the associated information through the associated relation of the associable information and the address information. For example, the certificate number contained in the B address information is the same as the certificate number information contained in the a address information, and an association relationship exists between the a address information and the Y standard address information in the standard address library; then the information association matching algorithm can infer therefrom that there may also be an association between the B address information and the Y standard address information in the standard address library. The matching reliability of the algorithm is high.
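The inference in the example above (B shares a certificate number with A, and A is associated with Y, so B may also be associated with Y) can be sketched with two lookups. The dictionary layout and the name `infer_by_shared_id` are assumptions made for illustration.

```python
def infer_by_shared_id(
    pending: dict[str, str],
    known_ids: dict[str, str],
    associations: dict[str, str],
) -> dict[str, str]:
    # pending:      address to be matched -> certificate number found in it
    # known_ids:    certificate number -> an already associated address
    # associations: associated address -> standard address
    inferred: dict[str, str] = {}
    for addr, cert in pending.items():
        known = known_ids.get(cert)
        if known in associations:
            # Same certificate number: propose the same standard address.
            inferred[addr] = associations[known]
    return inferred


associations = {"address A": "standard address Y"}
known_ids = {"ID-001": "address A"}
pending = {"address B": "ID-001", "address C": "ID-999"}
proposals = infer_by_shared_id(pending, known_ids, associations)
```

The result is a set of proposed associations; in the described method these would still be weighed together with the results of the other algorithms.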
The pattern recognition algorithm uses the associated information acquired by the information association matching algorithm to identify information that lies outside the coverage of the other matching algorithms. From that information and its corresponding matching results, it summarizes the matching patterns between them, generates new knowledge, and stores the new knowledge for use by the subsequent pattern application algorithm. Sorted addresses are particularly important for pattern recognition.
The pattern application algorithm is used for performing pattern matching on the address information, generating a converted address information text according to the pattern corresponding to the address information, and attempting to perform address information matching.
The cell (residential community) matching algorithm is an application of an algorithm adapted from Word2Vec (a word-vector technique that converts words in natural language into dense vectors that a computer can process) to the address library.
The full text search matching algorithm is used for taking full text information as a main processing object and searching and matching according to the content of the address information.
The algorithm result integration judging algorithm is used for judging how to choose and integrate the results calculated by the previous matching algorithms, so as to obtain the final matching result.
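One way to integrate the results of several weighted algorithms is a weighted vote, sketched below. The source states only that each algorithm has a weight and that the results are integrated; the summation rule, the threshold of 0.6, and the algorithm names and weights in the example are assumptions.

```python
from collections import defaultdict
from typing import Optional


def integrate(
    results: list[tuple[str, str, float]],
    weights: dict[str, float],
    threshold: float = 0.6,
) -> Optional[str]:
    # results: (algorithm name, proposed standard address, score in [0, 1]).
    # Each proposal accumulates the weighted scores of the algorithms
    # that voted for it.
    totals: dict[str, float] = defaultdict(float)
    for algo, candidate, score in results:
        totals[candidate] += weights.get(algo, 0.0) * score
    if not totals:
        return None
    best = max(totals, key=lambda c: totals[c])
    # Accept the best proposal only if its combined score clears the threshold.
    return best if totals[best] > threshold else None


weights = {"regex": 0.6, "alias": 0.2, "fulltext": 0.2}
results = [("regex", "Y", 0.9), ("alias", "Z", 0.8), ("fulltext", "Y", 0.5)]
decision = integrate(results, weights)
```

In the sample, the high-weight regular-expression result and the full-text result agree on "Y", which outweighs the low-weight alias result for "Z".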
S306: establishing an association relation for the address information to be matched and the standard address information, and storing the newly established association relation and the address information to be matched into an association address information set in a standard address library;
S307: respectively matching the address information to be matched with each piece of associated address information in the associated address information set by using a preset matching algorithm; when the address information to be matched matches a certain piece of associated address information in the associated address information set, executing step S308;
when the address information to be matched is not matched with all the associated address information in the associated address information set, the failure of matching is indicated, and the association relation with the standard address information cannot be established.
S308: acquiring an association relation corresponding to the association address information matched with the address information to be matched; according to the association relation, determining the standard address information associated with the piece of association address information as target standard address information, establishing an association relation for the address information to be matched and the target standard address information, and storing the newly established association relation and the address information to be matched into an association address information set in a standard address library;
S309: judging whether, in the current round of matching, matching has been completed once for each piece of address information to be matched in the address information set to be matched; if yes, executing step S310; if not, continuing to execute step S305;
S310: judging whether a new association relation is established in the round of matching; if yes, executing step S311; if not, ending the matching;
S311: dynamically adjusting parameters in the preset matching algorithm according to the address information to be matched that was successfully matched in the round and the newly established association relations;
in other embodiments, after the whole matching process is finished, parameters in a preset matching algorithm may be adjusted according to address information to be matched successfully in the whole matching process and the newly established association relationship.
S312: acquiring the address information to be matched that was not successfully matched in the round, and taking it as a new address information set to be matched;
S313: acquiring the address information to be matched that was successfully matched in the round, and taking it as a new associated address information set;
S314: selecting one piece of address information to be matched from the new address information set to be matched, and matching the selected address information to be matched with each piece of associated address information in the new associated address information set by utilizing a preset matching algorithm;
The selected address information to be matched is address information to be matched which is not matched in the round of matching.
S315: when the address information to be matched is matched with a certain piece of associated address information in the new associated address information set, obtaining an associated relation corresponding to the piece of associated address information; according to the association relation, determining the standard address information associated with the piece of association address information as target standard address information, establishing an association relation for the address information to be matched and the target standard address information, and storing the newly established association relation and the address information to be matched into an association address information set in a standard address library;
when the address information to be matched is not matched with all the associated address information in the new associated address information set, the failure of matching is indicated, and the association relation with the standard address information cannot be established.
S316: judging whether one-time matching is completed for each piece of address information to be matched in the new address information set to be matched in the round of matching; if yes, go to step S310; if not, step S314 is continued.
Referring to fig. 4, in a specific embodiment, in addition to storing the association relation between the address information to be matched and the standard address information during the matching process, text relations and entity relations between pieces of address information can also be stored, which helps improve the matching efficiency and accuracy of the algorithm. A text relation is an association between the texts of address information; for example, the two texts "No. 2 Yabao Road, Chaoyang District, Beijing" and "Children's Hospital affiliated with the Capital Institute of Pediatrics" express the same address. An entity relation is a relation between entities; for example, if the company addresses of a first person and a second person are the same, it can be inferred that the two are colleagues, and conversely, if the two are known to be colleagues, it can be inferred that their company addresses are the same. In addition, the established association relations can be manually verified and the verification result fed back, so that the preset matching algorithm can adjust its own parameters accordingly, improving its future matching efficiency and accuracy.
Therefore, by implementing the method provided by the embodiment of the invention, disordered multidimensional, multi-source address information can be automatically associated with standard address information and the association relation can be automatically saved, rather than requiring manual confirmation of matching address information and manual entry of association information as in the prior art. The method thus solves the problems of time and labor consumption, low efficiency and low accuracy in the prior art. In addition, the invention can dynamically adjust parameters in the matching algorithm according to the matching result, which further improves the efficiency and accuracy of future matching and, in turn, of data association on multidimensional and multi-source data.
Corresponding to the embodiment shown in fig. 2, the embodiment of the invention further provides a device for carrying out data association on multidimensional and multi-source data. As shown in fig. 5, the device includes: a first obtaining module 501, a sorting module 502, a preprocessing module 503, a second obtaining module 504, a matching module 505, a first association module 506 and a second association module 507, wherein,
A first obtaining module 501, configured to obtain original data information from different storage addresses respectively; wherein the original data information is multidimensional information; the collection of the original data information forms an original data information set;
the sorting module 502 is configured to sort the original data information in the original data information set;
a preprocessing module 503, configured to preprocess the ordered original data information to obtain a data information set to be matched;
a second obtaining module 504, configured to obtain a standard data information set and an associated data information set from a standard database, where the associated data information set is a set of data information matched with standard data information in the standard data information set;
the matching module 505 is configured to match each piece of to-be-matched data information in the to-be-matched data information set with each piece of standard data information in the standard data information set and each piece of associated data information in the associated data information set by using a preset matching algorithm;
the first association module 506 is configured to establish an association relationship between the data information to be matched and a piece of standard data information in the standard data information set when the data information to be matched is matched with the piece of standard data information, and store the newly established association relationship and the data information to be matched in an association data information set in the standard database;
The second association module 507 is configured to obtain an association relationship corresponding to a piece of associated data information when the data information to be matched matches the piece of associated data information in the associated data information set; according to the association relation corresponding to the piece of association data information, determining standard data information associated with the piece of association data information as target standard data information, establishing association relation for the data information to be matched and the target standard data information, and storing the newly established association relation and the data information to be matched into an association data information set in a standard database.
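The bookkeeping performed by the first and second association modules can be sketched as follows. This is only a minimal illustration, assuming the standard database is modelled as an in-memory object; the names `StandardDatabase`, `associate_direct` and `associate_via` are hypothetical and do not appear in the embodiment.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class StandardDatabase:
    """Hypothetical in-memory stand-in for the standard database."""
    standard: Set[str] = field(default_factory=set)
    # Maps each associated record to the standard record it is linked to.
    associations: Dict[str, str] = field(default_factory=dict)

    def associate_direct(self, candidate: str, standard_item: str) -> None:
        """First association module: the candidate matched a standard record."""
        if standard_item not in self.standard:
            raise KeyError(standard_item)
        self.associations[candidate] = standard_item

    def associate_via(self, candidate: str, associated_item: str) -> str:
        """Second association module: the candidate matched an associated record.

        The standard record behind the associated record is taken as the
        target standard record, and the candidate is linked to that target.
        """
        target = self.associations[associated_item]
        self.associations[candidate] = target
        return target
```

In this sketch every stored record points directly at a standard record, so a later lookup never has to follow a chain of associated records.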
Optionally, the sorting module 502 is specifically configured to:
and sorting the original data information in the original data information set according to the character information contained in each piece of original data information in the original data information set.
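One way such character-based sorting could look is sketched below. The record layout (a dict with a `text` field) and the normalisation used for the sort key (whitespace removed, case folded) are assumptions made for illustration; the embodiment only states that sorting follows the character information in each record.

```python
def sort_by_characters(records):
    """Sort records by their character content.

    The key removes whitespace and folds case (an assumed normalisation),
    so records with similar text end up next to each other.
    """
    return sorted(records, key=lambda r: "".join(r["text"].split()).casefold())
```

Placing textually similar records next to each other is one plausible reason to sort before preprocessing, although the embodiment does not state the motivation.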
Optionally, the matching module 505 is specifically configured to:
for each piece of data information to be matched in the data information set to be matched, matching the data information to each piece of standard data information in the standard data information set by utilizing a preset matching algorithm;
when the data information to be matched is matched with a piece of standard data information in the standard data information set, starting a first association module 506;
When the data information to be matched is not matched with all standard data information in the standard data information set, respectively matching the data information to be matched with each piece of associated data information in the associated data information set by using a preset matching algorithm;
when the data information to be matched matches with a certain piece of associated data information in the associated data information set, the second association module 507 is started.
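The two-stage flow described above (standard records first, associated records only as a fallback) might be sketched as follows. The preset matching algorithm is not specified in this section, so `difflib.SequenceMatcher` with a fixed threshold is used here purely as a stand-in; `similarity`, `best_match`, `match_one` and the threshold value are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, Iterable, Optional


def similarity(a: str, b: str) -> float:
    # Stand-in for the preset matching algorithm (an assumption).
    return SequenceMatcher(None, a, b).ratio()


def best_match(candidate: str, pool: Iterable[str], threshold: float) -> Optional[str]:
    """Return the highest-scoring item in pool at or above threshold, else None."""
    best, best_score = None, threshold
    for item in pool:
        score = similarity(candidate, item)
        if score >= best_score:
            best, best_score = item, score
    return best


def match_one(candidate: str, standard: Iterable[str],
              associations: Dict[str, str], threshold: float = 0.8) -> Optional[str]:
    """Try the standard set first, then fall back to the associated set."""
    hit = best_match(candidate, standard, threshold)
    if hit is not None:
        associations[candidate] = hit          # role of the first association module
        return hit
    hit = best_match(candidate, list(associations), threshold)
    if hit is not None:
        target = associations[hit]             # target standard data information
        associations[candidate] = target       # role of the second association module
        return target
    return None
```

For example, with `standard = ["1 Main Street"]` and `associations = {"1 Main St": "1 Main Street"}`, the candidate `"1 Main St."` falls below the threshold against the standard record but matches the associated record, so it is linked to `"1 Main Street"`.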
Optionally, the device further includes:
the judging module is used for judging whether a new association relation is established in the round of matching every time one round of matching is completed for the data information to be matched in the data information set to be matched;
when a new association relation is established in the round of matching, starting an iteration module;
when no new association relation is established in the round of matching, the matching is ended.
Optionally, the iteration module is specifically configured to:
acquiring the data information set to be matched of the current round, and taking the data information to be matched that was not successfully matched as a new data information set to be matched;
acquiring data information to be matched successfully matched in the round as a new associated data information set; for each piece of data information to be matched in the new data information set to be matched, matching the data information to be matched with each piece of associated data information in the new associated data information set by utilizing a preset matching algorithm;
When the data information to be matched is matched with a certain piece of associated data information in the new associated data information set, obtaining an associated relation corresponding to the piece of associated data information; according to the association relation corresponding to the piece of association data information, determining standard data information associated with the piece of association data information as target standard data information, establishing association relation for the data information to be matched and the target standard data information, and storing the newly established association relation and the data information to be matched into an association data information set in a standard database.
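The interplay of the judging module and the iteration module can be sketched as a loop that stops once a round establishes no new association. As before, `SequenceMatcher` stands in for the unspecified preset matching algorithm, and `is_match`, `iterate_rounds` and the threshold are hypothetical.

```python
from difflib import SequenceMatcher


def is_match(a, b, threshold):
    # Stand-in for the preset matching algorithm (an assumption).
    return SequenceMatcher(None, a, b).ratio() >= threshold


def iterate_rounds(unmatched, newly_matched, associations, threshold=0.75):
    """Repeat matching until a round establishes no new association.

    unmatched:     records that have not been associated yet
    newly_matched: records associated in the previous round
    associations:  dict mapping associated record -> standard record
    Returns the records still unmatched and the number of rounds run.
    """
    rounds = 0
    while newly_matched and unmatched:
        rounds += 1
        matched_now, still_unmatched = [], []
        for candidate in unmatched:
            hit = next((m for m in newly_matched
                        if is_match(candidate, m, threshold)), None)
            if hit is None:
                still_unmatched.append(candidate)
            else:
                # Link the candidate to the standard record behind the hit.
                associations[candidate] = associations[hit]
                matched_now.append(candidate)
        # Records matched in this round become the new associated set.
        unmatched, newly_matched = still_unmatched, matched_now
    return unmatched, rounds
```

Each round compares the remaining records only against those matched in the previous round, so a record that resembles a variant of a standard record, but not the standard record itself, can still be reached after several rounds.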
Optionally, the device further includes:
and the parameter adjusting module is used for dynamically adjusting parameters in a preset matching algorithm according to the to-be-matched data information successfully matched in the current round and the newly established association relation when the matching of each round is finished.
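The embodiment does not state which parameters are adjusted or by what rule. Purely as an illustration, the sketch below assumes the adjustable parameter is a similarity threshold and updates it by exponential smoothing toward the mean similarity score of the current round's successful matches; `adjust_threshold`, `rate`, `lower` and `upper` are all hypothetical.

```python
def adjust_threshold(threshold, round_scores, rate=0.2, lower=0.5, upper=0.95):
    """Exponentially smooth the threshold toward this round's mean match score.

    round_scores holds the similarity scores of the matches accepted in the
    round. The update rule and the bounds are illustrative assumptions, not
    the rule used by the embodiment.
    """
    if not round_scores:
        return threshold  # nothing matched this round, leave the parameter as is
    target = sum(round_scores) / len(round_scores)
    updated = (1 - rate) * threshold + rate * target
    return min(upper, max(lower, updated))
```

Clamping to `[lower, upper]` keeps a single unusual round from pushing the threshold to a value that would accept everything or nothing.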
Optionally, the data information is address information; the standard database is a standard address library.
Therefore, the device provided by the embodiment of the invention can automatically associate disordered multidimensional, multi-source address information with standard address information and automatically save the association relation, instead of requiring manual confirmation of the matched address information and manual entry of the association information as in the prior art. The device thus solves the problems of time and labor consumption, low efficiency, and low accuracy in the prior art. In addition, the invention can dynamically adjust the parameters of the matching algorithm according to the matching results, which further improves the efficiency and accuracy of subsequent matching and, in turn, of data association for multidimensional, multi-source data.
Corresponding to the embodiment shown in fig. 2, the embodiment of the present invention further provides an electronic device, see fig. 6, including a processor 601, a communication interface 602, a memory 603, and a communication bus 604, where the processor 601, the communication interface 602, and the memory 603 perform communication with each other through the communication bus 604;
a memory 603 for storing a computer program;
the processor 601 is configured to implement, when executing the program stored in the memory, the steps of any of the methods for performing data association on multidimensional, multi-source data described in the above embodiments.
The electronic device provided by the embodiment of the invention can automatically associate disordered multidimensional, multi-source address information with standard address information and automatically save the association relation, instead of requiring manual confirmation of the matched address information and manual entry of the association information as in the prior art. The electronic device thus solves the problems of time and labor consumption, low efficiency, and low accuracy in the prior art. In addition, the invention can dynamically adjust the parameters of the matching algorithm according to the matching results, which further improves the efficiency and accuracy of subsequent matching and, in turn, of data association for multidimensional, multi-source data.
Corresponding to the embodiment shown in fig. 2, the embodiment of the present invention further provides a computer-readable storage medium in which a computer program is stored; when executed by a processor, the computer program implements the steps of any of the methods for performing data association on multidimensional, multi-source data described in the above embodiments.
The storage medium provided by the embodiment of the invention can automatically associate disordered multidimensional, multi-source address information with standard address information and automatically save the association relation, instead of requiring manual confirmation of the matched address information and manual entry of the association information as in the prior art. The storage medium thus solves the problems of time and labor consumption, low efficiency, and low accuracy in the prior art. In addition, the invention can dynamically adjust the parameters of the matching algorithm according to the matching results, which further improves the efficiency and accuracy of subsequent matching and, in turn, of data association for multidimensional, multi-source data.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, magnetic disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (10)

1. A method for data association of multi-dimensional multi-source data, comprising the steps of:
respectively acquiring original data information from different storage addresses; wherein the original data information is multidimensional information; the collection of the original data information forms an original data information set;
sorting the original data information in the original data information set;
preprocessing the sorted original data information to obtain a data information set to be matched;
acquiring a standard data information set and an associated data information set from a standard database, wherein the associated data information set is a set of data information matched with standard data information in the standard data information set;
for each piece of data information to be matched in the data information set to be matched, respectively matching the data information to be matched with each piece of standard data information in the standard data information set and each piece of associated data information in the associated data information set by using a preset matching algorithm;
when the data information to be matched is matched with a certain piece of standard data information in the standard data information set, establishing an association relation between the data information to be matched and the standard data information, and storing the newly established association relation and the data information to be matched into an association data information set in the standard database;
when the data information to be matched is matched with a certain piece of associated data information in the associated data information set, acquiring an associated relation corresponding to the piece of associated data information; according to the association relation corresponding to the piece of association data information, determining standard data information associated with the piece of association data information as target standard data information, establishing association relation for the data information to be matched and the target standard data information, and storing the newly established association relation and the data information to be matched into an association data information set in the standard database.
2. The method of claim 1, wherein the step of ordering the original data information in the set of original data information comprises:
and sorting the original data information in the original data information set according to the character information contained in each piece of original data information in the original data information set.
3. The method according to claim 1, wherein the step of matching each piece of the data information to be matched in the set of data information with each piece of the standard data information in the set of standard data information and each piece of the associated data information in the set of associated data information by using a preset matching algorithm, respectively, includes:
for each piece of data information to be matched in the data information set to be matched, matching the data information to each piece of standard data information in the standard data information set by utilizing a preset matching algorithm;
when the data information to be matched is matched with a certain piece of standard data information in the standard data information set, establishing an association relation between the data information to be matched and the standard data information, and storing the newly established association relation and the data information to be matched into an association data information set in the standard database;
When the data information to be matched is not matched with all standard data information in the standard data information set, respectively matching the data information to be matched with each piece of associated data information in the associated data information set by using a preset matching algorithm;
when the data information to be matched is matched with a certain piece of associated data information in the associated data information set, acquiring an associated relation corresponding to the piece of associated data information; according to the association relation corresponding to the piece of association data information, determining standard data information associated with the piece of association data information as target standard data information, establishing association relation for the data information to be matched and the target standard data information, and storing the newly established association relation and the data information to be matched into an association data information set in the standard database.
4. The method according to claim 1, characterized in that the method further comprises:
each time a round of matching is completed for the data information to be matched in the data information set to be matched, judging whether a new association relation is established in the round of matching;
when a new association relation is established in the round of matching, the next round of matching is performed;
When no new association relation is established in the round of matching, the matching is ended.
5. The method of claim 4, wherein when a new association is established in the present round of matching, the step of performing the next round of matching includes:
acquiring the data information set to be matched of the current round, and taking the data information to be matched that was not successfully matched as a new data information set to be matched;
acquiring data information to be matched successfully matched in the round as a new associated data information set;
for each piece of data information to be matched in the new data information set to be matched, matching the data information to be matched with each piece of associated data information in the new associated data information set by utilizing a preset matching algorithm;
when the data information to be matched is matched with a certain piece of associated data information in the new associated data information set, obtaining an associated relation corresponding to the piece of associated data information; according to the association relation corresponding to the piece of association data information, determining standard data information associated with the piece of association data information as target standard data information, establishing association relation for the data information to be matched and the target standard data information, and storing the newly established association relation and the data information to be matched into an association data information set in the standard database.
6. The method according to claim 4, further comprising:
and dynamically adjusting parameters in the preset matching algorithm according to the data information to be matched successfully matched in the round and the newly established association relation when the round of matching is finished.
7. The method according to claim 1, characterized in that:
the data information is address information;
the standard database is a standard address library.
8. An apparatus for data association of multi-dimensional multi-source data, the apparatus comprising:
the first acquisition module is used for respectively acquiring the original data information from different storage addresses; wherein the original data information is multidimensional information; the collection of the original data information forms an original data information set;
the sorting module is used for sorting the original data information in the original data information set;
the preprocessing module is used for preprocessing the sorted original data information to obtain a data information set to be matched;
the second acquisition module is used for acquiring a standard data information set and an associated data information set from a standard database, wherein the associated data information set is a set of data information matched with standard data information in the standard data information set;
The matching module is used for matching each piece of to-be-matched data information in the to-be-matched data information set with each piece of standard data information in the standard data information set and each piece of associated data information in the associated data information set by utilizing a preset matching algorithm;
the first association module is used for establishing an association relation between the data information to be matched and a piece of standard data information when the data information to be matched is matched with that piece of standard data information in the standard data information set, and storing the newly established association relation and the data information to be matched into the association data information set in the standard database;
the second association module is used for acquiring an association relation corresponding to a certain piece of association data information when the data information to be matched is matched with the certain piece of association data information in the association data information set; according to the association relation corresponding to the piece of association data information, determining standard data information associated with the piece of association data information as target standard data information, establishing association relation for the data information to be matched and the target standard data information, and storing the newly established association relation and the data information to be matched into an association data information set in the standard database.
9. An electronic device, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
the memory is used for storing a computer program;
the processor is configured to implement the method steps of any of claims 1-7 when executing a program stored on the memory.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored therein a computer program which, when executed by a processor, implements the method steps of any of claims 1-7.
CN202410150164.6A 2024-02-02 2024-02-02 Method and device for carrying out data association on multidimensional and multi-source data and electronic equipment Pending CN117708262A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410150164.6A CN117708262A (en) 2024-02-02 2024-02-02 Method and device for carrying out data association on multidimensional and multi-source data and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410150164.6A CN117708262A (en) 2024-02-02 2024-02-02 Method and device for carrying out data association on multidimensional and multi-source data and electronic equipment

Publications (1)

Publication Number Publication Date
CN117708262A true CN117708262A (en) 2024-03-15

Family

ID=90162739

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410150164.6A Pending CN117708262A (en) 2024-02-02 2024-02-02 Method and device for carrying out data association on multidimensional and multi-source data and electronic equipment

Country Status (1)

Country Link
CN (1) CN117708262A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109359200A (en) * 2018-10-11 2019-02-19 北京国信达数据技术有限公司 Place name address date intelligently parsing system
US10607179B1 (en) * 2019-07-15 2020-03-31 Coupang Corp. Computerized systems and methods for address correction
CN112347222A (en) * 2020-10-22 2021-02-09 中科曙光南京研究院有限公司 Method and system for converting non-standard address into standard address based on knowledge base reasoning
US11409660B1 (en) * 2021-11-19 2022-08-09 SafeGraph, Inc. Systems and methods for translating address strings to standardized addresses
CN117112707A (en) * 2023-07-31 2023-11-24 深圳市晓象科技有限公司 Standard address library collection and construction method and system

Similar Documents

Publication Publication Date Title
CN109325040B (en) FAQ question-answer library generalization method, device and equipment
CN110619051B (en) Question sentence classification method, device, electronic equipment and storage medium
CN110147421B (en) Target entity linking method, device, equipment and storage medium
CN112035599B (en) Query method and device based on vertical search, computer equipment and storage medium
CN102955833A (en) Correspondence address identifying and standardizing method
CN113592037B (en) Address matching method based on natural language inference
CN112417381B (en) Method and device for rapidly positioning infringement image applied to image copyright protection
CN109710792A (en) A kind of fast face searching system application based on index
KR20220134695A (en) System for author identification using artificial intelligence learning model and a method thereof
CN116414823A (en) Address positioning method and device based on word segmentation model
CN111984673B (en) Fuzzy retrieval method and device for tree structure of power grid electric energy metering system
CN112084293B (en) Data authentication system and data authentication method for public security field
CN113742292A (en) Multi-thread data retrieval and retrieved data access method based on AI technology
CN102915311B (en) Searching method and system
CN117076590A (en) Address standardization method, address standardization device, computer equipment and readable storage medium
CN117010373A (en) Recommendation method for category and group to which asset management data of power equipment belong
CN117708262A (en) Method and device for carrying out data association on multidimensional and multi-source data and electronic equipment
CN103678513A (en) Interactive search generation method and system
CN115455315A (en) Address matching model training method based on comparison learning
CN115495545A (en) Power grid operation panoramic model retrieval method, electronic device and storage medium
CN111177585A (en) Map POI feedback method and device
CN115408555A (en) Voiceprint retrieval method, system, storage medium and electronic equipment
CN114707174A (en) Data processing method and device, electronic equipment and storage medium
CN111383032B (en) Method and device for detecting authenticity of house source information
CN114003812A (en) Address matching method, system, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination