CN111125289A - Store data cleaning and matching method, device, equipment and storage medium - Google Patents

Store data cleaning and matching method, device, equipment and storage medium

Info

Publication number
CN111125289A
Authority
CN
China
Prior art keywords
store
store data
matching
data
longitude
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911361361.8A
Other languages
Chinese (zh)
Other versions
CN111125289B (en)
Inventor
吴建胜
张丽丽
黄波
黄耀鸿
郭怡适
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Imagedt Co ltd
Original Assignee
Imagedt Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Imagedt Co ltd
Priority to CN201911361361.8A
Publication of CN111125289A
Application granted
Publication of CN111125289B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/10 Pre-processing; Data cleansing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Remote Sensing (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a store data cleaning and matching method, which comprises the following steps: cleaning store data, wherein the cleaning process comprises: processing repeated store data, eliminating invalid store data, verifying the accuracy of the store data and completing missing store data, wherein the store data comprises a store number, a store name, a store longitude and latitude and a store address; matching the cleaned store data through a preset store matching algorithm to obtain a matching result; and verifying the matching result through a preset store verification algorithm, wherein the preset store verification algorithm comprises a DBSCAN clustering algorithm. The embodiment of the invention also provides a store data cleaning and matching device. By adopting the method and the device, the accuracy of the store data can be improved and the false deletion rate during cleaning can be reduced, so that the store matching accuracy is improved and stores that are close in distance but not adjacent can be accurately identified.

Description

Store data cleaning and matching method, device, equipment and storage medium
Technical Field
The invention relates to the field of store data matching, in particular to a store data cleaning and matching method, device, equipment and storage medium.
Background
In the cleaning and matching of industrial store data, data such as the store name, store address and store longitude and latitude of the same store deviate to different degrees because the data are collected by different parties. As a result, it is difficult for existing data cleaning and data matching algorithms to find the same store, find similar stores and analyze similar stores, and many solutions have been derived for this problem.
Disclosure of Invention
In order to solve the above problems, it is an object of the present invention to provide a store data cleaning and matching method, apparatus, device and storage medium, which can accurately perform store matching.
Based on the above, the invention provides a store data cleaning and matching method, which comprises the following steps:
cleaning store data, wherein the cleaning process comprises: processing repeated store data, eliminating invalid store data, verifying the accuracy of the store data and completing missing store data, wherein the store data comprises a store number, a store name, a store longitude and latitude and a store address;
matching the cleaned store data through a preset store matching algorithm to obtain a matching result;
and verifying the matching result through a preset store verification algorithm, wherein the preset store verification algorithm comprises a DBSCAN clustering algorithm.
Wherein the processing of the repeated store data comprises:
selecting first information and second information in the store data as screening conditions;
matching the first information and the second information in the store data;
and if they match, retaining one of the matching store data records and deleting the remaining store data records whose first information and second information match.
Wherein the verifying the accuracy of the store data comprises:
inquiring the store address to obtain first longitude and latitude data;
taking the longitude and latitude data in the store data as second longitude and latitude data;
acquiring a longitude and latitude distance between the first longitude and latitude data and the second longitude and latitude data, wherein a calculation formula of the longitude and latitude distance is as follows:
c=sin(latA*pi/180)*sin(latB*pi/180)+cos(latA*pi/180)*cos(latB*pi/180)*cos((mlonA-mlonB)*pi/180)
dis=r*arccos(c)*pi/180
and judging whether the longitude and latitude distance is greater than a preset longitude and latitude distance threshold, and if so, determining that the second longitude and latitude data is inaccurate and overwriting the second longitude and latitude data with the first longitude and latitude data.
Wherein the completing of missing store data comprises:
if the store address does not exist in the store data, acquiring the store address according to the store longitude and latitude;
if the store longitude and latitude do not exist in the store data, acquiring the store longitude and latitude according to the store address;
and if neither the store address nor the store longitude and latitude exists in the store data, acquiring the store address and the store longitude and latitude according to the store name.
Wherein the eliminating of invalid store data comprises:
deleting the store data in which none of the store name, the store address and the store longitude and latitude exists;
and deleting the store data which has neither the store address nor the store longitude and latitude and which cannot be found in the preset store database.
Wherein the matching of the cleaned store data through a preset store matching algorithm to obtain a matching result comprises:
acquiring the data quantity n of the reference group stores, wherein the address character lengths of the reference group stores are (p_11, p_12, …, p_1n) respectively;
acquiring the data quantity m of the comparison group stores, wherein the address character lengths of the comparison group stores are (p_21, p_22, …, p_2m) respectively;
judging whether the store names in the reference group store data are consistent with the store names in the comparison group store data;
if they are consistent, matching the store addresses in the reference group store data with the store addresses in the comparison group store data according to a preset matching rule, wherein the preset matching rule comprises: the reference group store address field is s and the comparison group store address field is t; for each character in s, the characters in t are traversed and compared with that character; if a character in t is consistent with the character in s, the character in s is modified to 1; after all characters in s have been matched, a matching result for s, namely a matching character string, is obtained, and the length of the matching character string is consistent with the address character length of s;
adding the character values in the matching character string to obtain a successful matching value and an address matching rate, wherein the address matching rate is the successful matching value divided by p_1n.
Wherein the method further comprises:
if the store names and the store addresses in two store data records are consistent, the two stores are consistent stores;
if the store names and the store addresses in the two store data records are consistent and the longitude and latitude distance between the stores is smaller than a preset value, but the first condition and the second condition are not met, the two stores are stores on inconsistent main roads;
if the store names, the client names and the store addresses in the two store data records are consistent and the longitude and latitude distance between the stores is smaller than a preset value, but the first condition and the second condition are not met, the two stores are stores on a consistent main road;
wherein the first condition is:
Figure BDA0002334546030000041
and ρ > thres1
The second condition is as follows:
Figure BDA0002334546030000042
and ρ > thres2, the thres1 and thres2 being a first address match rate threshold and a second address match rate threshold, respectively.
The embodiment of the invention also provides a store data cleaning and matching device, which comprises:
a cleaning module for cleaning store data, the cleaning process comprising: processing repeated store data, eliminating invalid store data, verifying the accuracy of the store data and completing missing store data, wherein the store data comprises a store number, a store name, a store longitude and latitude and a store address;
the matching module is used for matching the cleaned store data through a preset store matching algorithm to obtain a matching result;
and the verification module is used for verifying the matching result through a preset store verification algorithm, and the preset store verification algorithm comprises a DBSCAN clustering algorithm.
The embodiment of the invention also provides a store data cleaning and matching device, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor implements the steps of the method when executing the computer program.
An embodiment of the present invention further provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the above method.
By adopting the invention, the store data is first cleaned, wherein the cleaning process comprises: processing repeated store data, eliminating invalid store data, verifying the accuracy of the store data and completing missing store data, and the store data comprises a store number, a store name, a store longitude and latitude and a store address; cleaning the store data improves its accuracy and reduces the false deletion rate during cleaning. The cleaned store data is then matched through a preset store matching algorithm to obtain a matching result; the matching process improves the store matching accuracy and accurately identifies stores that are close in distance but not adjacent, for example stores whose longitude and latitude distance is small but which are located on different main roads, so that the straight-line distance is short while the actual path distance is longer. Finally, the matching result is verified through a preset store verification algorithm, which further improves the reliability of the matching result.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a store data cleaning and matching method provided by an embodiment of the invention;
FIG. 2 is a schematic diagram of a store data cleaning and matching device provided by an embodiment of the invention;
FIG. 3 is a schematic diagram of a process of matching character strings according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
FIG. 1 is a schematic diagram of a store data cleaning and matching method provided by an embodiment of the invention. The method comprises:
S101, cleaning store data, wherein the cleaning process comprises: processing repeated store data, eliminating invalid store data, verifying the accuracy of the store data and completing missing store data, wherein the store data comprises a store number, a store name, a store longitude and latitude and a store address;
wherein the processing of the repeated store data comprises:
selecting first information and second information in the store data as screening conditions;
matching the first information and the second information in the store data;
and if they match, retaining one of the matching store data records and deleting the remaining store data records whose first information and second information match.
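The duplicate-handling step above can be sketched in Python as follows. This is a minimal illustration, assuming each store record is a dictionary and that the first and second information are the store name and store address; the function name `drop_duplicates` and the field names are hypothetical and do not come from the disclosure.

```python
from typing import Dict, Iterable, List

Store = Dict[str, object]


def drop_duplicates(stores: Iterable[Store],
                    first_key: str = "name",
                    second_key: str = "address") -> List[Store]:
    """Keep the first record for each (first_key, second_key) pair."""
    seen = set()
    kept = []
    for store in stores:
        # The pair of screening fields identifies a duplicate group.
        key = (store.get(first_key), store.get(second_key))
        if key in seen:
            continue  # a record with the same two fields was already kept
        seen.add(key)
        kept.append(store)
    return kept
```

Which record of a duplicate group is retained is not specified in the text; the sketch keeps the first one encountered.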
Wherein the verifying the accuracy of the store data comprises:
inquiring the store address to obtain first longitude and latitude data;
taking the longitude and latitude data in the store data as second longitude and latitude data;
acquiring a longitude and latitude distance between the first longitude and latitude data and the second longitude and latitude data, wherein a calculation formula of the longitude and latitude distance is as follows:
c=sin(latA*pi/180)*sin(latB*pi/180)+cos(latA*pi/180)*cos(latB*pi/180)*cos((mlonA-mlonB)*pi/180)
dis=r*arccos(c)*pi/180
and judging whether the longitude and latitude distance is greater than a preset longitude and latitude distance threshold, and if so, determining that the second longitude and latitude data is inaccurate and overwriting the second longitude and latitude data with the first longitude and latitude data.
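A hedged Python sketch of this accuracy check follows. It evaluates the same cosine expression c and then uses dis = r * arccos(c); because Python's `math.acos` returns radians, the angle is multiplied by r directly, and the additional pi/180 factor in the printed formula would apply only if arccos returned degrees. The Earth radius value, the 1 km default threshold, the function names and the record field names are assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius; the text only calls it r


def great_circle_km(lat_a, lon_a, lat_b, lon_b, r=EARTH_RADIUS_KM):
    """Spherical law of cosines distance between two points given in degrees."""
    phi_a, phi_b = math.radians(lat_a), math.radians(lat_b)
    c = (math.sin(phi_a) * math.sin(phi_b)
         + math.cos(phi_a) * math.cos(phi_b)
         * math.cos(math.radians(lon_a - lon_b)))
    c = max(-1.0, min(1.0, c))  # guard against rounding slightly outside [-1, 1]
    return r * math.acos(c)  # math.acos returns radians


def verify_coordinates(store, geocoded, threshold_km=1.0):
    """Return a copy of store whose coordinates are overwritten when too far off.

    geocoded is the (lat, lon) pair obtained by querying the store address
    (the "first longitude and latitude data"); the query itself is not shown.
    """
    distance = great_circle_km(store["lat"], store["lon"],
                               geocoded[0], geocoded[1])
    result = dict(store)
    if distance > threshold_km:
        # Stored coordinates are treated as inaccurate and overwritten.
        result["lat"], result["lon"] = geocoded
    return result
```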
Wherein the completing of missing store data comprises:
if the store address does not exist in the store data, acquiring the store address according to the store longitude and latitude;
if the store longitude and latitude do not exist in the store data, acquiring the store longitude and latitude according to the store address;
and if neither the store address nor the store longitude and latitude exists in the store data, acquiring the store address and the store longitude and latitude according to the store name.
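The three completion rules can be illustrated with the sketch below. The text does not say which service supplies addresses or coordinates, so the three lookups are passed in as callables; `geocode`, `reverse_geocode`, `lookup_by_name` and the record field names are hypothetical placeholders.

```python
def complete_store(store, geocode, reverse_geocode, lookup_by_name):
    """Return a copy of store with a missing address or coordinates filled in.

    geocode(address) -> (lat, lon)
    reverse_geocode(lat, lon) -> address
    lookup_by_name(name) -> (address, (lat, lon)) or None
    All three callables are assumptions standing in for an unspecified service.
    """
    result = dict(store)
    has_address = bool(result.get("address"))
    has_coords = result.get("lat") is not None and result.get("lon") is not None
    if has_address and not has_coords:
        # Coordinates missing: derive them from the address.
        result["lat"], result["lon"] = geocode(result["address"])
    elif has_coords and not has_address:
        # Address missing: derive it from the coordinates.
        result["address"] = reverse_geocode(result["lat"], result["lon"])
    elif not has_address and not has_coords:
        # Both missing: fall back to a lookup by store name.
        found = lookup_by_name(result.get("name"))
        if found is not None:
            result["address"], (result["lat"], result["lon"]) = found
    return result
```

Records that remain incomplete after this step are the ones the invalid-data rule is meant to remove.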
Wherein the eliminating of invalid store data comprises:
deleting the store data in which none of the store name, the store address and the store longitude and latitude exists;
and deleting the store data which has neither the store address nor the store longitude and latitude and which cannot be found in the preset store database.
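A short sketch of the two deletion rules follows, assuming dictionary records and modelling the preset store database as a hypothetical callable `lookup_by_name` that returns `None` when a store cannot be found. The names `is_valid` and `drop_invalid` are illustrative only.

```python
def is_valid(store, lookup_by_name):
    """Apply the two deletion rules to a single store record."""
    has_name = bool(store.get("name"))
    has_address = bool(store.get("address"))
    has_coords = store.get("lat") is not None and store.get("lon") is not None
    # Rule 1: name, address and coordinates are all missing.
    if not (has_name or has_address or has_coords):
        return False
    # Rule 2: no address, no coordinates, and not found in the store database.
    if not has_address and not has_coords:
        return lookup_by_name(store.get("name")) is not None
    return True


def drop_invalid(stores, lookup_by_name):
    """Keep only the records that pass both rules."""
    return [s for s in stores if is_valid(s, lookup_by_name)]
```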
S102, matching the cleaned store data through a preset store matching algorithm to obtain a matching result;
wherein the matching of the cleaned store data through a preset store matching algorithm to obtain a matching result comprises:
acquiring the data quantity n of the reference group stores, wherein the address character lengths of the reference group stores are (p_11, p_12, …, p_1n) respectively;
acquiring the data quantity m of the comparison group stores, wherein the address character lengths of the comparison group stores are (p_21, p_22, …, p_2m) respectively;
judging whether the store names in the reference group store data are consistent with the store names in the comparison group store data;
and if they are consistent, matching the store addresses in the reference group store data with the store addresses in the comparison group store data according to a preset matching rule. The preset matching rule comprises: the reference group store address field is s and the comparison group store address field is t; for each character in s, the characters in t are traversed and compared with that character; if a character in t is consistent with the character in s, the character in s is modified to 1; after all characters in s have been matched, a matching result for s, namely a matching character string, is obtained, and the length of the matching character string is consistent with the address character length of s;
adding the character values in the matching character string to obtain a successful matching value and an address matching rate, wherein the address matching rate is the successful matching value divided by p_1n.
For example, let the detailed address field of the reference group store be s and the detailed address field of the comparison group store be t, with s = "XX number for major stone town saggy road in Guangzhou city zone of the wine" and t = "X number for major stone district of Guangzhou city wine".
First, as shown in FIG. 3, taking the first character of s as a reference, all characters in t are traversed and compared with the first character of s; if t contains a character consistent with the first character of s, the first character of s is modified to 1, and if t contains no character consistent with the first character of s, the first character of s is modified to 0. The characters of s are traversed in this way until all characters of s have been processed, giving the matching result of s, s_new = 111110110111001, wherein the length of the character string s_new is consistent with the length of the field s;
Second, all character values in s_new are added to obtain the successful matching value g = 11.
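The character-matching rule and the worked example can be sketched as below. The functions `match_string` and `address_match_rate` are illustrative names, and the rate is taken as the successful matching value divided by the length of s, which is an interpretation of the address matching rate defined above.

```python
def match_string(s: str, t: str) -> str:
    """Replace each character of s by '1' if it occurs in t, else by '0'."""
    t_chars = set(t)  # membership test stands in for traversing t
    return "".join("1" if ch in t_chars else "0" for ch in s)


def address_match_rate(s: str, t: str) -> float:
    """Successful matching value divided by the character length of s."""
    if not s:
        return 0.0
    flags = match_string(s, t)
    g = flags.count("1")  # sum of the character values in the matching string
    return g / len(s)


# The matching string reported in the example has 15 characters, 11 of them 1.
s_new = "111110110111001"
g = sum(int(ch) for ch in s_new)
rate = g / len(s_new)
```

With the string from the example, g is 11 and the rate is 11/15, about 0.73.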
Wherein the method further comprises:
if the store names and the store addresses in two store data records are consistent, the two stores are consistent stores;
if the store names and the store addresses in the two store data records are consistent and the longitude and latitude distance between the stores is smaller than a preset value, but the first condition and the second condition are not met, the two stores are stores on inconsistent main roads, wherein the preset value can be 1 km;
if the store names, the client names and the store addresses in the two store data records are consistent and the longitude and latitude distance between the stores is smaller than a preset value, but the first condition and the second condition are not met, the two stores are stores on a consistent main road;
wherein the first condition is:
Figure BDA0002334546030000081
and ρ > thres1
The second condition is as follows:
Figure BDA0002334546030000082
and ρ > thres2, the thres1 and thres2 being a first address match rate threshold and a second address match rate threshold, respectively.
S103, verifying the matching result through a preset store verification algorithm, wherein the preset store verification algorithm comprises a DBSCAN clustering algorithm.
Step one: the longitude and latitude data of the cleaned reference group store data are taken as the sample set D = ((lat_1, lon_1), (lat_2, lon_2), …, (lat_n, lon_n)), and the neighborhood parameters (ε, MinPts) and the sample distance measurement mode are input;
Step two: initialize the core object set Ω = ∅, the cluster number k = 0, the unvisited sample set Γ = D, and the cluster partition C = ∅;
Step three: for j = 1, 2, …, n, find all core objects as follows:
a. by means of the distance measurement mode, find the ε-neighborhood subsample set N_ε(x_j) of the sample x_j;
b. if the number of samples in the subsample set satisfies |N_ε(x_j)| ≥ MinPts, add the sample x_j to the core object sample set: Ω = Ω ∪ {x_j};
Step four: if the core object set Ω = ∅, the algorithm ends; otherwise, go to step five;
Step five: randomly select a core object o from the core object set Ω, initialize the current cluster core object queue Ω_cur = {o}, update the cluster number k = k + 1, initialize the current cluster sample set C_k = {o}, and update the unvisited sample set Γ = Γ - {o};
Step six: if the current cluster core object queue Ω_cur = ∅, the current cluster C_k is complete; update the cluster partition C = {C_1, C_2, …, C_k}, update the core object set Ω = Ω - C_k, and go to step four; otherwise, update the core object set Ω = Ω - C_k;
Step seven: take a core object o′ out of the current cluster core object queue Ω_cur, find its ε-neighborhood subsample set N_ε(o′) using the neighborhood distance threshold ε, let Δ = N_ε(o′) ∩ Γ, update the current cluster sample set C_k = C_k ∪ Δ, update the unvisited sample set Γ = Γ - Δ, update Ω_cur = Ω_cur ∪ (Δ ∩ Ω) - o′, and go to step six.
The matching result is checked against the output cluster partition C = {C_1, C_2, …, C_k}. Here, for x_j ∈ D, the ε-neighborhood contains the subsample set of samples in D whose distance from x_j is not greater than ε, i.e. N_ε(x_j) = {x_i ∈ D | distance(x_i, x_j) ≤ ε}, and the number of samples in this subsample set is denoted |N_ε(x_j)|.
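The clustering steps can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions: the distance measure is a great-circle distance in kilometres with an assumed Earth radius of 6371 km, the names `dbscan`, `great_circle_km` and the `seed` parameter are hypothetical, and the core object set is reduced once per finished cluster, which should give the same partition as the stepwise update. How the resulting clusters are compared with the matching result is not detailed in the text, so the sketch only produces the partition.

```python
import math
import random


def great_circle_km(a, b, r=6371.0):
    """Great-circle distance between two (lat, lon) pairs given in degrees."""
    lat_a, lon_a = math.radians(a[0]), math.radians(a[1])
    lat_b, lon_b = math.radians(b[0]), math.radians(b[1])
    c = (math.sin(lat_a) * math.sin(lat_b)
         + math.cos(lat_a) * math.cos(lat_b) * math.cos(lon_a - lon_b))
    return r * math.acos(max(-1.0, min(1.0, c)))


def dbscan(points, eps, min_pts, distance=great_circle_km, seed=0):
    """Return a list of clusters, each a set of indices into points."""
    rng = random.Random(seed)
    n = len(points)
    # Steps two and three: eps-neighborhoods and the core object set.
    neighbors = [{i for i in range(n) if distance(points[i], points[j]) <= eps}
                 for j in range(n)]
    core = {j for j in range(n) if len(neighbors[j]) >= min_pts}
    unvisited = set(range(n))
    clusters = []
    # Step four: stop when no core objects remain.
    while core:
        # Step five: pick a random core object and start a new cluster.
        o = rng.choice(sorted(core))
        queue = [o]
        cluster = {o}
        unvisited.discard(o)
        # Steps six and seven: grow the cluster through reachable core objects.
        while queue:
            q = queue.pop()
            delta = neighbors[q] & unvisited
            cluster |= delta
            unvisited -= delta
            queue.extend(delta & core)
        clusters.append(cluster)
        core -= cluster
    return clusters
```

Samples that end up in no cluster are treated as noise. For production use, an established library implementation would normally be preferred over this quadratic-time sketch.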
By adopting the invention, the store data is first cleaned, wherein the cleaning process comprises: processing repeated store data, eliminating invalid store data, verifying the accuracy of the store data and completing missing store data, and the store data comprises a store number, a store name, a store longitude and latitude and a store address; cleaning the store data improves its accuracy and reduces the false deletion rate during cleaning. The cleaned store data is then matched through a preset store matching algorithm to obtain a matching result; the matching process improves the store matching accuracy and accurately identifies stores that are close in distance but not adjacent, for example stores whose longitude and latitude distance is small but which are located on different main roads, so that the straight-line distance is short while the actual path distance is longer. Finally, the matching result is verified through a preset store verification algorithm, which further improves the reliability of the matching result.
Fig. 2 is a schematic diagram of a store data cleaning and matching device provided by an embodiment of the present invention, where the device includes:
a cleaning module 201, configured to clean store data, where the cleaning process includes: processing repeated store data, eliminating invalid store data, verifying the accuracy of the store data and completing missing store data, wherein the store data comprises a store number, a store name, a store longitude and latitude and a store address;
the matching module 202 is configured to match the cleaned store data through a preset store matching algorithm to obtain a matching result;
and the verification module 203 is configured to verify the matching result through a preset store verification algorithm, where the preset store verification algorithm includes a DBSCAN clustering algorithm.
The technical features and technical effects of the store data cleaning and matching device provided by the embodiment of the invention are the same as those of the method provided by the embodiment of the invention, and are not repeated herein.
Furthermore, an embodiment of the present invention also proposes a storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the above-mentioned method.
Furthermore, an embodiment of the present invention further provides an apparatus for store data cleaning and matching, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the above method when executing the program.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and substitutions can be made without departing from the technical principle of the present invention, and these modifications and substitutions should also be regarded as the protection scope of the present invention.

Claims (10)

1. A store data cleaning and matching method is characterized by comprising the following steps:
cleaning store data, wherein the cleaning process comprises: processing repeated store data, eliminating invalid store data, verifying the accuracy of the store data and completing missing store data, wherein the store data comprises a store number, a store name, a store longitude and latitude and a store address;
matching the cleaned store data through a preset store matching algorithm to obtain a matching result;
and verifying the matching result through a preset store verification algorithm, wherein the preset store verification algorithm comprises a DBSCAN clustering algorithm.
2. The store data cleaning and matching method according to claim 1, wherein the processing of the repeated store data comprises:
selecting first information and second information in the store data as screening conditions;
matching the first information and the second information across the store data;
and if they match, retaining one of the store data and deleting the remaining store data whose first information and second information match.
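The deduplication step of claim 2 can be illustrated with a minimal sketch. The dictionary layout and the field names `name` and `address` are assumptions made for illustration only; the claim does not say which two fields serve as the first and second information.

```python
def deduplicate(stores, first_key="name", second_key="address"):
    """Keep the first record for each (first, second) pair and drop the rest.

    `first_key` and `second_key` are hypothetical field names standing in for
    the "first information" and "second information" of the claim.
    """
    seen = set()
    kept = []
    for store in stores:
        key = (store.get(first_key), store.get(second_key))
        if key in seen:
            continue  # same screening pair already retained once
        seen.add(key)
        kept.append(store)
    return kept


records = [
    {"id": 1, "name": "A Mart", "address": "1 Main Rd"},
    {"id": 2, "name": "A Mart", "address": "1 Main Rd"},
    {"id": 3, "name": "A Mart", "address": "9 Side St"},
]
unique = deduplicate(records)
```

Here `unique` retains records 1 and 3. Keeping the first occurrence is an arbitrary choice; the claim only requires that one record be retained.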
3. The store data cleaning and matching method according to claim 1, wherein the verifying of the accuracy of the store data comprises:
querying the store address to obtain first longitude and latitude data;
taking the longitude and latitude data in the store data as second longitude and latitude data;
acquiring a longitude and latitude distance between the first longitude and latitude data and the second longitude and latitude data, wherein the longitude and latitude distance is calculated as follows:
c=sin(latA*pi/180)*sin(latB*pi/180)+cos(latA*pi/180)*cos(latB*pi/180)*cos((mlonA-mlonB)*pi/180)
dis=r*arccos(c)*pi/180
and judging whether the longitude and latitude distance is greater than a preset longitude and latitude distance threshold, and if so, determining the second longitude and latitude data to be inaccurate and overwriting the second longitude and latitude data with the first longitude and latitude data.
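The distance check of claim 3 can be sketched as follows. The expression for `c` is the spherical law of cosines, as in the claim. Note that the sketch returns `r * acos(c)` and does not reproduce the trailing `pi/180` factor that appears in the claim text, because `math.acos` already returns radians. The Earth radius, the 0.5 km threshold, and the field names `lat` and `lon` are assumptions, not values from the source.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; the claim only names it r


def great_circle_km(lat_a, lon_a, lat_b, lon_b, r=EARTH_RADIUS_KM):
    """Spherical law of cosines distance between two points given in degrees."""
    c = (math.sin(math.radians(lat_a)) * math.sin(math.radians(lat_b))
         + math.cos(math.radians(lat_a)) * math.cos(math.radians(lat_b))
         * math.cos(math.radians(lon_a - lon_b)))
    c = max(-1.0, min(1.0, c))  # guard against rounding just outside [-1, 1]
    return r * math.acos(c)


def verify_coordinates(store, geocoded, threshold_km=0.5):
    """Return a copy of `store`; overwrite its coordinates if they are too far off.

    `geocoded` is the (lat, lon) pair obtained by querying the store address
    (the "first" data); the stored pair is the "second" data.
    """
    checked = dict(store)
    dist = great_circle_km(geocoded[0], geocoded[1], store["lat"], store["lon"])
    if dist > threshold_km:
        checked["lat"], checked["lon"] = geocoded
    return checked
```

A record whose stored coordinates lie within the threshold of the geocoded address is returned unchanged; otherwise the geocoded pair replaces the stored one.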
4. The store data cleaning and matching method according to claim 1, wherein the completing of the missing store data comprises:
if the store address does not exist in the store data, acquiring the store address according to the store longitude and latitude;
if the store longitude and latitude do not exist in the store data, acquiring the store longitude and latitude according to the store address;
and if the store address and the store longitude and latitude do not exist in the store data, acquiring the store address and the store longitude and latitude according to the store name.
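The three completion branches of claim 4 can be sketched as below. The callables `geocode`, `reverse_geocode`, and `lookup_by_name` are hypothetical placeholders for whatever map or database service is actually used; the source does not name one. The field names are likewise assumptions.

```python
def complete_store(store, geocode, reverse_geocode, lookup_by_name):
    """Fill in a missing address and/or coordinates on a copy of `store`.

    geocode(address) -> (lat, lon)
    reverse_geocode(lat, lon) -> address
    lookup_by_name(name) -> (address, lat, lon) or None
    All three are hypothetical service hooks supplied by the caller.
    """
    result = dict(store)
    has_addr = bool(result.get("address"))
    has_coord = result.get("lat") is not None and result.get("lon") is not None

    if not has_addr and has_coord:
        # address missing: derive it from the coordinates
        result["address"] = reverse_geocode(result["lat"], result["lon"])
    elif has_addr and not has_coord:
        # coordinates missing: derive them from the address
        result["lat"], result["lon"] = geocode(result["address"])
    elif not has_addr and not has_coord:
        # both missing: fall back to a lookup by store name
        found = lookup_by_name(result.get("name"))
        if found is not None:
            result["address"], result["lat"], result["lon"] = found
    return result


# Stub services, for illustration only.
def fake_geocode(address):
    return (23.1, 113.3)

def fake_reverse(lat, lon):
    return "1 Main Rd"

def fake_lookup(name):
    return ("1 Main Rd", 23.1, 113.3) if name == "A Mart" else None

filled = complete_store({"name": "A Mart"}, fake_geocode, fake_reverse, fake_lookup)
```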
5. The store data cleaning and matching method according to claim 1, wherein the eliminating of the invalid store data comprises:
deleting store data in which none of the store name, the store address and the store longitude and latitude exists;
and deleting store data that has neither the store address nor the store longitude and latitude and cannot be found in a preset store database.
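The two deletion rules of claim 5 can be expressed as a single predicate. In this sketch the preset store database is modelled as a set of known store names, which is an assumption; the source does not describe how the database is queried.

```python
def is_invalid(store, known_names):
    """Return True when a record should be eliminated under either rule.

    `known_names` is a hypothetical stand-in for the preset store database.
    """
    has_name = bool(store.get("name"))
    has_addr = bool(store.get("address"))
    has_coord = store.get("lat") is not None and store.get("lon") is not None

    # Rule 1: name, address and coordinates are all missing.
    if not (has_name or has_addr or has_coord):
        return True
    # Rule 2: no address, no coordinates, and no hit in the store database.
    if not has_addr and not has_coord and store.get("name") not in known_names:
        return True
    return False


def eliminate_invalid(stores, known_names):
    """Keep only the records that pass both rules."""
    return [s for s in stores if not is_invalid(s, known_names)]
```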
6. The store data cleaning and matching method according to claim 1, wherein the matching of the cleaned store data through a preset store matching algorithm to obtain a matching result comprises:
acquiring the data quantity of reference group stores as n, wherein the address character lengths of the reference group stores are respectively $(p_{11}, p_{12}, \ldots, p_{1n})$;
acquiring the data quantity of comparison group stores as m, wherein the address character lengths of the comparison group stores are respectively $(p_{21}, p_{22}, \ldots, p_{2m})$;
judging whether the store names in the reference group store data are consistent with the store names in the comparison group store data;
if they are consistent, matching the store addresses in the reference group store data against the store addresses in the comparison group store data according to a preset matching rule, wherein the preset matching rule comprises: denoting the reference group store address field as s and the comparison group store address field as t; traversing the characters in t and comparing each with the characters in s; if a character in t is consistent with a character in s, modifying that character in s to 1, indicating that the character in s is matched; and obtaining a matching result for s, namely a matching character string whose length is consistent with the address character length of s;
and adding the character numerical values in the matching character string to obtain a successful matching value and an address matching rate, wherein the address matching rate is the successful matching value divided by $p_{1n}$.
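One possible reading of the matching rule in claim 6 is sketched below. It assumes that each character of t can mark at most one not-yet-matched character of s, and that unmatched positions count as 0 when the values are summed; the claim text does not settle either point. The divisor is taken to be the character length of s.

```python
def address_match_rate(s, t):
    """Return (matching string, address matching rate) for reference s and comparison t."""
    marks = [0] * len(s)
    for ch in t:
        for i, ref_ch in enumerate(s):
            if marks[i] == 0 and ref_ch == ch:
                marks[i] = 1  # this character of s is now matched
                break         # assumption: one character of t marks one of s
    match_string = "".join(str(m) for m in marks)
    matched = sum(marks)      # the successful matching value
    rate = matched / len(s) if s else 0.0
    return match_string, rate


match_string, rate = address_match_rate("abcd", "abxx")
```

For the reference `"abcd"` and the comparison `"abxx"`, the first two characters of s are marked, giving the matching string `"1100"` and a rate of 0.5.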
7. The store data cleaning and matching method according to claim 6, wherein the method further comprises:
if the store names and the store addresses in the two store data are consistent, the two stores are consistent stores;
if the store names and the store addresses in the two store data are consistent and the longitude and latitude distance between the stores is smaller than a preset value, and the first condition and the second condition are not met, the two stores are stores with inconsistent main roads;
if the store names, the client names and the store addresses in the two store data are consistent and the longitude and latitude distance between the stores is smaller than a preset value, and the first condition and the second condition are not met, the two stores are consistent main road stores;
wherein the first condition is:
Figure FDA0002334546020000031
and ρ > thres1;
the second condition is:
and ρ > thres2, wherein thres1 and thres2 are a first address matching rate threshold and a second address matching rate threshold, respectively.
8. A store data cleaning and matching device, comprising:
a cleaning module, used for cleaning store data, wherein the cleaning process comprises: processing repeated store data, eliminating invalid store data, verifying the accuracy of the store data, and completing missing store data, wherein the store data comprises a store number, a store name, a store longitude and latitude, and a store address;
a matching module, used for matching the cleaned store data through a preset store matching algorithm to obtain a matching result;
and a verification module, used for verifying the matching result through a preset store verification algorithm, wherein the preset store verification algorithm comprises a DBSCAN clustering algorithm.
9. A store data cleaning and matching apparatus, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
CN201911361361.8A 2019-12-24 2019-12-24 Store data cleaning and matching method, device, equipment and storage medium Active CN111125289B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911361361.8A CN111125289B (en) 2019-12-24 2019-12-24 Store data cleaning and matching method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111125289A (en) 2020-05-08
CN111125289B (en) 2023-05-12

Family

ID=70502599

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911361361.8A Active CN111125289B (en) 2019-12-24 2019-12-24 Store data cleaning and matching method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111125289B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019056503A1 (en) * 2017-09-25 2019-03-28 平安科技(深圳)有限公司 Store monitoring evaluation method, device and storage medium
WO2019141072A1 (en) * 2018-01-22 2019-07-25 阿里巴巴集团控股有限公司 Method, device, and client for recommending store information
CN110188762A (en) * 2019-04-23 2019-08-30 山东大学 Chinese and English mixing merchant store fronts title recognition methods, system, equipment and medium
CN110223050A (en) * 2019-06-24 2019-09-10 广东工业大学 A kind of verification method and relevant apparatus of merchant store fronts title

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
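阳旺; 何国超; 吴雁: "基于密度聚类构建物流配送问题的毁灭移除算法" (English gloss: "Density-clustering-based construction of a destroy-and-remove algorithm for logistics distribution problems") *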

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111291099A (en) * 2020-05-13 2020-06-16 中邮消费金融有限公司 Address fuzzy matching method and system and computer equipment
CN111291099B (en) * 2020-05-13 2020-08-14 中邮消费金融有限公司 Address fuzzy matching method and system and computer equipment

Also Published As

Publication number Publication date
CN111125289B (en) 2023-05-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant