TWI847497B - Store deduplication processing method, device, equipment and storage medium

Info

Publication number: TWI847497B
Application number: TW112101633A
Authority: TW
Inventors: 余瑋琦; 佘蕭寒; 曾澤華; 姜華; 高鵬飛; 萬四爽; 劉藍
Original assignee: 大陸商中國銀聯股份有限公司
Priority date: 2022-08-10
Filing date: 2023-01-13
Publication date: 2024-07-01
Also published as: CN115392955B; TW202407602A; WO2024031943A1; CN115392955A

Abstract

The present invention discloses a store deduplication processing method, device, equipment and storage medium, belonging to the field of data processing. The method includes: obtaining a first store name and first store location information of a target store; determining, according to the first store location information, a target grid area in which the target store is located; obtaining, from a pre-stored existing store database, a second store name and second store location information of existing stores located in the target grid area and in neighboring grid areas; obtaining, based on the first store name, the first store location information, the second store name and the second store location information, a target similarity between the target store and the existing stores located in the target grid area and the neighboring grid areas; and removing the target store as a duplicate store when the target similarity is greater than or equal to a preset deduplication similarity threshold. Embodiments of the present invention can improve the efficiency of store deduplication processing.

Description

Store deduplication processing method, device, equipment and storage medium

The present invention belongs to the field of data processing, and in particular relates to a store deduplication processing method, device, equipment and storage medium.

With the spread of electronic payment technology, users can pay electronically in merchants' offline stores. To facilitate the processing of electronic payments in these stores, the information of merchants' offline stores needs to be managed. However, when store data is submitted by different sources, several sources may submit data for the same store, and the data they submit for that store may differ. As a result, the same store may be misjudged as two different stores on the basis of the store data, that is, the same store is counted repeatedly.

To prevent the same store from being counted repeatedly, personnel have to be dispatched to the store for on-site inspection to determine manually whether the same store has been counted repeatedly. Manual inspection, however, is very time-consuming and labor-intensive, so the efficiency of store deduplication processing is very low.

Embodiments of the present invention provide a store deduplication processing method, device, equipment and storage medium, which can improve the efficiency of store deduplication processing.

In a first aspect, an embodiment of the present invention provides a store deduplication processing method, including: obtaining a first store name and first store location information of a target store; determining, according to the first store location information, a target grid area in which the target store is located; obtaining, from a pre-stored existing store database, a second store name and second store location information of existing stores located in the target grid area and in neighboring grid areas, the neighboring grid areas being adjacent to the target grid area; obtaining, based on the first store name, the first store location information, the second store name and the second store location information, a target similarity between the target store and the existing stores located in the target grid area and the neighboring grid areas; and removing the target store as a duplicate store when the target similarity is greater than or equal to a preset deduplication similarity threshold.

In a second aspect, an embodiment of the present invention provides a store deduplication processing device, including: a first acquisition module, configured to obtain a first store name and first store location information of a target store; a grid area determination module, configured to determine, according to the first store location information, a target grid area in which the target store is located; a second acquisition module, configured to obtain, from a pre-stored existing store database, a second store name and second store location information of existing stores located in the target grid area and in neighboring grid areas, the neighboring grid areas being adjacent to the target grid area; a calculation module, configured to obtain, based on the first store name, the first store location information, the second store name and the second store location information, a target similarity between the target store and the existing stores located in the target grid area and the neighboring grid areas; and a deduplication module, configured to remove the target store as a duplicate store when the target similarity is greater than or equal to a preset deduplication similarity threshold.

In a third aspect, an embodiment of the present invention provides store deduplication processing equipment, the equipment including: a processor and a memory storing computer program instructions; when the processor executes the computer program instructions, the store deduplication processing method of the first aspect is implemented.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium on which computer program instructions are stored; when the computer program instructions are executed by a processor, the store deduplication processing method of the first aspect is implemented.

Embodiments of the present invention provide a store deduplication processing method, device, equipment and storage medium, which can determine the grid area in which a target store is located according to the store location information of the target store. A grid area is an area into which a map is divided. Based on the existing stores in the database that are located in the target grid area containing the target store, the existing stores located in the grid areas surrounding the target grid area, and the store name and store location information of the target store, the similarity between the target store and the existing stores is obtained. This similarity is used to judge whether the newly acquired store is the same store as an existing store; if it is, the newly acquired store is regarded as a duplicate store and removed. The deduplication process requires no manual participation, and using the store location narrows the range of existing stores to be compared, which improves the efficiency of store deduplication processing.

21: Existing store

300, 400: Store deduplication processing device

301: First acquisition module

302: Grid area determination module

303: Second acquisition module

304: Calculation module

305: Deduplication module

401: Memory

402: Processor

403: Communication interface

404: Bus

A1, A2, A3, A4, A5, A6, A7, A8, A9: Grid areas

S101, S102, S103, S104, S1041, S1042, S1043, S105, S106, S107, S108, S109, S110, S111, S112, S113, S114, S115: Steps

To explain the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly introduced below. A person of ordinary skill in the art can obtain other drawings from these drawings without creative effort.

FIG. 1 is a flowchart of a store deduplication processing method provided by an embodiment of the present invention; FIG. 2 is a schematic diagram of an example of grid areas in an embodiment of the present invention; FIG. 3 is a flowchart of a store deduplication processing method provided by another embodiment of the present invention; FIG. 4 is a schematic diagram of an example of a coding table in an embodiment of the present invention; FIG. 5 is a flowchart of a store deduplication processing method provided by yet another embodiment of the present invention; FIG. 6 is a schematic structural diagram of a store deduplication processing device provided by an embodiment of the present invention; FIG. 7 is a schematic structural diagram of store deduplication processing equipment provided by an embodiment of the present invention.

The features and exemplary embodiments of various aspects of the present invention are described in detail below. To make the purpose, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings and specific embodiments. It should be understood that the specific embodiments described here are intended only to explain the present invention, not to limit it. To those skilled in the art, the present invention can be practiced without some of these specific details. The following description of the embodiments is intended only to provide a better understanding of the present invention by showing examples of it.

With the spread of electronic payment technology, users can pay electronically in merchants' offline stores. To facilitate the processing of electronic payments in these stores, the information of merchants' offline stores needs to be managed. However, when store data is submitted by different sources, several sources may submit data for the same store, and the data they submit for that store may differ. As a result, the same store may be misjudged as two different stores on the basis of the store data, that is, the same store is counted repeatedly. In some cases, the repeated counting of the same store may also be exploited, creating loopholes in the database that stores the store information.

The present invention provides a store deduplication processing method, device, equipment and storage medium, which can determine the grid area in which a newly acquired store is located according to the store location information of that store. A grid area is an area into which a map is divided. Using the data in the database of the existing stores located in the target grid area containing the newly acquired store and in the grid areas surrounding the target grid area, together with the data of the newly acquired store, the similarity between the newly acquired store and the existing stores is obtained. This similarity is used to judge whether the newly acquired store is the same store as an existing store; if it is, the newly acquired store is regarded as a duplicate store and removed. The deduplication process requires no manual participation, and using the store location narrows the range of existing stores to be compared, which improves the efficiency of store deduplication processing.

The store deduplication processing method, device, equipment and storage medium provided by the present invention are described in turn below.

A first aspect of the present invention provides a store deduplication processing method, which can be applied to scenarios in which stores are deduplicated on the basis of store information collected from different sources, and which can be executed by a store deduplication device, equipment, etc., without limitation here. FIG. 1 is a flowchart of a store deduplication processing method provided by an embodiment of the present invention. As shown in FIG. 1, the store deduplication processing method may include steps S101 to S105.

In step S101, a first store name and first store location information of a target store are obtained.

The target store is the store that is to be judged as a duplicate or not. It may be the store corresponding to newly acquired store information, such as a new store to be added to the existing store database. The first store name may be the store name of the target store. The first store location information may be the store location information of the target store. Store location information characterizes the location of a store and may include the store address, the store latitude and longitude, etc., without limitation here.

In step S102, the target grid area in which the target store is located is determined according to the first store location information.

For ease of processing, the map may be divided into multiple grid areas in advance. Different grid areas may be of the same size or of different sizes, without limitation here. A grid area may have a regular shape such as a rectangle, or an irregular shape, without limitation here. For example, a grid area may be a rectangular area 150 meters long and 150 meters wide.

The target grid area is the grid area in which the target store is located. The first store location information characterizes the location of the target store, so the grid area in which the target store is located, i.e. the target grid area, can be determined from the first store location information.
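Step S102 can be illustrated with a minimal Python sketch. It assumes that the store location is given as latitude and longitude and that grid areas are squares of about 150 meters, as in the example above. The function name `grid_cell`, the reference latitude `ref_lat` and the constant-degree approximation are illustrative assumptions and are not prescribed by the method.

```python
import math

METERS_PER_DEGREE = 111_320.0  # approximate length of one degree of latitude


def grid_cell(lat, lon, cell_m=150.0, ref_lat=31.0):
    """Return the (row, col) index of the grid area containing (lat, lon).

    Assumption: the map is cut into cells of fixed size in degrees, chosen so
    that a cell is roughly cell_m x cell_m meters near the latitude ref_lat.
    """
    lat_step = cell_m / METERS_PER_DEGREE
    lon_step = cell_m / (METERS_PER_DEGREE * math.cos(math.radians(ref_lat)))
    return (math.floor(lat / lat_step), math.floor(lon / lon_step))


# Two points a few meters apart fall into the same grid area.
print(grid_cell(31.23040, 121.47370) == grid_cell(31.23041, 121.47371))
```

Because the cell index is computed directly from the coordinates, any two locations inside the same grid area yield the same index.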

In step S103, the second store name and second store location information of the existing stores located in the target grid area and in the neighboring grid areas are obtained from the pre-stored existing store database.

The existing store database includes the relevant data of existing stores. An existing store is a store that has already been determined not to be a duplicate. The relevant data of an existing store may include, but is not limited to, its store name, its store location information and the grid area in which it is located.

To narrow the range of existing stores to be compared with the target store, a geographical area that may contain an existing store identical to the target store can first be delimited by location. This geographical area is the area surrounding the location of the target store. The target grid area and the neighboring grid areas can be taken as this surrounding area. A neighboring grid area is a grid area adjacent to the target grid area.

For example, FIG. 2 is a schematic diagram of an example of grid areas in an embodiment of the present invention. FIG. 2 shows nine grid areas as dashed squares, namely grid areas A1 to A9, and also shows a number of existing stores 21. If grid area A5 is the target grid area, then grid areas A1, A2, A3, A4, A6, A7, A8 and A9 are all neighboring grid areas of the target grid area. Taking deduplication of a target store located in grid area A5 as an example, in addition to the existing stores 21 in grid area A5 itself (see step S103), the store name and store location information of each existing store 21 in grid areas A1, A2, A3, A4, A6, A7, A8 and A9 can be obtained.

The number of existing stores in the existing store database is very large. Comparing the target store with every existing store in the database one by one would make store deduplication processing take a long time. Since the target grid area and the neighboring grid areas form the surroundings of the target store, an existing store located there is more likely to be the same store as the target store. The relevant data of the existing stores located in the target grid area and in the neighboring grid areas can therefore be filtered out of the existing store database first, and only this data is compared with the relevant data of the target store. This shortens the time required for store deduplication processing and improves its efficiency.

The existing stores located in the target grid area and the neighboring grid areas include the existing stores located in the target grid area and the existing stores located in the neighboring grid areas. The second store name includes the store names of the existing stores located in the target grid area and the store names of the existing stores located in the neighboring grid areas. The second store location information includes the store location information of the existing stores located in the target grid area and the store location information of the existing stores located in the neighboring grid areas.
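The candidate selection of step S103 can be sketched as follows. The sketch assumes rectangular grid areas identified by integer `(row, col)` indices and an in-memory dictionary standing in for the existing store database; `neighbor_cells` and `candidate_stores` are hypothetical names used only for illustration.

```python
def neighbor_cells(cell):
    """Return the 8 grid areas adjacent to `cell` (rectangular grid assumed)."""
    row, col = cell
    return [
        (row + dr, col + dc)
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    ]


def candidate_stores(index, target_cell):
    """Collect existing stores in the target grid area and its neighbors.

    `index` maps a grid cell to the list of existing stores located in it.
    """
    cells = [target_cell] + neighbor_cells(target_cell)
    return [store for cell in cells for store in index.get(cell, [])]


# Toy database: only stores in or next to cell (5, 5) are compared.
index = {(5, 5): ["store A"], (4, 6): ["store B"], (9, 9): ["store C"]}
print(candidate_stores(index, (5, 5)))  # ['store A', 'store B']
```

Only the stores in at most nine grid areas are returned, so the later similarity computation never touches the rest of the database.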

In step S104, the target similarity between the target store and the existing stores located in the target grid area and the neighboring grid areas is obtained based on the first store name, the first store location information, the second store name and the second store location information.

Based on the first store name and the second store name, the similarity between the target store and an existing store in terms of store name can be obtained. Based on the first store location information and the second store location information, the similarity between the target store and an existing store in terms of geographical location can be obtained. The target similarity, i.e. the similarity between the target store and an existing store, can be obtained from the name similarity and the geographical similarity. The similarity between the target store and each existing store located in the target grid area and the neighboring grid areas can be calculated, and the target similarity is then used to determine whether the target store is a duplicate of one of those existing stores.
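The combination of a name similarity and a geographical similarity can be sketched in Python as below. The passage above does not fix the formulas, so every concrete choice here is an assumption made for illustration: `difflib.SequenceMatcher` for the name similarity, the haversine distance scaled linearly over 150 meters for the geographical similarity, and an equal-weight average for the target similarity.

```python
import math
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Similarity of two store names in [0, 1] (assumed metric)."""
    return SequenceMatcher(None, a, b).ratio()


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def location_similarity(distance_m, scale_m=150.0):
    """1.0 at zero distance, falling linearly to 0.0 at scale_m (assumed)."""
    return max(0.0, 1.0 - distance_m / scale_m)


def target_similarity(target, existing, name_weight=0.5):
    """Weighted average of name and location similarity (assumed combination)."""
    s_name = name_similarity(target["name"], existing["name"])
    d = haversine_m(target["lat"], target["lon"], existing["lat"], existing["lon"])
    return name_weight * s_name + (1.0 - name_weight) * location_similarity(d)


a = {"name": "Star Coffee (Main St)", "lat": 31.2304, "lon": 121.4737}
b = {"name": "Star Coffee Main Street", "lat": 31.2305, "lon": 121.4738}
print(round(target_similarity(a, b), 3))
```

The weight and the distance scale would in practice be tuned together with the deduplication similarity threshold.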

In step S105, when the target similarity is greater than or equal to the preset deduplication similarity threshold, the target store is removed as a duplicate store.

The deduplication similarity threshold is the similarity threshold at which the target store and an existing store are confirmed to be the same store. It can be set according to the scenario, requirements, experience, etc., without limitation here; for example, the deduplication similarity threshold may be 0.6. A target similarity greater than or equal to the deduplication similarity threshold indicates that the target store and the existing store are the same store, i.e. the target store is a duplicate store and can be removed. Removing the target store may mean discarding the relevant data of the target store. A target similarity below the deduplication similarity threshold indicates that the target store and the existing store are different stores, i.e. the target store is not a duplicate store, and its relevant data can be stored in the existing store database. In other words, the target store can be treated as an existing store newly added to the existing store database.
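The decision of step S105 can be sketched as follows, using the example threshold of 0.6. The function name `deduplicate`, the list standing in for the existing store database and the `similarity` callable passed in by the caller are illustrative assumptions.

```python
def deduplicate(target, candidates, database, similarity, threshold=0.6):
    """Return True if `target` is dropped as a duplicate, False if it is stored.

    `candidates` are the existing stores in the target and neighboring grid
    areas; `similarity(target, existing)` returns the target similarity.
    """
    for existing in candidates:
        if similarity(target, existing) >= threshold:
            return True  # duplicate store: discard the target's data
    database.append(target)  # not a duplicate: it becomes an existing store
    return False


# Toy similarity: 1.0 for identical names, 0.0 otherwise.
same_name = lambda a, b: 1.0 if a == b else 0.0
db = ["Star Coffee"]
print(deduplicate("Star Coffee", db, db, same_name))       # True
print(deduplicate("Moon Bakery", list(db), db, same_name))  # False
print(db)  # ['Star Coffee', 'Moon Bakery']
```

Note that the comparison is "greater than or equal to", so a similarity of exactly 0.6 already marks the target store as a duplicate.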

In the embodiments of the present invention, the grid area in which the target store is located can be determined according to the store location information of the target store. A grid area is an area into which a map is divided. Based on the existing stores in the database that are located in the target grid area containing the target store, the existing stores located in the grid areas surrounding the target grid area, and the store name and store location information of the target store, the similarity between the target store and the existing stores is obtained. This similarity is used to judge whether the newly acquired store is the same store as an existing store; if it is, the newly acquired store is regarded as a duplicate store and removed. The deduplication process requires no manual participation, and using the store location narrows the range of existing stores to be compared, which improves the efficiency of store deduplication processing.

Moreover, the target store is compared not only with the existing stores in the target grid area but also with the existing stores in the neighboring grid areas. This avoids missing an existing store that lies near the boundary of the target grid area and is the same store as the target store, and further improves the comprehensiveness and accuracy of store deduplication processing.

In some embodiments, each grid area has a grid code, and the neighboring grid areas of the target grid area can be determined based on the grid code of the target grid area and a grid coding algorithm. FIG. 3 is a flowchart of a store deduplication processing method provided by another embodiment of the present invention. FIG. 3 differs from FIG. 1 in that the store deduplication processing method shown in FIG. 3 may further include steps S106 to S108, and may further include either steps S109 to S112 or steps S113 to S115.

In step S106, the map is divided into multiple grid areas, and a grid code is assigned to each grid area using the grid coding algorithm.

A geographic map can be obtained and divided into multiple grid areas. Each grid area is assigned one grid code. The grid code characterizes the grid area, that is, different grid areas have different grid codes. The grid code can be obtained with a grid coding algorithm; the type of grid coding algorithm is not limited here. The grid codes calculated from the location information of different locations within the same grid area are identical.

In some examples, the grid code may be a string of m characters. The first m1 characters of the grid code may represent a province, city, district, etc. The grid codes of multiple nearby grid areas share the same first m1 characters and differ in the last m-m1 characters. The last m-m1 characters of the grid codes of different grid areas can be chosen according to a preset coding table. The coding table contains multiple coding characters arranged in a certain order, and the coding character for a grid area is selected according to the correspondence between the arrangement order of the coding characters and the grid areas. Each of the last m-m1 character positions of the grid code may correspond to one coding table, and the coding tables of different positions may be the same or different. From the grid codes of multiple grid areas, it can be determined whether those grid areas are near one another; further, the positional relationship between the grid areas can also be determined from their grid codes.

For example, FIG. 4 is a schematic diagram of an example of a coding table in an embodiment of the present invention. The grid areas are as shown in FIG. 2, and the grid code is a 7-character string. Suppose the first 6 characters of the grid codes of the nearby grid areas are identical, all being wk2vu1, and the last character is encoded according to the coding table shown in FIG. 4. If the grid code of grid area A1 is wk2vu1E, then the grid code of grid area A2 is wk2vu1R, that of grid area A3 is wk2vu1T, that of grid area A4 is wk2vu1D, that of grid area A5 is wk2vu1F, that of grid area A6 is wk2vu1G, that of grid area A7 is wk2vu1C, that of grid area A8 is wk2vu1V, and that of grid area A9 is wk2vu1B.
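The example codes can be reproduced with a short Python sketch. FIG. 4 is not reproduced here, so the 3 x 3 layout of `CODING_TABLE` below is inferred from the codes listed for A1 to A9 and assumes that the areas of FIG. 2 are numbered row by row; the function names are hypothetical.

```python
PREFIX = "wk2vu1"  # shared first 6 characters from the example

# Assumed layout of the coding table, inferred from the codes of A1..A9.
CODING_TABLE = [
    ["E", "R", "T"],
    ["D", "F", "G"],
    ["C", "V", "B"],
]


def area_codes(prefix=PREFIX, table=CODING_TABLE):
    """Map area labels A1..A9 (assumed row-major) to 7-character grid codes."""
    flat = [ch for row in table for ch in row]
    return {f"A{i}": prefix + ch for i, ch in enumerate(flat, start=1)}


def neighbors_in_table(code, table=CODING_TABLE):
    """Codes adjacent to `code`, limited to cells covered by this one table."""
    prefix, last = code[:-1], code[-1]
    pos = {ch: (r, c) for r, row in enumerate(table) for c, ch in enumerate(row)}
    r0, c0 = pos[last]
    return [
        prefix + table[r][c]
        for r in range(r0 - 1, r0 + 2)
        for c in range(c0 - 1, c0 + 2)
        if 0 <= r < len(table) and 0 <= c < len(table[0]) and (r, c) != (r0, c0)
    ]


print(area_codes()["A5"])                  # wk2vu1F
print(neighbors_in_table("wk2vu1E"))       # ['wk2vu1R', 'wk2vu1D', 'wk2vu1F']
```

The sketch only resolves neighbors that share the same prefix; a neighbor across the edge of the table would need the coding tables of the earlier character positions, which this example does not model.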

In step S107, the store location information of the existing stores is obtained, and the grid area in which each existing store is located is determined according to its store location information.

In step S108, a first correspondence between the existing stores and the grid codes of the grid areas in which they are located is established, and the first correspondence is stored in the existing store database.

The first correspondence is the correspondence between each existing store and the grid code of the grid area in which it is located. To further shorten the time required for store deduplication processing, the data of the existing stores can be processed in advance: the grid code of the grid area containing each existing store is associated with that store, and the association is stored in the existing store database. During store deduplication processing, the existing stores corresponding to the grid code of the target grid area and to the grid codes of the neighboring grid areas can then be looked up directly in the existing store database. The existing stores corresponding to the grid code of the target grid area are the existing stores located in the target grid area, and the existing stores corresponding to the grid code of a neighboring grid area are the existing stores located in that neighboring grid area.
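Steps S107 and S108 amount to building a lookup from grid code to existing stores ahead of time. A minimal sketch follows, in which a dictionary stands in for the existing store database and `toy_encode` is a hypothetical stand-in for the grid coding algorithm, whose type the description leaves open.

```python
import math
from collections import defaultdict


def toy_encode(lat, lon, step=0.01):
    """Hypothetical grid coding algorithm: one code per step x step degree cell."""
    return f"{math.floor(lat / step)}_{math.floor(lon / step)}"


def build_first_correspondence(stores, encode=toy_encode):
    """Return a mapping from grid code to the existing stores in that grid area."""
    index = defaultdict(list)
    for store in stores:
        index[encode(store["lat"], store["lon"])].append(store)
    return index


existing = [
    {"name": "Star Coffee", "lat": 31.234, "lon": 121.471},
    {"name": "Moon Bakery", "lat": 31.236, "lon": 121.479},
    {"name": "Sun Noodles", "lat": 31.255, "lon": 121.471},
]
index = build_first_correspondence(existing)
print([s["name"] for s in index[toy_encode(31.235, 121.475)]])
```

With this mapping precomputed, retrieving the existing stores of a grid area during deduplication is a single key lookup rather than a scan over all stores.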

在步驟S109中，獲取目標網格區域的網格編碼。 In step S109, the grid code of the target grid area is obtained.

確定目標網格區域後，可獲取目標網格區域的網格編碼。 After determining the target grid area, the grid code of the target grid area can be obtained.

在步驟S110中，根據目標網格區域的網格編碼和網格編碼逆演算法，獲取目標網格區域的頂點的位置。 In step S110, the position of the vertex of the target grid area is obtained according to the grid coding of the target grid area and the grid coding inverse algorithm.

網格編碼逆演算法為網格編碼演算法的逆演算法。根據網格區域中一個或多個位置的位置資訊，利用網格編碼演算法，可得到該網格區域的網格編碼。根據網格區域的網格編碼，利用網格編碼逆演算法，可得到該網格區域的頂點的位置資訊。 The grid coding inverse algorithm is the inverse algorithm of the grid coding algorithm. Based on the location information of one or more locations in the grid area, the grid coding of the grid area can be obtained by using the grid coding algorithm. Based on the grid coding of the grid area, the location information of the vertices of the grid area can be obtained by using the grid coding inverse algorithm.

在步驟S111中，根據目標網格區域的頂點的位置資訊，確定位於鄰居網格區域中輔助點的位置資訊。 In step S111, the location information of the auxiliary point in the neighboring grid area is determined based on the location information of the vertex of the target grid area.

鄰居網格區域與目標網格區域共用部分頂點，得到目標網格區域的頂點的位置資訊，相當於得到鄰居網格區域的部分頂點的位置資訊，根據鄰居網格區域的部分頂點的位置資訊，可得到鄰居網格區域中輔助點的位置資訊。輔助點可為鄰居網格區域中除與目標網格區域共用的頂點外的任意一點或多點，在此並不限定。可在每個鄰居網格區域中確定輔助點，以便於後續利用輔助點的位置資訊，確定鄰居網格區域。 The neighboring grid area shares some vertices with the target grid area, and obtaining the location information of the vertices of the target grid area is equivalent to obtaining the location information of some vertices of the neighboring grid area. According to the location information of some vertices of the neighboring grid area, the location information of the auxiliary points in the neighboring grid area can be obtained. The auxiliary points can be any one or more points in the neighboring grid area except the vertices shared with the target grid area, and are not limited here. The auxiliary points can be determined in each neighboring grid area, so that the location information of the auxiliary points can be used to determine the neighboring grid area later.

在步驟S112中，基於每個鄰居網格區域中輔助點的位置資訊和網格編碼演算法，計算得到每個鄰居網格區域的網格編碼，以確定鄰居網格區域。 In step S112, based on the location information of the auxiliary points in each neighboring grid area and the grid coding algorithm, the grid code of each neighboring grid area is calculated to determine the neighboring grid area.

網格編碼與網格區域具有對應關係，根據鄰居網格區域中輔助點的位置資訊，利用網格編碼演算法，計算得到的網格編碼為鄰居網格區域的網格編碼。利用網格編碼與網格區域的對應關係，可確定鄰居網格區域。 The grid code and the grid area have a corresponding relationship. According to the location information of the auxiliary point in the neighboring grid area, the grid code calculated by the grid coding algorithm is the grid code of the neighboring grid area. The neighboring grid area can be determined by using the corresponding relationship between the grid code and the grid area.
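Steps S109 to S112 can be sketched as follows. The text does not name the grid coding algorithm, so this sketch assumes the standard Geohash encoding as a stand-in; the embodiment's own codes (which contain uppercase characters) evidently use a different alphabet. The function names are hypothetical, the auxiliary point is taken to be the center of each neighboring cell, and cells at the poles or the antimeridian are not handled.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard Geohash alphabet (assumption)

def encode(lat, lon, length=7):
    """Grid coding algorithm (stand-in): a position -> grid code."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    chars, value, nbits, use_lon = [], 0, 0, True
    while len(chars) < length:
        rng, x = (lon_rng, lon) if use_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        bit = x >= mid
        rng[0 if bit else 1] = mid          # keep the half that contains x
        value = value * 2 + bit
        use_lon = not use_lon               # bits alternate longitude / latitude
        nbits += 1
        if nbits == 5:                      # 5 bits per output character
            chars.append(BASE32[value])
            value, nbits = 0, 0
    return "".join(chars)

def bbox(code):
    """Inverse algorithm: grid code -> (lat_lo, lat_hi, lon_lo, lon_hi) of the cell."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    use_lon = True
    for ch in code:
        idx = BASE32.index(ch)
        for shift in (4, 3, 2, 1, 0):
            rng = lon_rng if use_lon else lat_rng
            mid = (rng[0] + rng[1]) / 2
            rng[0 if (idx >> shift) & 1 else 1] = mid
            use_lon = not use_lon
    return lat_rng[0], lat_rng[1], lon_rng[0], lon_rng[1]

def neighbors(code):
    """Codes of the 8 surrounding cells, ordered NW, N, NE, W, E, SW, S, SE."""
    lat_lo, lat_hi, lon_lo, lon_hi = bbox(code)
    dlat, dlon = lat_hi - lat_lo, lon_hi - lon_lo
    clat, clon = (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2
    # Auxiliary point of each neighbor: the target center shifted by one cell size.
    return [encode(clat + i * dlat, clon + j * dlon, len(code))
            for i in (1, 0, -1) for j in (-1, 0, 1) if (i, j) != (0, 0)]
```

Any interior point of a neighboring cell would serve as the auxiliary point; the center is chosen here only because it stays clear of the cell boundaries.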

在步驟S113中，獲取目標網格區域的網格編碼。 In step S113, the grid code of the target grid area is obtained.

在步驟S114中，根據目標網格區域的網格編碼，獲取候選網格區域的網格編碼。 In step S114, the grid code of the candidate grid area is obtained according to the grid code of the target grid area.

在一些示例中，鄰近的網格區域的網格編碼的一部分數位的字元是相同的，可利用該特徵在大量的網格區域中篩選出目標網格區域鄰近的網格區域即候選網格區域。候選網格區域包括網格編碼中一部分數位的字元與目標網格區域的網格編碼中一部分數位的字元相同的網格區域。例如，鄰近的網格區域的網格編碼的前m1個數位的字元相同，可將網格編碼的前m1個數位的字元與目標網格區域的網格編碼的前m1個數位的字元相同的網格區域確定為候選網格區域。 In some examples, the characters of a portion of the digits in the grid codes of the adjacent grid regions are the same, and this feature can be used to screen out the grid regions adjacent to the target grid region from a large number of grid regions, namely the candidate grid regions. The candidate grid regions include grid regions where the characters of a portion of the digits in the grid codes are the same as the characters of a portion of the digits in the grid codes of the target grid regions. For example, the characters of the first m1 digits in the grid codes of the adjacent grid regions are the same, and the grid regions where the characters of the first m1 digits in the grid codes are the same as the characters of the first m1 digits in the grid codes of the target grid regions can be determined as candidate grid regions.

在步驟S115中，按照網格編碼演算法中的網格區域排布與編碼數位的字元的對應關係，在候選網格區域的網格編碼中確定鄰居網格區域的網格編碼，以確定鄰居網格區域。 In step S115, according to the correspondence between the grid area arrangement and the characters of the coded digits in the grid coding algorithm, the grid code of the neighboring grid area is determined in the grid code of the candidate grid area to determine the neighboring grid area.

The grid coding algorithm may include a correspondence between the grid area arrangement and the characters at the coded positions. For example, the grid areas are arranged as shown in FIG. 2, the grid code is a 7-character string, and the first 6 characters of the grid code of each candidate grid area are the same as the first 6 characters of the grid code of the target grid area. The target grid area is grid area A5, whose grid code is wk2vu1D, and the correspondence between the grid area arrangement and the last character of the grid code is implemented as the coding table shown in FIG. 4. The target grid area then has 8 neighboring grid areas, located at its upper left, top, upper right, left, right, lower left, bottom, and lower right. According to the coding table shown in FIG. 4, the characters at the upper left, top, upper right, left, right, lower left, bottom, and lower right of the character D are W, E, R, S, F, X, C, and V, respectively. Correspondingly, the 8 neighboring grid areas at the upper left, top, upper right, left, right, lower left, bottom, and lower right of the target grid area, namely grid areas A1, A2, A3, A4, A6, A7, A8, and A9, have the grid codes wk2vu1W, wk2vu1E, wk2vu1R, wk2vu1S, wk2vu1F, wk2vu1X, wk2vu1C, and wk2vu1V, respectively.

網格編碼表徵網格區域，確定鄰居網格區域的網格編碼，即可確定鄰居網格區域。 The grid code represents the grid area. By determining the grid code of the neighboring grid area, the neighboring grid area can be determined.

利用網格區域排布與編碼數位的字元的對應關係來確定鄰居網格區域的方式更為簡便，耗時更短，效率更高。 The method of using the correspondence between the grid area arrangement and the characters of the coded digits to determine the neighboring grid area is simpler, less time-consuming and more efficient.
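The table lookup in steps S113 to S115 can be sketched in Python as below. FIG. 4 is not reproduced in this text, so `TABLE` is a hypothetical layout chosen only because it is consistent with the neighbor characters quoted in the examples; the function name `neighbor_codes` is likewise an assumption. The sketch covers only neighbors that share the target's prefix, so a cell on the edge of the table returns fewer than 8 codes.

```python
# Hypothetical stand-in for the FIG. 4 coding table (not the actual figure).
TABLE = ["1234567890", "QWERTYUIOP", "ASDFGHJKL", "ZXCVBNM"]

def neighbor_codes(code):
    """Return neighbor grid codes ordered upper-left, top, upper-right,
    left, right, lower-left, bottom, lower-right."""
    prefix, last = code[:-1], code[-1].upper()
    for r, row in enumerate(TABLE):
        c = row.find(last)
        if c != -1:
            break
    else:
        raise ValueError(f"character {last!r} is not in the coding table")
    out = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the target cell itself
            rr, cc = r + dr, c + dc
            if 0 <= rr < len(TABLE) and 0 <= cc < len(TABLE[rr]):
                out.append(prefix + TABLE[rr][cc])
    return out
```

Because the lookup is a constant-time table access rather than a decode and re-encode of coordinates, it illustrates why this variant is described as simpler and faster.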

在一些實施例中，目標相似度可基於及門店名稱相關的相似度、及門店位置資訊相關的相似度綜合得到。圖5為本發明又一實施例提供的門店去重處理方法的流程圖。圖5與圖1的不同之處在於，圖1中的步驟S104可具體細化為圖5中的步驟S1041至步驟S1043。 In some embodiments, the target similarity can be obtained based on the similarity related to the store name and the similarity related to the store location information. FIG5 is a flow chart of a store deduplication processing method provided by another embodiment of the present invention. FIG5 is different from FIG1 in that step S104 in FIG1 can be specifically refined into steps S1041 to S1043 in FIG5.

In step S1041, based on the first store name and the second store name, N name-related similarities between the target store and the existing stores located in the target grid area and the neighboring grid areas are obtained.

N為大於等於1的整數。名稱相關相似度為及門店名稱相關的相似度，可基於第一門店名稱和第二門店名稱得到。名稱相關相似度可包括但不限於字元相似度、語義相似度、門店類型相似度中的任意一種或兩種以上。字元相似度為組成門店名稱的字元的相似度。語義相似度為門店名稱的語義的相似度。門店類型相似度為基於門店名稱得到的門店類型的相似度。 N is an integer greater than or equal to 1. Name-related similarity is the similarity related to the store name, which can be obtained based on the first store name and the second store name. Name-related similarity may include but is not limited to any one or more of character similarity, semantic similarity, and store type similarity. Character similarity is the similarity of the characters constituting the store name. Semantic similarity is the similarity of the semantics of the store name. Store type similarity is the similarity of the store type obtained based on the store name.

在一些示例中，名稱相關相似度包括字元相似度。可對第一門店名稱和第二門店名稱分別進行分詞，得到第一門店名稱對應的詞彙和第二門店名稱對應的詞彙；計算第一門店名稱對應的詞彙和第二門店名稱對應的詞彙的詞頻(Term Frequency,TF)和逆向文件頻率(Inverse Document Frequency,IDF)；選取詞頻低於等於冗餘詞頻閾值且逆向文件頻率大於冗餘頻率指數閾值的詞彙；基於選取的第一門店名稱對應的詞彙和選取的第二門店名稱對應的詞彙，得到目標門店與位於目標網格區域和鄰居網格區域的存量門店的字元相似度。 In some examples, the name-related similarity includes character similarity. The first store name and the second store name may be segmented respectively to obtain the vocabulary corresponding to the first store name and the vocabulary corresponding to the second store name; the term frequency (TF) and inverse document frequency (IDF) of the vocabulary corresponding to the first store name and the vocabulary corresponding to the second store name are calculated; the vocabulary whose term frequency is less than or equal to the redundant term frequency threshold and whose inverse document frequency is greater than the redundant frequency index threshold is selected; based on the selected vocabulary corresponding to the first store name and the selected vocabulary corresponding to the second store name, the character similarity of the target store and the stock stores located in the target grid area and the neighboring grid area is obtained.

A word segmentation tool can be used to segment the first store name into its words, and likewise to segment the second store name into its words. The term frequency represents how often a word occurs. The inverse document frequency represents a word's discriminating power. The redundant term frequency threshold is the term-frequency threshold used to decide whether a word is redundant, and the redundant frequency index threshold is the inverse-document-frequency threshold used for the same purpose. A word whose term frequency is greater than the redundant term frequency threshold is a redundant word, as is a word whose inverse document frequency is less than or equal to the redundant frequency index threshold. Redundant words do not help the character similarity calculation and may even harm it, so they are excluded from it. Words whose term frequency is less than or equal to the redundant term frequency threshold and whose inverse document frequency is greater than the redundant frequency index threshold are the valid words that participate in the character similarity calculation. The character similarity calculation can follow the Bilingual Evaluation Understudy (BLEU) algorithm used in machine translation: the character-level similarity of the first store name and the second store name is evaluated by the N-gram overlap between the selected words of the first store name and the selected words of the second store name.
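The redundant-word filter can be sketched in Python as follows. The text does not say over which corpus the term frequency and inverse document frequency are computed or how they are normalized, so this sketch assumes a corpus of tokenized store names, a corpus-wide relative term frequency, and IDF defined as log(N / document frequency). The function names and the thresholds `tf_max` and `idf_min` are hypothetical.

```python
import math
from collections import Counter

def term_stats(corpus):
    """Compute corpus-wide TF and IDF for a list of tokenized store names."""
    total = sum(len(doc) for doc in corpus)
    counts = Counter(t for doc in corpus for t in doc)
    doc_freq = Counter(t for doc in corpus for t in set(doc))
    tf = {t: n / total for t, n in counts.items()}
    idf = {t: math.log(len(corpus) / doc_freq[t]) for t in counts}
    return tf, idf

def select_terms(tokens, tf, idf, tf_max, idf_min):
    """Keep words with TF <= tf_max and IDF > idf_min; the rest are redundant."""
    return [t for t in tokens
            if tf.get(t, 0.0) <= tf_max and idf.get(t, float("inf")) > idf_min]
```

In this sketch a word that never appears in the corpus is kept, on the reasoning that an unseen word is maximally distinctive; a real system might treat that case differently.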

在一些示例中，名稱相關相似度包括語義相似度。將第一門店名稱和第二門店名稱分別轉化為第一名稱數位序列和第二名稱數位序列；將第一名稱數位序列和第二名稱數位序列輸入第一模型，得到第一模型輸出的目標門店與位於目標網格區域和鄰居網格區域的存量門店的語義相似度。 In some examples, the name-related similarity includes semantic similarity. The first store name and the second store name are converted into a first name digital sequence and a second name digital sequence respectively; the first name digital sequence and the second name digital sequence are input into the first model to obtain the semantic similarity between the target store output by the first model and the stock stores located in the target grid area and the neighboring grid area.

第一模型用於根據輸入的兩個門店名稱轉化為的數位序列輸出兩個門店名稱的語義相似度。可預先獲取一定數量的具有標注的門店名稱作為訓練集正樣本，隨機抽取數量相當的門店名稱作為訓練集負樣本，將訓練集正樣本和訓練集負樣本分別轉換為數位序列，利用數位序列訓練得到第一模型。第一模型可包括分類模型，可為深度學習分類模型或其他類型的分類模型，在此並不限定。例如，可利用雙向變形編碼器語言表示模型(Bidirectional Encoder Representations from Transformer,BERT)模型，將“[CLS]+某一門店名稱對應的數位序列+[SEP]+另一門店名稱對應的數位序列”作為輸入，訓練第一模型，使第一模型可擬合一門店名稱與另一門店名稱的語義相似度，即，使第一模型可根據輸入輸出一門店名稱與另一門店名稱的語義相似度。 The first model is used to output the semantic similarity of the two store names according to the digital sequences converted from the two store names. A certain number of annotated store names can be obtained in advance as positive samples of the training set, and a corresponding number of store names can be randomly selected as negative samples of the training set. The positive samples of the training set and the negative samples of the training set are converted into digital sequences respectively, and the first model is obtained by training with the digital sequences. The first model may include a classification model, which may be a deep learning classification model or other types of classification models, which are not limited here. For example, the Bidirectional Encoder Representations from Transformer (BERT) model can be used to take "[CLS] + a digital sequence corresponding to a store name + [SEP] + a digital sequence corresponding to another store name" as input to train the first model so that the first model can fit the semantic similarity between one store name and another store name, that is, the first model can output the semantic similarity between one store name and another store name based on the input.

第一名稱數位序列為第一門店名稱轉化為的數位序列。第二名稱數位序列為第二門店名稱轉化為的數位序列。具體可將門店名稱按字分割，將分割得到的字轉化為數字，將每個字對應的數位組合，得到數位序列。將第一名稱數位序列和位於目標網格區域和鄰居網格區域的一個存量門店對應的第二名稱數位序列輸入第一模型，第一模型可輸出目標門店的門店名稱與這一個存量門店的門店名稱的語義相似度。 The first name digital sequence is the digital sequence converted from the first store name. The second name digital sequence is the digital sequence converted from the second store name. Specifically, the store name can be segmented by characters, the segmented characters can be converted into numbers, and the numbers corresponding to each character can be combined to obtain a digital sequence. The first name digital sequence and the second name digital sequence corresponding to an existing store located in the target grid area and the neighboring grid area are input into the first model, and the first model can output the semantic similarity between the store name of the target store and the store name of this existing store.
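The character-by-character conversion and the paired model input can be sketched in Python as below. The incremental vocabulary, the ids reserved for `[CLS]` and `[SEP]`, and the function names are assumptions made for illustration; a real BERT tokenizer ships with its own fixed vocabulary and special-token ids.

```python
def to_ids(name, vocab):
    """Split a name into characters and map each to a number.

    A character seen before reuses its number; a new one gets the next free id.
    """
    return [vocab.setdefault(ch, len(vocab)) for ch in name]

def pair_input(ids_a, ids_b, vocab):
    """Build "[CLS] + sequence A + [SEP] + sequence B" as one list of numbers."""
    return [vocab["[CLS]"]] + ids_a + [vocab["[SEP]"]] + ids_b

# Usage: the special-token ids 0 and 1 are hypothetical.
vocab = {"[CLS]": 0, "[SEP]": 1}
first = to_ids("ABA", vocab)    # 'A' -> 2, 'B' -> 3, second 'A' reuses 2
second = to_ids("BC", vocab)    # 'B' reuses 3, 'C' -> 4
model_input = pair_input(first, second, vocab)
```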

在一些示例中，名稱相關相似度包括門店類型相似度。在門店去重處理過程中可能會出現門店為連鎖店且距離較近、不同門店名稱類似所產生的誤去重的可能，為了降低甚至避免誤去重的可能，可引入門店類型相似度來提高門店去重的準確性。可根據第一門店名稱，得到第一門店名稱資訊；將第一門店名稱資訊輸入第二模型，得到第二模型輸出的目標門店的門店類型概率向量；在存量門店資料庫中查找與第二門店名稱對應的門店類型概率向量；計算目標門店的門店類型概率向量與第二門店名稱對應的門店類型概率向量的相似度，將相似度確定為目標門店與位於目標網格區域和鄰居網格區域的存量門店的門店類型相似度。 In some examples, the name-related similarity includes store type similarity. In the store deduplication process, there may be a possibility of false deduplication due to the close proximity of chain stores and the similar names of different stores. In order to reduce or even avoid the possibility of false deduplication, store type similarity can be introduced to improve the accuracy of store deduplication. According to the first store name, the first store name information can be obtained; the first store name information is input into the second model to obtain the store type probability vector of the target store output by the second model; the store type probability vector corresponding to the second store name is searched in the stock store database; the similarity between the store type probability vector of the target store and the store type probability vector corresponding to the second store name is calculated, and the similarity is determined as the store type similarity between the target store and the stock stores located in the target grid area and the neighboring grid area.

第二模型用於根據輸入的門店名稱資訊輸出門店類型概率向量。門店類型概率向量用於表徵門店名稱指示的門店屬於各門店類型的概率。門店類型概率向量中的每個元素可表徵門店屬於一門店類型的概率，可將門店類型概率向量中表徵的概率最大元素對應的門店類型確定為該門店的門店類型。門店類型概率向量可為長度為M的歸一化向量，但並不限於此。可預先獲取一定數量的具有標注的門店名稱和門店類型作為訓練集，如<XXXX1(B1地區店)，超市>、<YYYY2(B2地區店)，咖啡廳>，其中，XXXX1(B1地區店)和YYYY2(B2地區店)為門店名稱，超市和咖啡廳為門店類型。利用訓練集訓練得到第二模型。第二模型可包括分類模型，可為深度學習分類模型或其他類型的分類模型，在此並不限定。例如，可利用BERT模型，將“[CLS]+某一門店名稱對應的數位序列”作為輸入，訓練第二模型，使第二模型可擬合該門店名稱及門店類型之間的對應關係，即，使第二模型可根據輸入輸出該門店名稱的門店類型概率向量。 The second model is used to output a store type probability vector based on the input store name information. The store type probability vector is used to represent the probability that the store indicated by the store name belongs to each store type. Each element in the store type probability vector can represent the probability that the store belongs to a store type, and the store type corresponding to the element with the maximum probability represented in the store type probability vector can be determined as the store type of the store. The store type probability vector can be a normalized vector with a length of M, but is not limited to this. A certain number of annotated store names and store types can be obtained in advance as a training set, such as <XXXX1 (B1 regional store), supermarket>, <YYYY2 (B2 regional store), cafe>, where XXXX1 (B1 regional store) and YYYY2 (B2 regional store) are store names, and supermarket and cafe are store types. The second model is trained using the training set. The second model may include a classification model, which may be a deep learning classification model or other types of classification models, which are not limited here. For example, the BERT model can be used to take "[CLS] + a digital sequence corresponding to a store name" as input to train the second model so that the second model can fit the correspondence between the store name and the store type, that is, the second model can output the store type probability vector of the store name based on the input.

第一門店名稱資訊基於第一門店名稱得到，可為第一門店名稱，也可為第一門店名稱經處理後的資訊，如數位序列，門店名稱轉化為數位序列的方式可參見上述實施例中的相關說明，在此不再贅述。第二門店名稱對應的門店類型概率向量包括位於目標網格區域和鄰居網格區域的存量門店對應的門店類型概率向量。在一些示例中，目標門店的門店類型概率向量與第二門店名稱對應的門店類型概率向量的相似度可為兩門店類型概率向量的餘弦相似度。 The first store name information is obtained based on the first store name, which may be the first store name or information of the first store name after processing, such as a digital sequence. The method of converting the store name into a digital sequence can be referred to the relevant description in the above embodiment, which will not be repeated here. The store type probability vector corresponding to the second store name includes the store type probability vector corresponding to the stock stores located in the target grid area and the neighboring grid area. In some examples, the similarity between the store type probability vector of the target store and the store type probability vector corresponding to the second store name may be the cosine similarity of the two store type probability vectors.
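The cosine similarity mentioned above can be computed as in the following minimal Python sketch. The function name and the example vectors are illustrative assumptions, not values from the embodiment.

```python
import math

def cosine_similarity(p, q):
    """Cosine of the angle between two store-type probability vectors."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0  # treat an all-zero vector as dissimilar
```

For non-negative probability vectors the result lies between 0 and 1, so two stores whose probability mass falls on the same store types score close to 1.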

為了進一步縮短門店去重處理所需的時間，可預先根據各存量門店的門店名稱，得到存量門店的門店類型概率向量，以便於需要計算門店類型相似度時，直接從存量門店資料庫中獲取。具體地，可獲取存量門店的門店名稱，根據門店名稱，得到門店名稱資訊；將存量門店的門店名稱資訊輸入第二模型，得到第二模型輸出的存量門店的門店類型概率向量；建立存量門店和存量門店的門店類型概率向量的第二對應關係，並將第二對應關係存儲於存量門店資料庫。在計算門店類型相似度時，可根據第二對應關係，在存量門店資料庫中查找得到第二門店名稱對應的門店類型概率向量。 In order to further shorten the time required for store deduplication processing, the store type probability vector of the existing stores can be obtained in advance based on the store names of each existing store, so that when the store type similarity needs to be calculated, it can be directly obtained from the existing store database. Specifically, the store name of the existing store can be obtained, and the store name information can be obtained based on the store name; the store name information of the existing store is input into the second model to obtain the store type probability vector of the existing store output by the second model; a second correspondence relationship between the existing store and the store type probability vector of the existing store is established, and the second correspondence relationship is stored in the existing store database. When calculating the store type similarity, the store type probability vector corresponding to the second store name can be found in the existing store database based on the second correspondence relationship.

在步驟S1042中，基於第一門店位置資訊和第二門店位置資訊，得到目標門店與位於目標網格區域和鄰居網格區域的存量門店的位置相似度。 In step S1042, based on the first store location information and the second store location information, the location similarity between the target store and the existing stores located in the target grid area and the neighboring grid area is obtained.

位置相似度為及門店位置資訊相關的相似度，可基於第一門店位置資訊和第二門店位置資訊得到。位置相似度可根據兩個門店位置資訊指示的兩個門店位置之間的距離和位置資訊可能導致的偏差量確定。具體地，可根據第一門店位置資訊和第二門店位置資訊，得到目標門店與存量門店的地理距離；根據地理距離和位置偏差閾值的比值，得到目標門店與位於目標網格區域和鄰居網格區域的存量門店的位置相似度。第一門店位置資訊和第二門店位置資訊可為定位座標資訊，如全球定位系統(Global Positioning System,GPS)座標資訊。若第一門店位置資訊和第二門店位置資訊為位址資訊，則可將位址資訊轉換為座標資訊，如經緯度資訊，再根據座標資訊確定目標門店與存量門店的地理距離。位置偏差閾值可為位置資訊可能導致的偏差量的最大值。可利用地理距離和位置偏差閾值的比值進行歸一化，從而得到位置相似度。例如，位置相似度可根據下式(1)得到：

The location similarity is the similarity related to the store location information, which can be obtained based on the first store location information and the second store location information. The location similarity can be determined based on the distance between the two store locations indicated by the two store location information and the deviation amount that may be caused by the location information. Specifically, the geographical distance between the target store and the existing store can be obtained based on the first store location information and the second store location information; and the location similarity between the target store and the existing stores located in the target grid area and the neighboring grid area can be obtained based on the ratio of the geographical distance and the location deviation threshold. The first store location information and the second store location information can be positioning coordinate information, such as Global Positioning System (GPS) coordinate information. If the first store location information and the second store location information are address information, the address information can be converted into coordinate information, such as latitude and longitude information, and then the geographical distance between the target store and the existing store can be determined based on the coordinate information. The location deviation threshold can be the maximum value of the deviation that may be caused by the location information. The ratio of the geographical distance and the location deviation threshold can be used for normalization to obtain the location similarity. For example, the location similarity can be obtained according to the following formula (1):

In step S1043, the target similarity is calculated from the N name-related similarities, the location similarity, and their corresponding weight coefficients.

The weight coefficients may enter the calculation of the target similarity as exponents or as multiplicative coefficients, which is not limited here. In some examples, the weight coefficients are used as exponents. For example, if the name-related similarities include the character similarity, the semantic similarity, and the store type similarity, the target similarity can be obtained according to the following formula (2):

$$\mathrm{sim}(\text{target store}, \text{existing store}) = \mathrm{sim}(\text{character})^{\alpha} \times \mathrm{sim}(\text{semantic})^{\beta} \times \mathrm{sim}(\text{type})^{\gamma} \times \mathrm{sim}(\text{location})^{\delta} \tag{2}$$

其中，sim(目標門店,存量門店)為目標相似度；sim(字符)為字元相似度；sim(語義)為語義相似度；sim(類型)為門店類型相似度；sim(位置)為位置相似度；α為字元相似度的權重係數；β為語義相似度的權重係數；γ為門店類型相似度的權重係數；δ為位置相似度的權重係數。在一些示例中，為了方便計算，可使α=β=γ=δ=1。 Among them, sim (target store , existing store) is the target similarity; sim (character) is the character similarity; sim (semantic) is the semantic similarity; sim (type) is the store type similarity; sim (location) is the location similarity; α is the weight coefficient of character similarity; β is the weight coefficient of semantic similarity; γ is the weight coefficient of store type similarity; δ is the weight coefficient of location similarity. In some examples, for the convenience of calculation, α=β=γ= δ =1 can be used.
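Formula (2) and the threshold comparison can be sketched in Python as follows. The function names are hypothetical; the exponent form and the "greater than or equal to" comparison follow the text, and the unit default weights correspond to the α=β=γ=δ=1 case.

```python
import math

def target_similarity(sims, weights=None):
    """Product of the individual similarities, each raised to its weight."""
    if weights is None:
        weights = [1.0] * len(sims)  # alpha = beta = gamma = delta = 1
    if len(weights) != len(sims):
        raise ValueError("one weight is needed per similarity")
    return math.prod(s ** w for s, w in zip(sims, weights))

def is_duplicate(sims, threshold, weights=None):
    """The target store is removed as a duplicate when the similarity reaches the threshold."""
    return target_similarity(sims, weights) >= threshold
```

Because the similarities are multiplied, a single low component pulls the whole score down, which is one way of requiring that name and location agree at the same time.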

為了便於理解，下面以一示例對門店去重處理方法進行說明。在該示例中，名稱相關相似度包括字元相似度、語義相似度和門店類型相似度。 For easier understanding, the following example is used to illustrate the store deduplication processing method. In this example, the name-related similarity includes character similarity, semantic similarity, and store type similarity.

The store name and store address of the target store are obtained, and the store address is converted into latitude and longitude coordinates, giving {30.193, 120.173}. Using the grid coding algorithm, the grid code of the grid area where the target store is located, that is, the target grid area, is calculated as wtm7y8e. The first 6 characters of the grid codes of the neighboring grid areas are the same as the first 6 characters of the grid code of the target grid area, so the grid codes of the 8 neighboring grid areas can be obtained using the coding table shown in FIG. 4. They are wtm7y82, wtm7y83, wtm7y84, wtm7y8W, wtm7y8R, wtm7y8S, wtm7y8D, and wtm7y8F. A query of the existing store database determines that there are 158 existing stores in the target grid area, 0 in the neighboring grid area with grid code wtm7y82, 4 in the neighboring grid area with grid code wtm7y83, 1 in the neighboring grid area with grid code wtm7y84, 0 in the neighboring grid area with grid code wtm7y8W, 18 in the neighboring grid area with grid code wtm7y8R, 1 in the neighboring grid area with grid code wtm7y8S, 0 in the neighboring grid area with grid code wtm7y8D, and 0 in the neighboring grid area with grid code wtm7y8F. That is, the target grid area and the neighboring grid areas together contain 181 existing stores. The target similarity between the target store and each existing store in the target grid area and the neighboring grid areas needs to be calculated.

下面以目標門店與其中一個存量門店的目標相似度的計算為例進行說明。目標門店的門店名稱為“X1X2(杭州市濱江寶龍城市廣場店)”，存量門店名稱為“杭州市濱江區X3X4便利店”，其中，X1、X2、X3和X4均為漢字，且是不同的漢字。 The following example uses the calculation of the target similarity between the target store and one of the existing stores as an example. The name of the target store is "X1X2 (Hangzhou Binjiang Baolong City Plaza Store)", and the name of the existing store is "Hangzhou Binjiang District X3X4 Convenience Store", where X1, X2, X3 and X4 are all Chinese characters, and they are different Chinese characters.

A word segmentation tool can be used to segment the store names of the target store and the existing store to obtain the words of each. The words of the target store are `X1X2`, `(`, `杭州市` (Hangzhou City), `濱江` (Binjiang), `寶龍` (Baolong), `城市` (City), `廣場` (Plaza), `店` (Store), and `)`. The words of the existing store are `杭州市` (Hangzhou City), `濱江區` (Binjiang District), `X3X4`, and `便利店` (Convenience Store). The term frequency and inverse document frequency of each word are calculated. Among these words, `(`, `杭州市`, and `)` do not satisfy the condition that the term frequency is less than or equal to the redundant term frequency threshold and the inverse document frequency is greater than the redundant frequency index threshold, so they are discarded. After discarding them, the selected words of the target store combine into "X1X2濱江寶龍城市廣場店", and the selected words of the existing store combine into "濱江區X3X4便利店". The character similarity is then calculated with the BLEU algorithm described above. "X1X2濱江寶龍城市廣場店" contains 11 1-grams and "濱江區X3X4便利店" contains 8 1-grams. Counting the co-occurrences of the 1-grams of the two names shows that the three 1-grams `濱`, `江`, and `店` each co-occur once. Therefore, the character similarity of "X1X2濱江寶龍城市廣場店" and "濱江區X3X4便利店" is (3/11 + 3/8)/2 ≈ 0.32.
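The 1-gram arithmetic in this example can be reproduced with the short Python sketch below. It is one reading of the calculation that matches the worked numbers, namely the shared 1-gram count divided by each name's length and then averaged; it is not the full BLEU algorithm, and the function name is hypothetical. The placeholders X1 to X4 are treated as single characters.

```python
from collections import Counter

def unigram_similarity(a, b):
    """Average of the shared 1-gram count over each name's 1-gram count."""
    if not a or not b:
        return 0.0
    # Multiset intersection: each character counts as often as it occurs in both names.
    common = sum((Counter(a) & Counter(b)).values())
    return (common / len(a) + common / len(b)) / 2

target = ["X1", "X2", "濱", "江", "寶", "龍", "城", "市", "廣", "場", "店"]
existing = ["濱", "江", "區", "X3", "X4", "便", "利", "店"]
similarity = unigram_similarity(target, existing)  # (3/11 + 3/8) / 2
```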

可將“X1X2(杭州市濱江寶龍城市廣場店)”轉化為數位序列[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]，將“杭州市濱江區X3X4便利店”轉化為數位序列[3,4,5,6,7,16,17,18,19,20,14]，相同的漢字對應的數位相同。將上述兩個數位序列和[CLS]以及[SEP]拼接，組合為單個向量，並輸入第一模型，得到第一模型輸出的兩者的語義相似度。 "X1X2 (Hangzhou Binjiang Baolong City Plaza Store)" can be converted into a digital sequence [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15], and "Hangzhou Binjiang District X3X4 Convenience Store" can be converted into a digital sequence [3,4,5,6,7,16,17,18,19,20,14]. The same Chinese characters correspond to the same digits. The above two digital sequences are concatenated with [CLS] and [SEP] to form a single vector, and then input into the first model to obtain the semantic similarity of the two output by the first model.

可將“X1X2(杭州市濱江寶龍城市廣場店)”和“杭州市濱江區X3X4便利店”轉換得到的兩個數位序列分別輸入第二模型，得到目標門店的門店類型概率向量和存量門店的門店類型概率向量。目標門店和存量門店在“購物”、“超市”、“便利店”三個門店類型維度上的元素的值比較高，基於目標門店的門店類型概率向量和存量門店的門店類型概率向量得到的門店類型相似度表徵的門店類型比較接近。 The two digital sequences converted from "X1X2 (Baolong City Plaza Store, Binjiang, Hangzhou)" and "X3X4 Convenience Store, Binjiang District, Hangzhou" can be input into the second model to obtain the store type probability vector of the target store and the store type probability vector of the existing store. The values of the elements of the target store and the existing store in the three store type dimensions of "shopping", "supermarket" and "convenience store" are relatively high, and the store types represented by the store type similarity based on the store type probability vector of the target store and the store type probability vector of the existing store are relatively close.

基於目標門店的門店位置資訊和存量門店的門店位置資訊，確定兩者的地理距離為285米，根據該地理距離和位置偏差閾值可計算得到位置相似度為0.8585。 Based on the store location information of the target store and the store location information of the existing store, the geographical distance between the two is determined to be 285 meters. Based on the geographical distance and the location deviation threshold, the location similarity can be calculated to be 0.8585.
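The geographic distance between two stores can be obtained from their latitude and longitude, for example with the haversine formula, as in the Python sketch below. The choice of haversine, the spherical Earth radius, and the function name are assumptions; the text only requires that a geographic distance be derived from the coordinates. The normalization against the location deviation threshold is deliberately left out.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters (spherical approximation)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
```

At the scale of a few hundred meters, as in this example, the spherical approximation error is small compared with typical positioning error.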

設去重相似度閾值為0.6，對於目標門店和存量門店，利用上述式(2)計算得到的目標相似度小於0.6，可確定目標門店和存量門店不是同一門店。 Assuming the deduplication similarity threshold is 0.6, for the target store and the existing store, the target similarity calculated using the above formula (2) is less than 0.6, which means that the target store and the existing store are not the same store.

需要說明的是，本發明實施例中對資訊、資料的獲取、存儲、使用、處理等均得到用戶或相關機構的授權，符合國家法律法規的相關規定。 It should be noted that the acquisition, storage, use, and processing of information and data in the embodiments of the present invention are authorized by users or relevant institutions and comply with relevant provisions of national laws and regulations.

本發明第二方面提供一種門店去重處理裝置。圖6為本發明一實施例提供的門店去重處理裝置的結構示意圖。如圖6所示，該門店去重處理裝置300可包括第一獲取模組301、網格區域確定模組302、第二獲取模組303、計算模組304和去重模組305。 The second aspect of the present invention provides a store deduplication processing device. FIG6 is a schematic diagram of the structure of the store deduplication processing device provided by an embodiment of the present invention. As shown in FIG6, the store deduplication processing device 300 may include a first acquisition module 301, a grid area determination module 302, a second acquisition module 303, a calculation module 304 and a deduplication module 305.

第一獲取模組301可用於獲取目標門店的第一門店名稱和第一門店位置資訊。 The first acquisition module 301 can be used to obtain the first store name and first store location information of the target store.

網格區域確定模組302可用於根據第一門店位置資訊，確定目標門店所在的目標網格區域。 The grid area determination module 302 can be used to determine the target grid area where the target store is located based on the first store location information.

第二獲取模組303可用於在預存的存量門店資料庫中，獲取位於目標網格區域和鄰居網格區域的存量門店的第二門店名稱和第二門店位置資訊。 The second acquisition module 303 can be used to obtain the second store name and second store location information of the stock stores located in the target grid area and the neighboring grid area from the pre-stored stock store database.

鄰居網格區域與目標網格區域相鄰。 The neighbor grid area is adjacent to the target grid area.

計算模組304可用於基於第一門店名稱、第一門店位置資訊、第二門店名稱和第二門店位置資訊，得到目標門店與位於目標網格區域和鄰居網格區域的存量門店的目標相似度。 The calculation module 304 can be used to obtain the target similarity between the target store and the stock stores located in the target grid area and the neighboring grid area based on the first store name, the first store location information, the second store name and the second store location information.

去重模組305可用於在目標相似度大於等於預設的去重相似度閾值的情況下，將目標門店作為重複門店去除。 The deduplication module 305 can be used to remove the target store as a duplicate store when the target similarity is greater than or equal to the preset deduplication similarity threshold.

在本發明實施例中，可根據目標門店的門店位置資訊，確定目標門店所在的網格區域。網格區域為地圖中劃分的區域。基於資料庫中位於目標門店所在的目標網格區域的存量門店、目標網格區域周邊的網格區域的存量門店以及目標門店的門店名稱、門店位置資訊，得到目標門店與存量門店的相似度，根據該相似度判斷新獲取的門店是否與存量門店為同一門店，若新獲取的門店與存量門店為同一門店，則認為新獲取的門店為重複門店，予以去除。該去重過程不需人工參與，且利用門店的位置可縮小用於比對的存量門店的範圍，提高了門店去重處理的效率。 In the embodiment of the present invention, the grid area where the target store is located can be determined based on the store location information of the target store. A grid area is an area into which the map is divided. The similarity between the target store and the existing stores is obtained based on the store names and store location information of the target store, of the existing stores in the database that are located in the target grid area, and of the existing stores located in the grid areas surrounding the target grid area. Based on this similarity, it is judged whether the newly acquired store is the same store as an existing store; if so, the newly acquired store is considered a duplicate store and is removed. The deduplication process requires no manual participation, and using the store location narrows the range of existing stores used for comparison, thereby improving the efficiency of store deduplication processing.
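The decision step described above can be sketched as a loop over the narrowed candidate set. The function name `find_duplicate`, the dictionary-based store records, and the toy `exact_name_similarity` function are hypothetical; a real similarity function would combine name-related and location similarities.

```python
from typing import Callable, Iterable, Optional

def find_duplicate(target: dict,
                   candidates: Iterable[dict],
                   similarity: Callable[[dict, dict], float],
                   threshold: float) -> Optional[dict]:
    """Return the first candidate whose similarity reaches the threshold, else None."""
    for store in candidates:
        if similarity(target, store) >= threshold:
            return store
    return None

def exact_name_similarity(a: dict, b: dict) -> float:
    """Toy stand-in for the target similarity: 1.0 on identical names, else 0.0."""
    return 1.0 if a["name"] == b["name"] else 0.0

existing = [{"id": 1, "name": "Store A"}, {"id": 2, "name": "Store B"}]
match = find_duplicate({"name": "Store B"}, existing, exact_name_similarity, 0.6)
```

Only the stores in the target grid area and its neighboring grid areas need to be passed as `candidates`, which is where the efficiency gain comes from.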

在一些實施例中，網格區域具有網格編碼。門店去重處理裝置300還可包括鄰居網格區域確定模組。 In some embodiments, the grid area has a grid code. The store deduplication processing device 300 may also include a neighbor grid area determination module.

在一些示例中，鄰居網格區域確定模組可用於：獲取目標網格區域的網格編碼；根據目標網格區域的網格編碼和網格編碼逆演算法，獲取目標網格區域的頂點的位置資訊；根據目標網格區域的頂點的位置資訊，確定位於鄰居網格區域中輔助點的位置資訊；基於每個鄰居網格區域中輔助點的位置資訊和網格編碼演算法，計算得到每個鄰居網格區域的網格編碼，以確定鄰居網格區域。 In some examples, the neighbor grid area determination module can be used to: obtain the grid code of the target grid area; obtain the location information of the vertices of the target grid area according to the grid code of the target grid area and the grid code inverse algorithm; determine the location information of the auxiliary points located in the neighbor grid area according to the location information of the vertices of the target grid area; calculate the grid code of each neighbor grid area based on the location information of the auxiliary points in each neighbor grid area and the grid coding algorithm to determine the neighbor grid area.
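The decode, offset, and re-encode procedure can be sketched as follows. A simple fixed-size latitude/longitude grid with a `"row:col"` code stands in for the grid coding algorithm, which is not specified in this passage; `CELL_DEG`, `encode_cell`, `decode_cell`, and `neighbor_codes` are hypothetical names. Auxiliary points are placed half a cell outside (or at the middle of) the decoded bounds so that each one falls inside exactly one neighboring cell.

```python
import math

CELL_DEG = 0.01  # hypothetical cell size in degrees

def encode_cell(lat, lon):
    """Grid coding (assumed): code a point by the row and column of its cell."""
    row = math.floor(lat / CELL_DEG)
    col = math.floor(lon / CELL_DEG)
    return f"{row}:{col}"

def decode_cell(code):
    """Inverse coding: recover the cell bounds (south, west, north, east)."""
    row, col = (int(part) for part in code.split(":"))
    south, west = row * CELL_DEG, col * CELL_DEG
    return south, west, south + CELL_DEG, west + CELL_DEG

def neighbor_codes(code):
    """Derive auxiliary points from the cell bounds and encode each of them."""
    south, west, north, east = decode_cell(code)
    half = CELL_DEG / 2
    lats = (south - half, (south + north) / 2, north + half)
    lons = (west - half, (west + east) / 2, east + half)
    codes = {encode_cell(lat, lon) for lat in lats for lon in lons}
    codes.discard(code)  # the center point maps back to the target cell
    return sorted(codes)
```

Placing auxiliary points well inside each neighbor, rather than on a shared edge, avoids ambiguity about which cell a boundary point belongs to.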

在一些示例中，相鄰的網格區域的網格編碼中一部分數位的值相同。鄰居網格區域確定模組可用於：獲取目標網格區域的網格編碼；根據目標網格區域的網格編碼，獲取候選網格區域的網格編碼，候選網格區域包括網格編碼中一部分數位的字元與目標網格區域的網格編碼中一部分數位的字元相同的網格區域；按照網格編碼演算法中的網格區域排布與編碼數位的字元的對應關係，在候選網格區域的網格編碼中確定鄰居網格區域的網格編碼，以確定鄰居網格區域。 In some examples, the values of a portion of digits in the grid codes of adjacent grid regions are the same. The neighboring grid region determination module can be used to: obtain the grid code of the target grid region; obtain the grid code of the candidate grid region according to the grid code of the target grid region, the candidate grid region includes a grid region in which the characters of a portion of digits in the grid code are the same as the characters of a portion of digits in the grid code of the target grid region; determine the grid code of the neighboring grid region in the grid code of the candidate grid region according to the correspondence between the grid region arrangement in the grid coding algorithm and the characters of the coded digits, so as to determine the neighboring grid region.
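The prefix-based variant can be sketched with a quadtree-style code, assumed here purely for illustration: each digit 0 to 3 selects a quadrant (0 top-left, 1 top-right, 2 bottom-left, 3 bottom-right), so codes sharing a prefix lie in the same parent block. The names `suffix_to_row_col` and `neighbors_by_prefix` and the `depth` parameter are hypothetical. Candidates are all codes that share the target's prefix; the digit-to-position arrangement then identifies which candidates are adjacent.

```python
from itertools import product

def suffix_to_row_col(suffix):
    """Convert trailing quadrant digits to (row, col) inside the shared block."""
    row = col = 0
    for ch in suffix:
        digit = int(ch)
        row = row * 2 + (digit >> 1)  # high bit of the digit selects the row half
        col = col * 2 + (digit & 1)   # low bit of the digit selects the column half
    return row, col

def neighbors_by_prefix(code, depth=2):
    """Among codes sharing code[:-depth], keep those adjacent to the target cell."""
    prefix, suffix = code[:-depth], code[-depth:]
    row0, col0 = suffix_to_row_col(suffix)
    result = []
    for digits in product("0123", repeat=depth):
        candidate = "".join(digits)
        row, col = suffix_to_row_col(candidate)
        if max(abs(row - row0), abs(col - col0)) == 1:
            result.append(prefix + candidate)
    return result
```

A limitation of this sketch is that neighbors lying across the edge of the shared block do not share the prefix and are therefore not returned; handling those would need a shorter prefix or the decode-and-encode approach.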

在一些實施例中，門店去重處理裝置300還可包括第一預處理模組。第一預處理模組可用於：將地圖劃分為多個網格區域，並利用網格編碼演算法，為每個網格區域分配網格編碼；獲取存量門店的門店位置資訊，根據存量門店的門店位置資訊，確定存量門店所在的網格區域；建立存量門店和存量門店所在的網格區域的網格編碼的第一對應關係，並將第一對應關係存儲於存量門店資料庫。 In some embodiments, the store deduplication processing device 300 may also include a first pre-processing module. The first pre-processing module may be used to: divide the map into multiple grid areas, and use a grid coding algorithm to assign a grid code to each grid area; obtain the store location information of the existing stores, and determine the grid area where the existing stores are located according to the store location information of the existing stores; establish a first correspondence between the grid codes of the existing stores and the grid areas where the existing stores are located, and store the first correspondence in the existing store database.
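The first correspondence can be sketched as an in-memory index from grid code to store identifiers. The fixed-cell `grid_code` function, the `CELL_DEG` value, and the store record fields (`id`, `lat`, `lon`) are assumptions for illustration; a production system would persist this mapping in the existing store database.

```python
import math
from collections import defaultdict

CELL_DEG = 0.01  # hypothetical cell size in degrees

def grid_code(lat, lon):
    """Assumed grid coding: row and column of a fixed-size lat/lon cell."""
    return f"{math.floor(lat / CELL_DEG)}:{math.floor(lon / CELL_DEG)}"

def build_index(stores):
    """Map each grid code to the IDs of the existing stores located in that cell."""
    index = defaultdict(list)
    for store in stores:
        index[grid_code(store["lat"], store["lon"])].append(store["id"])
    return index

def candidate_ids(index, target_code, neighbor_codes):
    """Collect store IDs from the target cell and the given neighboring cells."""
    ids = []
    for code in [target_code, *neighbor_codes]:
        ids.extend(index.get(code, []))
    return ids

stores = [
    {"id": "s1", "lat": 30.2084, "lon": 120.2120},
    {"id": "s2", "lat": 30.2151, "lon": 120.2125},
    {"id": "s3", "lat": 31.0050, "lon": 121.0050},
]
index = build_index(stores)
nearby = candidate_ids(index, "3020:12021", ["3021:12021"])
```

Because the index is built once in preprocessing, each incoming target store only triggers a handful of dictionary lookups instead of a scan over all existing stores.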

在一些實施例中，計算模組304可用於：基於第一門店名稱和第二門店名稱，得到目標門店與位於目標網格區域和鄰居網格區域的存量門店的N個名稱相關相似度，N為大於等於1的整數；基於第一門店位置資訊和第二門店位置資訊，得到目標門店與位於目標網格區域和鄰居網格區域的存量門店的位置相似度；根據N個名稱相關相似度、位置相似度以及對應的權重係數，計算得到目標相似度。 In some embodiments, the calculation module 304 may be used to: obtain N name-related similarities between the target store and the existing stores located in the target grid area and the neighboring grid area based on the first store name and the second store name, where N is an integer greater than or equal to 1; obtain the location similarity between the target store and the existing stores located in the target grid area and the neighboring grid area based on the first store location information and the second store location information; and calculate the target similarity based on the N name-related similarities, the location similarity, and the corresponding weight coefficients.
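One plausible way to combine the similarities is a weighted sum, sketched below. The weighted-sum form and the example weights are assumptions made for illustration; this passage only states that the N name-related similarities, the location similarity, and their weight coefficients are used together.

```python
def target_similarity(name_sims, location_sim, name_weights, location_weight):
    """Assumed combination: weighted sum of N name-related similarities and location similarity."""
    if len(name_sims) != len(name_weights):
        raise ValueError("one weight is required per name-related similarity")
    weighted_names = sum(w * s for w, s in zip(name_weights, name_sims))
    return weighted_names + location_weight * location_sim

# Hypothetical inputs: three name-related similarities and one location similarity.
score = target_similarity([0.5, 0.5, 0.5], 0.5, [0.25, 0.25, 0.25], 0.25)
is_duplicate = score >= 0.6  # compared against a preset deduplication threshold
```

If the weights sum to 1 and every similarity lies in [0, 1], the combined score also lies in [0, 1], which makes a single threshold easy to interpret.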

在一些示例中，名稱相關相似度包括字元相似度。計算模組304可用於：對第一門店名稱和第二門店名稱分別進行分詞，得到第一門店名稱對應的詞彙和第二門店名稱對應的詞彙；計算第一門店名稱對應的詞彙和第二門店名稱對應的詞彙的詞頻和逆向文件頻率；選取詞頻低於等於冗餘詞頻閾值且逆向文件頻率大於冗餘頻率指數閾值的詞彙；基於選取的第一門店名稱對應的詞彙和選取的第二門店名稱對應的詞彙，得到目標門店與位於目標網格區域和鄰居網格區域的存量門店的字元相似度。 In some examples, the name-related similarity includes character similarity. The calculation module 304 can be used to: perform word segmentation on the first store name and the second store name respectively to obtain the words corresponding to the first store name and the words corresponding to the second store name; calculate the term frequency and inverse document frequency of the words corresponding to the first store name and of the words corresponding to the second store name; select the words whose term frequency is less than or equal to the redundant term frequency threshold and whose inverse document frequency is greater than the redundant frequency index threshold; and, based on the selected words corresponding to the first store name and the selected words corresponding to the second store name, obtain the character similarity between the target store and the existing stores located in the target grid area and the neighboring grid area.
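The filtering of redundant words can be sketched as follows. Several choices here are assumptions: names are taken as already segmented, term frequency is computed over the whole corpus of names, inverse document frequency is `log(n / df)`, and Jaccard overlap of the kept words stands in for the character similarity measure, which is not specified in this passage. The English tokens and threshold values are illustrative only.

```python
import math
from collections import Counter

def corpus_term_frequency(corpus):
    """Fraction of all tokens in the corpus that each word accounts for."""
    counts = Counter(tok for doc in corpus for tok in doc)
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}

def inverse_document_frequency(corpus):
    """log(n / df), where df is the number of names containing the word."""
    n = len(corpus)
    df = Counter(tok for doc in corpus for tok in set(doc))
    return {tok: math.log(n / d) for tok, d in df.items()}

def keep_informative(tokens, tf, idf, tf_max, idf_min):
    """Keep words with TF <= tf_max and IDF > idf_min; the rest are treated as redundant."""
    return {t for t in tokens if tf.get(t, 0.0) <= tf_max and idf.get(t, 0.0) > idf_min}

def jaccard(a, b):
    """Set overlap used here as a stand-in similarity between kept words."""
    return len(a & b) / len(a | b) if (a or b) else 0.0

corpus = [
    ["x1x2", "hangzhou", "binjiang", "plaza", "store"],
    ["hangzhou", "binjiang", "x3x4", "store"],
    ["hangzhou", "xihu", "cafe", "store"],
]
tf = corpus_term_frequency(corpus)
idf = inverse_document_frequency(corpus)
kept_a = keep_informative(corpus[0], tf, idf, tf_max=0.2, idf_min=0.1)
kept_b = keep_informative(corpus[1], tf, idf, tf_max=0.2, idf_min=0.1)
char_sim = jaccard(kept_a, kept_b)
```

In this toy corpus the words "hangzhou" and "store" appear in every name, so they are dropped before comparison and do not inflate the similarity.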

在一些示例中，名稱相關相似度包括語義相似度。計算模組304可用於：將第一門店名稱和第二門店名稱分別轉化為第一名稱數位序列和第二名稱數位序列；將第一名稱數位序列和第二名稱數位序列輸入第一模型，得到第一模型輸出的目標門店與位於目標網格區域和鄰居網格區域的存量門店的語義相似度，第一模型用於根據輸入的兩個門店名稱轉化為的數位序列輸出兩個門店名稱的語義相似度。 In some examples, the name-related similarity includes semantic similarity. The calculation module 304 can be used to: convert the first store name and the second store name into a first name numeric sequence and a second name numeric sequence respectively; and input the first name numeric sequence and the second name numeric sequence into the first model to obtain the semantic similarity, output by the first model, between the target store and the existing stores located in the target grid area and the neighboring grid area. The first model is used to output the semantic similarity of two store names from the numeric sequences into which the two input store names are converted.

在一些示例中，名稱相關相似度包括門店類型相似度。計算模組304可用於：根據第一門店名稱，得到第一門店名稱資訊；將第一門店名稱資訊輸入第二模型，得到第二模型輸出的目標門店的門店類型概率向量，第二模型用於根據輸入的門店名稱資訊輸出門店類型概率向量，門店類型概率向量用於表徵門店名稱指示的門店屬於各門店類型的概率；在存量門店資料庫中查找與第二門店名稱對應的門店類型概率向量；計算目標門店的門店類型概率向量與第二門店名稱對應的門店類型概率向量的相似度，將相似度確定為目標門店與位於目標網格區域和鄰居網格區域的存量門店的門店類型相似度。 In some examples, the name-related similarity includes store type similarity. The calculation module 304 can be used to: obtain the first store name information according to the first store name; input the first store name information into the second model to obtain the store type probability vector of the target store output by the second model, the second model is used to output the store type probability vector according to the input store name information, and the store type probability vector is used to characterize the probability that the store indicated by the store name belongs to each store type; find the store type probability vector corresponding to the second store name in the stock store database; calculate the similarity between the store type probability vector of the target store and the store type probability vector corresponding to the second store name, and determine the similarity as the store type similarity of the target store and the stock stores located in the target grid area and the neighboring grid area.
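The comparison of two store type probability vectors can be sketched with cosine similarity. Cosine similarity is an assumed choice; this passage only says that a similarity between the two vectors is calculated. The example vectors and their three dimensions are made up for illustration.

```python
import math

def cosine_similarity(p, q):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same number of store types")
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0

# Hypothetical probabilities over (shopping, supermarket, convenience store).
target_types = [0.3, 0.3, 0.4]
existing_types = [0.2, 0.3, 0.5]
type_sim = cosine_similarity(target_types, existing_types)
```

Because the vector for each existing store can be computed once and stored, only the target store's vector needs to be produced by the model at deduplication time.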

在一些示例中，計算模組304可用於：根據第一門店位置資訊和第二門店位置資訊，得到目標門店與存量門店的地理距離；根據地理距離和位置偏差閾值的比值，得到目標門店與位於目標網格區域和鄰居網格區域的存量門店的位置相似度。 In some examples, the calculation module 304 can be used to: obtain the geographical distance between the target store and the existing store based on the first store location information and the second store location information; obtain the location similarity between the target store and the existing store located in the target grid area and the neighboring grid area based on the ratio of the geographical distance and the location deviation threshold.

在一些實施例中，門店去重處理裝置還可包括第二預處理模組。第二預處理模組可用於：獲取存量門店的門店名稱，根據門店名稱，得到門店名稱資訊；將存量門店的門店名稱資訊輸入第二模型，得到第二模型輸出的存量門店的門店類型概率向量；建立存量門店和存量門店的門店類型概率向量的第二對應關係，並將第二對應關係存儲於存量門店資料庫。 In some embodiments, the store deduplication processing device may further include a second pre-processing module. The second pre-processing module may be used to: obtain the store name of the existing store, and obtain the store name information according to the store name; input the store name information of the existing store into the second model to obtain the store type probability vector of the existing store output by the second model; establish a second correspondence between the existing store and the store type probability vector of the existing store, and store the second correspondence in the existing store database.

本發明第三方面提供一種門店去重處理設備。圖7為本發明一實施例提供的門店去重處理設備的結構示意圖。如圖7所示，門店去重處理設備400包括記憶體401、處理器402及存儲在記憶體401上並可在處理器402上運行的電腦程式。 The third aspect of the present invention provides a store deduplication processing device. FIG7 is a schematic diagram of the structure of the store deduplication processing device provided by an embodiment of the present invention. As shown in FIG7, the store deduplication processing device 400 includes a memory 401, a processor 402, and a computer program stored in the memory 401 and executable on the processor 402.

在一些示例中，上述處理器402可以包括中央處理器(Central Processing Unit,CPU)，或者特殊應用積體電路(Application Specific Integrated Circuit,ASIC)，或者可以被配置成實施本發明實施例的一個或多個積體電路。 In some examples, the processor 402 may include a central processing unit (CPU), an application specific integrated circuit (ASIC), or one or more integrated circuits that may be configured to implement the embodiments of the present invention.

記憶體401可包括唯讀記憶體(Read-Only Memory,ROM)，隨機存取記憶體(Random Access Memory,RAM)，磁片存儲介質設備，光存儲介質設備，快閃記憶體設備，電氣、光學或其他物理/有形的記憶體存放裝置。因此，通常，記憶體包括一個或多個編碼有包括電腦可執行指令的軟體的有形(非暫態)電腦可讀存儲介質(例如，記憶體設備)，並且當該軟體被執行(例如，由一個或多個處理器)時，其可操作來執行參考根據本發明實施例中門店去重處理方法所描述的操作。 Memory 401 may include read-only memory (ROM), random access memory (RAM), magnetic disk storage medium device, optical storage medium device, flash memory device, electrical, optical or other physical/tangible memory storage device. Therefore, generally, the memory includes one or more tangible (non-transitory) computer-readable storage media (e.g., memory devices) encoded with software including computer-executable instructions, and when the software is executed (e.g., by one or more processors), it can be operated to perform the operations described with reference to the store deduplication processing method according to the embodiment of the present invention.

處理器402通過讀取記憶體401中存儲的可執行程式碼來運行與可執行程式碼對應的電腦程式，以用於實現上述實施例中的門店去重處理方法。 The processor 402 runs the computer program corresponding to the executable program code by reading the executable program code stored in the memory 401, so as to implement the store deduplication processing method in the above embodiment.

在一些示例中，門店去重處理設備400還可包括通信介面403和匯流排404。其中，如圖7所示，記憶體401、處理器402、通信介面403通過匯流排404連接並完成相互間的通信。 In some examples, the store deduplication processing device 400 may also include a communication interface 403 and a bus 404. As shown in FIG7 , the memory 401, the processor 402, and the communication interface 403 are connected through the bus 404 and communicate with each other.

通信介面403，主要用於實現本發明實施例中各模組、裝置、單元和/或設備之間的通信。也可通過通信介面403接入輸入裝置和/或輸出設備。 The communication interface 403 is mainly used to realize the communication between the modules, devices, units and/or equipment in the embodiment of the present invention. The input device and/or output device can also be connected through the communication interface 403.

匯流排404包括硬體、軟體或兩者，將門店去重處理設備400的部件彼此耦接在一起。舉例來說而非限制，匯流排404可包括加速圖形埠(Accelerated Graphics Port,AGP)或其他圖形匯流排、增強工業標準架構(Enhanced Industry Standard Architecture,EISA)匯流排、前側匯流排(Front Side Bus,FSB)、超傳送標準(Hyper Transport,HT)互連、工業標準架構(Industry Standard Architecture,ISA)匯流排、無限頻寬互連、低接腳計數(Low pin count,LPC)匯流排、記憶體匯流排、微通道架構(Micro Channel Architecture,MCA)匯流排、周邊組件互連(Peripheral Component Interconnect,PCI)匯流排、快速周邊組件互連(Peripheral Component Interconnect Express,PCI-E)匯流排、串列進階技術附接(Serial Advanced Technology Attachment,SATA)匯流排、視訊電子標準協會區域(Video Electronics Standards Association Local Bus,VLB)匯流排或其他合適的匯流排或者兩個或更多個以上這些的組合。在合適的情況下，匯流排404可包括一個或多個匯流排。儘管本發明實施例描述和示出了特定的匯流排，但本發明考慮任何合適的匯流排或互連。 The bus 404 includes hardware, software, or both, coupling the components of the store deduplication processing device 400 to each other. By way of example and not limitation, the bus 404 may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a Peripheral Component Interconnect Express (PCI-E) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), or another suitable bus, or a combination of two or more of these. Where appropriate, the bus 404 may include one or more buses. Although the embodiments of the present invention describe and illustrate specific buses, the present invention contemplates any suitable bus or interconnect.

本發明第四方面提供一種電腦可讀存儲介質，該電腦可讀存儲介質上存儲有電腦程式指令，該電腦程式指令被處理器執行時可實現上述實施例中的門店去重處理方法，且能達到相同的技術效果，為避免重複，這裡不再贅述。其中，上述電腦可讀存儲介質可包括非暫態電腦可讀存儲介質，如唯讀記憶體(Read-Only Memory,ROM)、隨機存取記憶體 (Random Access Memory,RAM)、磁碟或者光碟等，在此並不限定。 The fourth aspect of the present invention provides a computer-readable storage medium, on which a computer program instruction is stored. When the computer program instruction is executed by a processor, the store deduplication processing method in the above embodiment can be implemented, and the same technical effect can be achieved. To avoid repetition, it is not repeated here. Among them, the above-mentioned computer-readable storage medium may include a non-transient computer-readable storage medium, such as a read-only memory (ROM), a random access memory (RAM), a disk or an optical disk, etc., which is not limited here.

本發明實施例提供一種電腦程式產品，該電腦程式產品中的指令由電子設備的處理器執行時，使得電子設備可執行上述實施例中的門店去重處理方法，且能達到相同的技術效果，為避免重複，這裡不再贅述。 The embodiment of the present invention provides a computer program product. When the instructions in the computer program product are executed by the processor of the electronic device, the electronic device can execute the store deduplication processing method in the above embodiment and achieve the same technical effect. To avoid repetition, it will not be repeated here.

需要明確的是，本說明書中的各個實施例均採用遞進的方式描述，各個實施例之間相同或相似的部分互相參見即可，每個實施例重點說明的都是與其他實施例的不同之處。對於裝置實施例、設備實施例、電腦可讀存儲介質實施例、電腦程式產品實施例而言，相關之處可以參見方法實施例的說明部分。本發明並不局限於上文所描述並在圖中示出的特定步驟和結構。本領域的技術人員可以在領會本發明的精神之後，作出各種改變、修改和添加，或者改變步驟之間的順序。並且，為了簡明起見，這裡省略對已知方法技術的詳細描述。 It should be made clear that each embodiment in this specification is described in a progressive manner, and the same or similar parts between the embodiments can be referred to each other, and each embodiment focuses on the differences from other embodiments. For the device embodiment, equipment embodiment, computer-readable storage medium embodiment, and computer program product embodiment, the relevant parts can refer to the description part of the method embodiment. The present invention is not limited to the specific steps and structures described above and shown in the figure. After understanding the spirit of the present invention, technicians in this field can make various changes, modifications and additions, or change the order between the steps. In addition, for the sake of brevity, the detailed description of the known method technology is omitted here.

上面參考根據本發明的實施例的方法、裝置(系統)和電腦程式產品的流程圖和/或框圖描述了本發明的各方面。應當理解，流程圖和/或框圖中的每個方框以及流程圖和/或框圖中各方框的組合可以由電腦程式指令實現。這些電腦程式指令可被提供給通用電腦、專用電腦、或其它可程式設計資料處理裝置的處理器，以產生一種機器，使得經由電腦或其它可程式設計資料處理裝置的處理器執行的這些指令使能對流程圖和/或框圖的一個或多個方框中指定的功能/動作的實現。這種處理器可以是但不限於是通用處理器、專用處理器、特殊應用處理器或者現場可程式設計邏輯電路。還可理解，框圖和/或流程圖中的每個方框以及框圖和/或流程圖中的方框的組合，也可以由執行指定的功能或動作的專用硬體來實現，或可由專用硬體和電腦指令的組合來實現。 Aspects of the present invention are described above with reference to flowcharts and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the present invention. It should be understood that each block in the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing device to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing device, enable implementation of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams. Such a processor may be, but is not limited to, a general-purpose processor, a special-purpose processor, an application-specific processor, or a field programmable logic circuit. It can also be understood that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can also be implemented by dedicated hardware that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.

本領域技術人員應能理解，上述實施例均是示例性而非限制性的。在不同實施例中出現的不同技術特徵可以進行組合，以取得有益效果。本領域技術人員在研究圖式、說明書及申請專利範圍的基礎上，應能理解並實現所揭示的實施例的其他變化的實施例。在申請專利範圍中，術語“包括”並不排除其他裝置或步驟；數量詞“一個”不排除多個；術語“第一”、“第二”用於標示名稱而非用於表示任何特定的順序。請求項中的任何圖式標記均不應被理解為對保護範圍的限制。請求項中出現的多個部分的功能可以由一個單獨的硬體或軟體模組來實現。某些技術特徵出現在不同的從屬請求項中並不意味著不能將這些技術特徵進行組合以取得有益效果。 Those skilled in the art should understand that the above embodiments are exemplary rather than restrictive. Different technical features appearing in different embodiments can be combined to achieve beneficial effects. On the basis of studying the drawings, the specification and the claims, those skilled in the art should be able to understand and implement other variations of the disclosed embodiments. In the claims, the term "including" does not exclude other devices or steps; the quantifier "one" does not exclude a plurality; the terms "first" and "second" are used to label names rather than to indicate any specific order. Any reference sign in the claims should not be understood as limiting the scope of protection. The functions of multiple parts appearing in the claims can be implemented by a single hardware or software module. The fact that certain technical features appear in different dependent claims does not mean that these technical features cannot be combined to achieve beneficial effects.

S101,S102,S103,S104,S105:步驟 S101, S102, S103, S104, S105: Steps

Claims

A store deduplication processing method, characterized in that it includes: a store deduplication device obtains a first store name and first store location information of a target store; the store deduplication device determines a target grid area where the target store is located according to the first store location information; the store deduplication device obtains, from a pre-stored stock store database, a second store name and second store location information of the stock stores located in the target grid area and a neighboring grid area, the neighboring grid area being adjacent to the target grid area; the store deduplication device obtains the target similarity between the target store and the stock stores located in the target grid area and the neighboring grid area based on the first store name, the first store location information, the second store name and the second store location information; when the target similarity is greater than or equal to a preset deduplication similarity threshold, the store deduplication device removes the target store as a duplicate store; wherein the grid area has a grid code, and before the store deduplication device obtains, from the pre-stored stock store database, the second store name and second store location information of the stock stores located in the target grid area and the neighboring grid area, the method further includes: the store deduplication device obtains the grid code of the target grid area; the store deduplication device obtains the location information of the vertices of the target grid area according to the grid code of the target grid area and the grid coding inverse algorithm; the store deduplication device determines the location information of an auxiliary point located in the neighboring grid area according to the location information of the vertices of the target grid area; the store deduplication device calculates the grid code of each neighboring grid area based on the location information of the auxiliary point in each neighboring grid area and the grid coding algorithm, to determine the neighboring grid area.

The method as described in claim 1, wherein the grid area has a grid code, and the grid codes of adjacent grid areas have the same value for a portion of digits; before the store deduplication device obtains, from the pre-stored stock store database, the second store name and second store location information of the stock stores located in the target grid area and the neighboring grid area, the method further includes: the store deduplication device obtains the grid code of the target grid area; the store deduplication device obtains the grid codes of candidate grid areas according to the grid code of the target grid area, wherein the candidate grid areas include grid areas in which the characters of a portion of digits in the grid code are the same as the characters of a portion of digits in the grid code of the target grid area; the store deduplication device determines the grid code of the neighboring grid area among the grid codes of the candidate grid areas according to the correspondence between the grid area arrangement and the characters of the coded digits in the grid coding algorithm, so as to determine the neighboring grid area.

The method as claimed in claim 1, further comprising: the store deduplication device divides the map into a plurality of grid areas, and uses a grid coding algorithm to assign a grid code to each grid area; the store deduplication device obtains the store location information of the existing store, and determines the grid area where the existing store is located according to the store location information of the existing store; the store deduplication device establishes a first correspondence between the existing store and the grid code of the grid area where the existing store is located, and stores the first correspondence in the existing store database.

The method of claim 1, wherein the store deduplication device obtains the target similarity between the target store and the stock stores located in the target grid area and the neighboring grid area based on the first store name, the first store location information, the second store name and the second store location information, including: the store deduplication device obtains N name-related similarities between the target store and the stock stores located in the target grid area and the neighboring grid area based on the first store name and the second store name, N being an integer greater than or equal to 1; the store deduplication device obtains the location similarity between the target store and the stock stores located in the target grid area and the neighboring grid area based on the first store location information and the second store location information; the store deduplication device calculates the target similarity based on the N name-related similarities, the location similarity and the corresponding weight coefficients.

A method as described in claim 4, wherein the name-related similarity includes character similarity, and the store deduplication device obtains N name-related similarities between the target store and the existing stores located in the target grid area and the neighboring grid area based on the first store name and the second store name, including: the store deduplication device performs word segmentation on the first store name and the second store name respectively to obtain the words corresponding to the first store name and the words corresponding to the second store name; the store deduplication device calculates the term frequency and inverse document frequency of the words corresponding to the first store name and of the words corresponding to the second store name; the store deduplication device selects words whose term frequency is less than or equal to the redundant term frequency threshold and whose inverse document frequency is greater than the redundant frequency index threshold; the store deduplication device obtains the character similarity between the target store and the stock stores located in the target grid area and the neighboring grid area based on the selected words corresponding to the first store name and the selected words corresponding to the second store name.

The method of claim 4, wherein the name-related similarity includes semantic similarity, and the store deduplication device obtains N name-related similarities between the target store and the stock stores located in the target grid area and the neighboring grid area based on the first store name and the second store name, including: the store deduplication device converts the first store name and the second store name into a first name numeric sequence and a second name numeric sequence respectively; the store deduplication device inputs the first name numeric sequence and the second name numeric sequence into a first model to obtain the semantic similarity, output by the first model, between the target store and the stock stores located in the target grid area and the neighboring grid area, and the first model is used to output the semantic similarity of two store names according to the numeric sequences converted from the two input store names.

A method as described in claim 4, wherein the name-related similarity includes store type similarity, and the store deduplication device obtains N name-related similarities between the target store and the existing stores located in the target grid area and the neighboring grid area based on the first store name and the second store name, including: the store deduplication device obtains first store name information based on the first store name; the store deduplication device inputs the first store name information into a second model to obtain a store type probability vector of the target store output by the second model, wherein the second model is used to output a store type probability vector according to the input store name information, and the store type probability vector is used to characterize the probability that the store indicated by the store name belongs to each store type; the store deduplication device looks up the store type probability vector corresponding to the second store name in the stock store database; the store deduplication device calculates the similarity between the store type probability vector of the target store and the store type probability vector corresponding to the second store name, and determines the similarity as the store type similarity between the target store and the stock stores located in the target grid area and the neighboring grid area.

The method as claimed in claim 7, further comprising: the store deduplication device obtains the store name of the existing store, and obtains the store name information according to the store name; the store deduplication device inputs the store name information of the existing store into the second model, and obtains the store type probability vector of the existing store output by the second model; the store deduplication device establishes a second correspondence between the existing store and the store type probability vector of the existing store, and stores the second correspondence in the existing store database.

The method of claim 4, wherein the store deduplication device obtains the location similarity between the target store and the stock stores located in the target grid area and the neighboring grid area based on the first store location information and the second store location information, including: the store deduplication device obtains the geographical distance between the target store and the stock stores based on the first store location information and the second store location information; the store deduplication device obtains the location similarity between the target store and the stock stores located in the target grid area and the neighboring grid area based on the ratio of the geographical distance to the location deviation threshold.
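The claim states only that the location similarity is based on the ratio of the geographical distance to the location deviation threshold. A minimal sketch follows, assuming the haversine formula for the distance and the mapping `max(0, 1 - distance / threshold)` for the similarity; both are assumptions, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def location_similarity(distance_m: float, deviation_threshold_m: float) -> float:
    """Assumed mapping: 1 at zero distance, falling linearly to 0 at the threshold."""
    if deviation_threshold_m <= 0:
        raise ValueError("deviation threshold must be positive")
    return max(0.0, 1.0 - distance_m / deviation_threshold_m)
```

Under this mapping, two stores 50 m apart with a 100 m deviation threshold have a location similarity of 0.5, and any pair farther apart than the threshold scores 0.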

A store deduplication processing device, characterized in that it includes: a first acquisition module, used to obtain a first store name and first store location information of a target store; a grid area determination module, used to determine a target grid area where the target store is located according to the first store location information; a second acquisition module, used to obtain a second store name and second store location information of stock stores located in the target grid area and a neighboring grid area from a pre-stored stock store database, wherein the neighboring grid area is adjacent to the target grid area; a calculation module, used to obtain a target similarity between the target store and the stock stores located in the target grid area and the neighboring grid area based on the first store name, the first store location information, the second store name and the second store location information; a deduplication module, used for removing the target store as a duplicate store when the target similarity is greater than or equal to a preset deduplication similarity threshold; wherein the grid area has a grid code, and the neighboring grid area determination module is used for: obtaining the grid code of the target grid area; obtaining the position information of the vertices of the target grid area according to the grid code of the target grid area and the grid code inverse algorithm; determining the position information of the auxiliary points located in the neighboring grid areas according to the position information of the vertices of the target grid area; and calculating the grid code of each neighboring grid area based on the position information of the auxiliary point in each neighboring grid area and the grid coding algorithm, so as to determine the neighboring grid areas.
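The neighbor determination recited above (decode the target grid code to its vertices, derive auxiliary points in the surrounding cells, then encode those points) can be sketched with a toy grid. The coding scheme here is hypothetical: a fixed-size latitude/longitude cell with a `"row:col"` code, chosen only to keep the example short. It is not the grid coding algorithm of the patent, and it ignores the poles and the antimeridian.

```python
import math
from typing import List, Set, Tuple

CELL_DEG = 0.01  # hypothetical cell size in degrees


def encode(lat: float, lon: float) -> str:
    """Toy grid coding algorithm: position -> "row:col" grid code."""
    row = math.floor((lat + 90.0) / CELL_DEG)
    col = math.floor((lon + 180.0) / CELL_DEG)
    return f"{row}:{col}"


def decode_vertices(code: str) -> List[Tuple[float, float]]:
    """Toy grid code inverse algorithm: grid code -> four (lat, lon) vertices."""
    row, col = (int(part) for part in code.split(":"))
    south = row * CELL_DEG - 90.0
    west = col * CELL_DEG - 180.0
    north, east = south + CELL_DEG, west + CELL_DEG
    return [(south, west), (south, east), (north, east), (north, west)]


def neighbor_codes(target_code: str) -> Set[str]:
    """Grid codes of the eight cells adjacent to the target cell."""
    vertices = decode_vertices(target_code)
    lats = [v[0] for v in vertices]
    lons = [v[1] for v in vertices]
    half = CELL_DEG / 2.0
    # Auxiliary points sit half a cell outside the target cell's edges, so each
    # one lands near the center of a neighboring cell, away from cell borders.
    aux_lats = [min(lats) - half, (min(lats) + max(lats)) / 2.0, max(lats) + half]
    aux_lons = [min(lons) - half, (min(lons) + max(lons)) / 2.0, max(lons) + half]
    codes = {encode(a, b) for a in aux_lats for b in aux_lons}
    codes.discard(target_code)  # the middle point falls in the target cell itself
    return codes
```

Placing the auxiliary points near cell centers rather than on shared edges keeps the encoding step from being sensitive to floating-point rounding at cell boundaries.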

A store deduplication processing device, characterized in that the device includes: a processor and a memory storing computer program instructions; when the processor executes the computer program instructions, the store deduplication processing method as described in any one of claims 1 to 9 is implemented.

A computer-readable storage medium, characterized in that computer program instructions are stored on the computer-readable storage medium, and when the computer program instructions are executed by a processor, the store deduplication processing method as described in any one of claims 1 to 9 is implemented.