CN115512146A

CN115512146A - POI information mining method, device, equipment and storage medium

Info

Publication number: CN115512146A
Application number: CN202211356607.4A
Authority: CN
Inventors: 马小明
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2022-11-01
Filing date: 2022-11-01
Publication date: 2022-12-23

Abstract

The disclosure provides a POI information mining method, device, equipment and storage medium. It relates to the technical field of artificial intelligence, in particular to image processing, text processing, deep learning and the like, and can be applied to scenarios such as POI information retrieval services and shop signboard verification. The specific implementation scheme is as follows: obtaining POI information, wherein the POI information comprises a signboard image of a target store and text related to the target store; determining a first score for each of a first preset number of first brands whose signboard images have the highest similarity to the signboard image of the target store; determining a second score for each of a second preset number of second brands whose related text has the highest similarity to the text related to the target store; and determining a target brand corresponding to the POI information from the first brands and the second brands according to the first scores and the second scores. The disclosure can intelligently mine the POI information of a brand at low cost and with high timeliness, and can greatly improve the recall rate and accuracy rate of POI information mining.

Description

POI information mining method, device, equipment and storage medium

Technical Field

The disclosure relates to the technical field of artificial intelligence, in particular to image processing, text processing, deep learning and the like. It can be applied to scenarios such as providing POI information retrieval services for users and verifying the authenticity of shop signboards, and particularly relates to a POI information mining method, device, equipment and storage medium.

Background

Point of interest (POI) information (also referred to as POI data) generally refers to information related to a certain geographic location point in geographic information. For example, the geographic location point may be a house, a store, a mailbox, a bus station, and the like. The POI information may include fields for the name, address, coordinates, phone number, hours of operation, brand, etc. of the geographic location point.

In POI information, the brand is an important content attribute of many stores. When a user searches for stores of a certain target brand, a search service (such as a map service or a takeout service) needs to find chain stores according to the user's search demand: it mines the stores whose POI information contains the target brand or is related to the target brand, and shows the information of these stores to the user for selection.

Currently, there are two main ways to mine POI information. In one approach, a provider of the search service may mine POI information of stores related to the target brand based on merchant cooperation or by crawling official websites. In the other approach, the provider of the search service may mine POI information of stores related to the target brand by clustering storefront images with a clustering algorithm (e.g., the K-means algorithm).

Disclosure of Invention

The invention provides a POI information mining method, a POI information mining device, POI information mining equipment and a storage medium, which can intelligently mine POI information of a brand, realize low cost and high timeliness, and can greatly improve mining recall rate and accuracy rate of the POI information.

According to a first aspect of the present disclosure, there is provided a POI information mining method, the method including:

obtaining POI information of a target store, wherein the POI information comprises a signboard image of the target store and text related to the target store; determining, according to a preset signboard image library, a first preset number of first brands whose signboard images have the highest similarity to the signboard image of the target store, and determining a first score of each first brand according to the similarity between the signboard image of the first brand and the signboard image of the target store, wherein the signboard image library comprises signboard images of at least two brands; determining, according to a preset text library, a second preset number of second brands whose related text has the highest similarity to the text related to the target store, and determining a second score of each second brand according to the similarity between the text related to the second brand and the text related to the target store, wherein the text library comprises text related to at least two brands; determining a fusion score of each brand among the first brands and the second brands according to the first scores and the second scores; and determining a target brand corresponding to the POI information of the target store from the first brands and the second brands according to the fusion score of each brand.

According to a second aspect of the present disclosure, there is provided a POI information mining apparatus, the apparatus including:

an acquisition unit for acquiring POI information of a target store, wherein the POI information comprises a signboard image of the target store and text related to the target store; an image processing unit for determining, according to a preset signboard image library, a first preset number of first brands whose signboard images have the highest similarity to the signboard image of the target store, and determining a first score of each first brand according to the similarity between the signboard image of the first brand and the signboard image of the target store, wherein the signboard image library comprises signboard images of at least two brands; a text processing unit for determining, according to a preset text library, a second preset number of second brands whose related text has the highest similarity to the text related to the target store, and determining a second score of each second brand according to the similarity between the text related to the second brand and the text related to the target store, wherein the text library comprises text related to at least two brands; a fusion unit for determining a fusion score of each brand among the first brands and the second brands according to the first scores and the second scores; and an identification unit for determining a target brand corresponding to the POI information of the target store from the first brands and the second brands according to the fusion score of each brand.

According to a third aspect of the present disclosure, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect.

According to a fourth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method according to the first aspect.

According to a fifth aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method according to the first aspect.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. In the drawings:

Fig. 1 is a schematic flowchart of a POI information mining method according to an embodiment of the present disclosure;

Fig. 2 is a schematic flowchart of an implementation of S102 in Fig. 1 according to an embodiment of the present disclosure;

Fig. 3 is a schematic flowchart of another implementation of S102 in Fig. 1 according to an embodiment of the present disclosure;

Fig. 4 is a schematic flowchart of an implementation of S103 in Fig. 1 according to an embodiment of the present disclosure;

Fig. 5 is a schematic flowchart of another implementation of S103 in Fig. 1 according to an embodiment of the present disclosure;

Fig. 6 is a schematic composition diagram of a POI information mining apparatus according to an embodiment of the present disclosure;

Fig. 7 is a schematic block diagram of an example electronic device 700 that can be used to implement embodiments of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

It should be understood that in the embodiments of the present disclosure, the character "/" generally indicates that the former and latter associated objects are in an "or" relationship. The terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated.

In POI information, the brand is an important content attribute of many stores. When a user searches for stores of a certain target brand, a search service (such as a map service or a takeout service) needs to find chain stores according to the user's search demand: it finds the stores whose POI information contains the target brand or is related to the target brand, and shows the information of these stores to the user for selection.

Currently, there are two main ways to mine POI information. In one approach, a provider of the search service may obtain POI information of stores related to the target brand in advance through merchant cooperation or by crawling official websites and, when providing the search service, may show information of stores related to the target brand to the user based on the POI information so obtained. However, acquiring the information through merchant cooperation or official-website crawlers depends heavily on external factors, lacks initiative, requires building cooperation with multiple parties, and is costly with poor timeliness.

In the other approach, the provider of the search service may mine POI information of stores related to the target brand by clustering storefront images with a clustering algorithm (e.g., the K-means algorithm). However, mining by storefront image clustering yields low accuracy and a low recall rate for brand POI information.

For example, the signboards of stores under a certain target brand may be of three types: Chinese-character signboards, pattern signboards, and English-character signboards. When POI information of stores related to the target brand is mined, only the POI information of stores with one or two of these signboard types may be found, resulting in a low mining recall rate.

For example, when mining POI information of a store related to a certain target brand, there is a possibility that the POI information of stores corresponding to signs of similar brands is mistakenly mined, resulting in a low mining accuracy.

The invention provides a POI information mining method, which can intelligently mine POI information of a brand, is low in cost and high in timeliness, and can greatly improve mining recall rate and accuracy rate of the POI information.

The execution subject of the method may be a computer or a server, or another device having data processing capabilities. The execution subject of the method is not limited herein.

Illustratively, the server may be a background server that provides POI information retrieval services (or referred to as retrieval services) for the user. For example, a search service for searching stores of a certain target brand can be provided for a user in the map application, and the server can be a background server of the map application. For example, the takeaway application may provide a search service for searching stores of a target brand to the user, and the server may be a backend server of the takeaway application.

In some embodiments, the server may be a single server, or may be a server cluster composed of a plurality of servers. In some embodiments, the server cluster may also be a distributed cluster. The present disclosure is also not limited to a specific implementation of the server.

The POI information mining method is exemplified below.

Fig. 1 is a schematic flowchart of a POI information mining method provided in the embodiment of the present disclosure. As shown in fig. 1, the method may include:

S101, obtaining POI information of the target store, wherein the POI information comprises a signboard image of the target store and text related to the target store.

Illustratively, the target store may be a store, such as a restaurant, convenience store, or the like. The signboard image of the target store may be a picture or video taken of the target store. The target store-related text may include a user comment on the target store.

S102, determining, according to a preset signboard image library, a first preset number of first brands whose signboard images have the highest similarity to the signboard image of the target store, and determining a first score of each first brand according to the similarity between the signboard image of the first brand and the signboard image of the target store, wherein the signboard image library comprises signboard images of at least two brands.

For example, the preset signboard image library may include signboard images of at least two brands. In some implementations, the signboard image library can be constructed by storefront image clustering combined with manual labeling. For example, a plurality of brand signboard images (a brand signboard image is an image obtained by photographing a store of a brand) may be acquired, and each image may be labeled with the brand to which it belongs. A clustering algorithm (e.g., the K-means algorithm, the DBSCAN clustering algorithm, etc.) may be used to cluster the signboard images, with each cluster corresponding to a brand. The signboard image library can then be constructed from these clusters.
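The library construction described above can be sketched roughly as follows. This is a minimal illustration, not the disclosed implementation: it assumes the clustering and manual labeling have already produced (brand, feature vector) pairs, and it represents each brand cluster by the mean of its feature vectors. The function name `build_library` and the two-dimensional toy features are hypothetical.

```python
from collections import defaultdict


def build_library(labeled_features):
    """Group feature vectors by brand label and keep one mean vector per brand.

    labeled_features: iterable of (brand, feature_vector) pairs, assumed to be
    the output of a clustering + manual labeling stage (not shown here).
    """
    groups = defaultdict(list)
    for brand, vec in labeled_features:
        groups[brand].append(vec)

    library = {}
    for brand, vecs in groups.items():
        dim = len(vecs[0])
        # Cluster center approximated as the element-wise mean of the members.
        library[brand] = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
    return library


# Toy data: two signboard features for brand 1, one for brand 2.
samples = [
    ("brand 1", [1.0, 0.0]),
    ("brand 1", [0.0, 1.0]),
    ("brand 2", [2.0, 2.0]),
]
library = build_library(samples)
```

The same structure could serve for a text library if the vectors are text features instead of image features.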

For the signboard image of the target store in S101, a similarity between the signboard image of the target store and each cluster in the signboard image library may be calculated as a similarity between the signboard image of the brand corresponding to each cluster and the signboard image of the target store. According to the similarity between the signboard image of the brand corresponding to each cluster and the signboard image of the target store, a first preset number of brands with the highest similarity can be selected as the first brand. That is, the number of the first brands is a first preset number.

Illustratively, the first preset number may be 10, 20, etc., and the size of the first preset number is not limited herein.

Optionally, when the similarity between the signboard image of the target store and each cluster in the signboard image library is calculated, the image feature of the cluster center of the signboard images in each cluster (or the average of the image features of all signboard images in the cluster) and the image feature of the signboard image of the target store may be obtained; these image features may then be encoded by a ResNet + ArcFace method, and a feature index calculation may be performed on the encoded image features to obtain the similarity between the signboard image of the target store and each cluster in the signboard image library.

For example, the step of determining a first score of each first brand according to the similarity between the signboard image of the first brand and the signboard image of the target store in S102 may include: the similarity between the signboard image of the first brand and the signboard image of the target store is taken as a first score of the first brand.
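As a rough sketch of S102, the snippet below ranks brands by similarity to the target store's signboard feature and keeps the first preset number of them, using the similarity itself as the first score. The disclosure does not name a similarity metric; cosine similarity is an assumption made here for illustration, and the feature vectors stand in for encoded image features. The names `cosine` and `top_k_brands` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_brands(query, library, k):
    """Return the k (brand, similarity) pairs most similar to the query."""
    scored = [(brand, cosine(query, center)) for brand, center in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy library of cluster centers and a toy query feature for the target store.
library = {"brand 1": [1.0, 0.0], "brand 2": [0.0, 1.0], "brand 3": [1.0, 1.0]}
first_brands = top_k_brands([1.0, 0.1], library, k=2)
first_scores = dict(first_brands)  # similarity used directly as the first score
```

For a large library, a brute-force scan like this would normally be replaced by an approximate nearest-neighbor index; the ranking logic stays the same.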

S103, determining, according to a preset text library, a second preset number of second brands whose related text has the highest similarity to the text related to the target store, and determining a second score of each second brand according to the similarity between the text related to the second brand and the text related to the target store, wherein the text library comprises text related to at least two brands.

For example, the preset text library may include text related to at least two brands. In some implementations, the text library may also be constructed by clustering combined with manual labeling. For example, multiple brand-related texts may be obtained, and each text (or each group of texts) may be labeled with the brand to which it belongs. A clustering algorithm can be used to cluster the texts, with each cluster corresponding to a brand. The text library can then be constructed from these clusters.

For the text related to the target store in S101, a similarity between the text related to the target store and each cluster in the text library may be calculated as a similarity between the text of the brand corresponding to each cluster and the text related to the target store. According to the similarity between the text of the brand corresponding to each cluster and the text related to the target store, a second preset number of brands with the highest similarity can be selected as second brands. That is, the number of the second brands is a second preset number.

Illustratively, the second preset number may be 10, 20, etc. The second predetermined number may be the same as or different from the first predetermined number. The size of the second predetermined number is also not limited herein.

Optionally, when the similarity between the text related to the target store and each cluster in the text library is calculated, the text feature of the cluster center of the texts in each cluster (or the average of the text features of all texts in the cluster) and the text feature of the text related to the target store may be obtained; these text features may then be encoded by a BERT + cycloss method, and a feature index calculation may be performed on the encoded text features to obtain the similarity between the text related to the target store and each cluster in the text library.

For example, the step of determining a second score of each second brand according to the similarity between the text related to the second brand and the text related to the target store in S103 may include: taking the similarity between the text related to the second brand and the text related to the target store as the second score of the second brand.

S104, determining a fusion score of each brand among the first brands and the second brands according to the first scores and the second scores.

For example, determining a fused score for each of the first brand and the second brand based on the first score and the second score may include: for each of the first brand and the second brand, summing the first score and the second score for each brand as a fused score for the brand.

For example, assume that the first preset number and the second preset number are both 5. The first brands include brand 1, brand 2, brand 3, brand 4, and brand 5, with first scores of 0.8, 0.7, 0.7, 0.6, and 0.5 respectively. The second brands include brand 1, brand 3, brand 4, brand 6, and brand 7, with second scores of 0.9, 0.8, 0.7, 0.6, and 0.6 respectively. Then, for brand 1, the first score is 0.8 and the second score is 0.9, so the fusion score of brand 1 may be 0.8+0.9=1.7; for brand 2, the first score is 0.7 and the second score is 0 (i.e., there is no second score), so the fusion score of brand 2 may be 0.7+0=0.7. Similarly, a fusion score may be derived for each of the other brands.
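The worked example above can be reproduced with a short sketch. It assumes, as the example does, that a brand missing from one of the two lists contributes a score of 0 for that modality; the function name `fuse_by_sum` is hypothetical.

```python
def fuse_by_sum(first_scores, second_scores):
    """Sum image-based and text-based scores over the union of both brand sets.

    A brand absent from one dictionary is treated as having score 0 there.
    """
    brands = set(first_scores) | set(second_scores)
    return {
        brand: first_scores.get(brand, 0.0) + second_scores.get(brand, 0.0)
        for brand in brands
    }


# Scores taken from the example in the text.
first_scores = {"brand 1": 0.8, "brand 2": 0.7, "brand 3": 0.7,
                "brand 4": 0.6, "brand 5": 0.5}
second_scores = {"brand 1": 0.9, "brand 3": 0.8, "brand 4": 0.7,
                 "brand 6": 0.6, "brand 7": 0.6}

fused = fuse_by_sum(first_scores, second_scores)
```

With these inputs, brand 1 receives approximately 1.7 and brand 2 receives 0.7, matching the figures given in the text; brands 3 and 4 come out at approximately 1.5 and 1.3 by the same rule.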

S105, determining a target brand corresponding to the POI information of the target store from the first brands and the second brands according to the fusion score of each brand among the first brands and the second brands.

In some implementations, one or more brands with the highest fusion scores from the first brand and the second brand may be selected as the target brand corresponding to the POI information of the target store according to the fusion scores of each of the first brand and the second brand.

For example, continuing the example given in S104, the brands appearing among the first brands and the second brands are brand 1, brand 2, brand 3, brand 4, brand 5, brand 6, and brand 7. Among these, brand 1 has the highest fusion score (1.7), so brand 1 may be selected as the target brand corresponding to the POI information of the target store.

In some other implementation manners, a brand with a fusion score greater than a preset score may be selected from the first brand and the second brand as a target brand corresponding to the POI information of the target store. For example, the predetermined score may be 0.9, 0.8, etc., without limitation.
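Both selection strategies for S105 can be sketched briefly. The fusion scores below are the ones obtained by applying the summation rule of S104 to its example data; the function names `select_top` and `select_above` are hypothetical, and the threshold 0.9 is one of the example preset scores mentioned in the text.

```python
def select_top(fused, n=1):
    """Return the n brands with the highest fusion scores."""
    ranked = sorted(fused.items(), key=lambda item: item[1], reverse=True)
    return [brand for brand, _ in ranked[:n]]


def select_above(fused, preset_score):
    """Return, in name order, every brand whose fusion score exceeds the preset score."""
    return sorted(brand for brand, score in fused.items() if score > preset_score)


# Fusion scores derived from the S104 example (first score + second score).
fused = {"brand 1": 1.7, "brand 2": 0.7, "brand 3": 1.5, "brand 4": 1.3,
         "brand 5": 0.5, "brand 6": 0.6, "brand 7": 0.6}

best = select_top(fused)                 # highest-score strategy
candidates = select_above(fused, 0.9)    # threshold strategy
```

The highest-score strategy always yields a result, while the threshold strategy may return several brands or none, so a caller would need to decide how to handle an empty list.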

In the embodiment of the disclosure, POI information of a target store is acquired, the POI information including a signboard image of the target store and text related to the target store. According to a preset signboard image library, a first preset number of first brands whose signboard images have the highest similarity to the signboard image of the target store are determined, and a first score of each first brand is determined according to that similarity. According to a preset text library, a second preset number of second brands whose related text has the highest similarity to the text related to the target store are determined, and a second score of each second brand is determined according to that similarity. A fusion score of each brand among the first brands and the second brands is determined according to the first scores and the second scores, and the target brand corresponding to the POI information of the target store is determined from the first brands and the second brands according to the fusion scores. In this way, the POI information of a brand can be mined intelligently, at low cost and with high timeliness.

The first score related to the image and the second score related to the text are fused, the target brand corresponding to the POI information of the target store is identified according to the fusion score, and the mining recall rate and the accuracy rate of the POI information can be greatly improved.

Illustratively, the POI information mining method provided by the embodiment of the disclosure can be applied to a scenario of verifying the brand of a new store. For example, when a store has newly opened but the map data does not yet contain its brand information, the actual brand of the newly opened store can be determined by the POI information mining method provided by the embodiment of the present disclosure.

In some embodiments, S104 may also include: performing a weighted summation of the first score and the second score corresponding to each brand among the first brands and the second brands, with a first weight applied to the first score and a second weight applied to the second score, to obtain the fusion score of each brand.

For example, assume that the first weight is $\alpha$, the second weight is $\beta$, the first score is $\mathrm{score}_1$, and the second score is $\mathrm{score}_2$. The fusion score can then be obtained by the following formula:

$$\mathrm{score}_{\mathrm{Fusion}} = \alpha \cdot \mathrm{score}_1 + \beta \cdot \mathrm{score}_2$$

where $\mathrm{score}_{\mathrm{Fusion}}$ represents the fusion score.

In this embodiment, the first weight and the second weight are respectively set for the first score and the second score, so that the influence of the signboard image and the text on the mining result in the POI information mining process can be flexibly adjusted.

Alternatively, the sum of the first weight and the second weight may be 1, or may not be 1, and is not limited herein.

In some embodiments, there are at least two text libraries, each text library corresponding to one type of text.

The step of determining, according to the preset text library, a second preset number of second brands whose related text has the highest similarity to the text related to the target store may include: determining, according to each text library separately, a second preset number of second brands whose related text has the highest similarity to the text related to the target store, so as to obtain a second preset number of second brands for each of the at least two types.

The step of determining a second score for each second brand according to the similarity between the text related to the second brand and the text related to the target store may include: for each type of second brand, a second score for each second brand is determined based on a similarity between the text associated with the second brand and the text associated with the target store.

Illustratively, the types of text may include at least two of: a comment type, a recommendation type, and a network type. For example, comment-type text may refer to users' evaluations of the target store on a platform (e.g., a map or takeout platform). Recommendation-type text may be the dishes, goods, etc. that users or the merchant of the target store recommend for the target store on a platform (e.g., a map or takeout platform). Network-type text may be the network IP address of the target store, or Wi-Fi information such as its Wi-Fi name or Wi-Fi password. The present disclosure does not limit the types of text.

In this embodiment, for each type of text, a text library corresponding to that type may be constructed. For each text library, a second preset number of second brands whose related text has the highest similarity to the text related to the target store can be determined, giving a second preset number of second brands for the text type of that library. Thus, a second preset number of second brands may be obtained for each of at least two types.

For example, if the text libraries include a comment text library, a recommendation text library, and a network text library, a second preset number of second brands whose related text has the highest similarity to the text related to the target store may be determined according to each text library, so as to obtain a second preset number of second brands of the comment type, a second preset number of second brands of the recommendation type, and a second preset number of second brands of the network type. That is, each text type may correspond to a second preset number of second brands.

For each type of second brand, a second score for each second brand may be determined based on a similarity between the text associated with the second brand and the text associated with the target store.

In this embodiment, when determining the fusion score of each of the first brand and the second brand according to the first score and the second score, the first score and the second score of each brand may be summed up as the fusion score of the brand for each of the first brand and the second brand. The difference from the foregoing embodiment is that, in the present embodiment, the second score of each brand includes a plurality of second scores respectively corresponding to the types.

For example, assume that the first preset number and the second preset number are both 2, and that the first weight and the second weight are both 0.5. The first brands include brand 1 and brand 2, with first scores of 0.8 and 0.7 respectively. The second brands comprise second brands of the comment type and second brands of the network type. The second brands of the comment type include brand 1 and brand 3, with second scores of 0.9 and 0.7 respectively; the second brands of the network type include brand 2 and brand 3, with second scores of 0.8 and 0.7 respectively. Then, for brand 1, the first score is 0.8, the second score of the comment type is 0.9, and the second score of the network type is 0, so the fusion score of brand 1 may be 0.8 × 0.5 + 0.9 × 0.5 = 0.85. Similarly, a fusion score may be obtained for each of the other brands.
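The multi-type example above can be checked with a small sketch. It assumes that every text type is weighted by the same second weight and that a brand missing from a list scores 0 there; the function name `fuse_weighted` and the type keys `"comment"` and `"network"` are hypothetical labels.

```python
def fuse_weighted(first_scores, second_scores_by_type, first_weight, second_weight):
    """Weighted fusion of one image score and several per-type text scores.

    Each text type contributes second_weight * score; missing scores count as 0.
    """
    brands = set(first_scores)
    for scores in second_scores_by_type.values():
        brands |= set(scores)

    fused = {}
    for brand in brands:
        total = first_weight * first_scores.get(brand, 0.0)
        for scores in second_scores_by_type.values():
            total += second_weight * scores.get(brand, 0.0)
        fused[brand] = total
    return fused


# Scores taken from the example in the text.
first_scores = {"brand 1": 0.8, "brand 2": 0.7}
second_scores_by_type = {
    "comment": {"brand 1": 0.9, "brand 3": 0.7},
    "network": {"brand 2": 0.8, "brand 3": 0.7},
}

fused = fuse_weighted(first_scores, second_scores_by_type, 0.5, 0.5)
```

Brand 1 comes out at approximately 0.85, as in the text. Applying the same rule to the other brands gives approximately 0.75 for brand 2 and 0.70 for brand 3, so brand 1 would still rank first in this example.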

In this embodiment, there are at least two text libraries, each containing one type of text. A second preset number of second brands whose related text has the highest similarity to the text related to the target store is determined according to each text library, giving a second preset number of second brands for each of at least two types. For each type of second brand, a second score of each second brand is determined according to the similarity between the text related to the second brand and the text related to the target store. POI information mining based on multiple text types can thus be achieved, and the recall rate and accuracy can be further improved.

In some implementations, when summing the first score and the second score for each of the first brand and the second brand, the second scores for the second brands of different types are each weighted by a second weight. That is, the second scores for the different types of second brands are weighted equally.

In some other implementations, when summing up the first score and the second score corresponding to each of the first brand and the second brand, the sum of the weights of the second scores corresponding to all types of the second brands may be a second weight, and the weights of the second scores corresponding to different types of the second brands are different.

In this implementation, the second scores corresponding to second brands of different types carry different weights, so the influence of different types of text on the mining result can be flexibly adjusted during POI information mining.
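As a minimal sketch of this variant, the second weight can be divided among the text types so that the per-type weights differ but still sum to the second weight. The helper `split_second_weight` and the 3:2 proportion between comment and network text are assumptions chosen only for illustration.

```python
def split_second_weight(second_weight, proportions):
    """Divide the second weight among text types (hypothetical helper).

    `proportions` maps each text type to a relative share; the returned
    per-type weights sum to `second_weight`.
    """
    total = sum(proportions.values())
    if total <= 0:
        raise ValueError("proportions must sum to a positive value")
    return {text_type: second_weight * share / total
            for text_type, share in proportions.items()}


# Assumed split: comment text counts more than network text.
type_weights = split_second_weight(0.5, {"comment": 3, "network": 2})

# Text contribution to the fusion score of a brand that scores 0.7
# in both text types (brand 3 in the earlier example).
second_scores_brand3 = {"comment": 0.7, "network": 0.7}
text_contribution = sum(type_weights[t] * s
                        for t, s in second_scores_brand3.items())
```

With these assumed proportions the comment type is weighted 0.3 and the network type 0.2, so raising or lowering a share changes how strongly that text type affects the result.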

Fig. 2 is a schematic flowchart of an implementation of S102 in fig. 1 according to an embodiment of the present disclosure. As shown in fig. 2, in some embodiments, the step of determining a first score for each first brand according to the similarity between the first brand's signboard image and the target store's signboard image in S102 may include:

S201, determining a similarity score of each first brand according to the similarity between the signboard image of the first brand and the signboard image of the target store.

For example, assuming that the first brand includes brand 1 and brand 2, the similarity between the signboard image of brand 1 and the signboard image of the target store is 0.8, and the similarity between the signboard image of brand 2 and the signboard image of the target store is 0.7, the similarity score of brand 1 may be 0.8, and the similarity score of brand 2 may be 0.7.

S202, determining the consistency score of each first brand according to the number of first brands belonging to the same brand in the first preset number of first brands and the first preset number.

For example, in some scenarios, the signs of some stores may differ even though those stores belong to the same brand. For example, the store signs of a certain brand may include signs of various designs, Chinese signs, English signs, etc. Among the first preset number of first brands determined in the embodiments of the present disclosure, some of the first brands may therefore belong to the same brand.

For example, assume that the first preset number is 5 and the first brands include brand 1, brand 2, brand 3, brand 4, and brand 5, where both brand 1 and brand 3 are brand A, but brand 1 uses the Chinese sign of brand A and brand 3 uses the English sign of brand A.

S202 may include: for each first brand, taking the ratio of the number of first brands, among the first preset number of first brands, that belong to the same brand as that first brand to the first preset number as the consistency score of the first brand.

Take as an example the case where the first brands include brand 1, brand 2, brand 3, brand 4 and brand 5, and both brand 1 and brand 3 are brand A. Brand 1 and brand 3 belong to the same brand, so the number of first brands belonging to the same brand as brand 1 is 2, and the consistency score of brand 1 may be 2/5 = 0.4; the consistency score of brand 3 is also 0.4. If the number of first brands belonging to the same brand as brand 2 is 1, the consistency score of brand 2 may be 1/5 = 0.2. The number of first brands belonging to the same brand as brand 4 is also 1, so the consistency score of brand 4 is also 0.2.
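The ratio described in S202 can be sketched in a few lines of Python. The mapping `parent_brand` from each candidate to the brand it actually belongs to is a hypothetical data structure, and the labels "brand B", "brand C" and "brand D" are assumed placeholders for candidates that share a brand with no other candidate.

```python
from collections import Counter


def consistency_scores(candidates, parent_brand):
    """Consistency score per candidate (hypothetical helper).

    For each candidate, counts how many of the candidates belong to the
    same brand and divides by the preset number (the number of candidates).
    """
    preset_number = len(candidates)
    counts = Counter(parent_brand[c] for c in candidates)
    return {c: counts[parent_brand[c]] / preset_number for c in candidates}


candidates = ["brand 1", "brand 2", "brand 3", "brand 4", "brand 5"]
parent_brand = {
    "brand 1": "brand A",  # Chinese sign of brand A
    "brand 2": "brand B",  # assumed distinct brand
    "brand 3": "brand A",  # English sign of brand A
    "brand 4": "brand C",  # assumed distinct brand
    "brand 5": "brand D",  # assumed distinct brand
}

scores = consistency_scores(candidates, parent_brand)
```

Here brand 1 and brand 3 each receive 2/5 = 0.4, while the remaining candidates each receive 1/5 = 0.2.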

S203, determining a first score of the first brand according to the similarity score and the consistency score of the first brand.

In some implementations, S203 may include: summing the similarity score and the consistency score of the first brand to obtain the first score of the first brand.

In some other implementations, S203 may include: performing a weighted summation of the similarity score and the consistency score of the first brand to obtain the first score of the first brand, where the weight of the similarity score of the first brand is a third weight and the weight of the consistency score of the first brand is a fourth weight. The sum of the third weight and the fourth weight may or may not be 1.

In this embodiment, by introducing the consistency score, the score proportion occupied by the brand with better consistency can be increased, and the mining recall rate and accuracy of the POI information are further increased.
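Both variants of S203 can be illustrated with one small function. This is a sketch only: the name `combine_scores` is hypothetical, and the third and fourth weights of 0.7 and 0.3 are assumed values, since the disclosure does not fix them.

```python
def combine_scores(similarity, consistency,
                   similarity_weight=1.0, consistency_weight=1.0):
    """Combine a similarity score and a consistency score (hypothetical helper).

    With the default weights of 1.0 this is the plain sum; other weights
    give the weighted variant (third and fourth weights for first brands).
    """
    return similarity_weight * similarity + consistency_weight * consistency


# Brand 1 from the examples: similarity 0.8, consistency 0.4.
plain_first_score = combine_scores(0.8, 0.4)
# Assumed third weight 0.7 and fourth weight 0.3.
weighted_first_score = combine_scores(0.8, 0.4,
                                      similarity_weight=0.7,
                                      consistency_weight=0.3)
```

The plain sum gives 1.2 and the weighted variant gives 0.68. The same combination applies to second brands, with the fifth and sixth weights in place of the third and fourth.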

Fig. 3 is a schematic flow chart of another implementation of S102 in fig. 1 according to an embodiment of the present disclosure. As shown in fig. 3, in some embodiments, the step of determining a first preset number of first brands with highest similarity between the signboard images and the signboard images of the target store according to the preset signboard image library in S102 may include:

S301, inputting the signboard image of the target store into a preset image recognition model, and determining, through the image recognition model, candidate first brands whose signboard images are similar to the signboard image of the target store and the confidence of each candidate first brand, wherein the image recognition model is obtained by training with the signboard image library.

Illustratively, the neural network may be trained using a signboard image library to obtain an image recognition model. The type of neural network is not limited herein. The image recognition model may have a function of predicting to which signboard the signboard image of the target store belongs. After entering the signboard image of the target store into the image recognition model, the image recognition model may output candidate first brands whose signboard images are similar to the signboard image of the target store, and a confidence of each candidate first brand.

S302, taking the confidence coefficient of the candidate first brands as similarity, and selecting a first preset number of first brands with highest similarity from the candidate first brands.

For example, assuming that the first preset number is 2, the candidate first brand includes brand 1, brand 2, brand 3, brand 4, brand 5, the confidence of brand 1 is 0.9, the confidence of brand 2 is 0.8, the confidence of brand 3 is 0.7, the confidence of brand 4 is 0.6, and the confidence of brand 5 is 0.5, then brand 1 and brand 2 with the highest confidence may be selected as the first brand.

In this embodiment, by introducing the image recognition model and selecting the first brands of the first preset number in combination with the confidence level output by the image recognition model, the accuracy of the similarity calculation result can be improved, and further the mining recall rate of the POI information is improved.
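The selection in S302 amounts to a top-k choice over the confidences returned by the model. The sketch below assumes the model output is already available as a dictionary from brand to confidence; the function name `select_top_brands` is hypothetical.

```python
import heapq


def select_top_brands(confidences, preset_number):
    """Return the `preset_number` brands with the highest confidence.

    `confidences` maps candidate brand -> confidence, which is used
    directly as the similarity. The result is a list of (brand, confidence)
    pairs in descending order of confidence.
    """
    return heapq.nlargest(preset_number, confidences.items(),
                          key=lambda item: item[1])


# Confidences taken from the example in the text.
candidate_confidences = {
    "brand 1": 0.9, "brand 2": 0.8, "brand 3": 0.7,
    "brand 4": 0.6, "brand 5": 0.5,
}
first_brands = select_top_brands(candidate_confidences, preset_number=2)
```

With a first preset number of 2, brand 1 and brand 2 are selected, matching the example.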

Fig. 4 is a schematic flow chart of an implementation of S103 in fig. 1 according to an embodiment of the present disclosure. As shown in fig. 4, in some embodiments, the step of determining a second score for each second brand according to the similarity between the text related to the second brand and the text related to the target store in S103 may include:

S401, determining a similarity score of each second brand according to the similarity between the text related to the second brand and the text related to the target store.

S401 may refer to S201, and is not described in detail.

S402, determining the consistency score of each second brand according to the number of second brands belonging to the same brand in a second preset number of second brands and the second preset number.

S402 may refer to S202, and is not described in detail.

S403, determining a second score of the second brand according to the similarity score and the consistency score of the second brand.

In some implementations, S403 may include: summing the similarity score and the consistency score of the second brand to obtain the second score of the second brand.

In some other implementations, S403 may include: performing a weighted summation of the similarity score and the consistency score of the second brand to obtain the second score of the second brand, where the weight of the similarity score of the second brand is a fifth weight and the weight of the consistency score of the second brand is a sixth weight. The sum of the fifth weight and the sixth weight may or may not be 1.

In this embodiment, by introducing the consistency score, the score proportion occupied by the brand with better consistency can be improved, and the mining recall rate and accuracy of the POI information are further improved.

Fig. 5 is a schematic flowchart of another implementation of S103 in fig. 1 according to an embodiment of the present disclosure. As shown in fig. 5, in some embodiments, the step of determining, according to the preset text library, a second preset number of second brands with the highest similarity between the text and the text related to the target store in S103 may include:

S501, inputting the text related to the target store into a preset text recognition model, and determining, through the text recognition model, candidate second brands whose text is similar to the text related to the target store and the confidence of each candidate second brand, wherein the text recognition model is obtained by training with the text library.

Illustratively, a text library may be used to train a neural network to obtain the text recognition model. The type of neural network is not limited herein. The text recognition model may have the function of predicting to which brand the text related to the target store belongs. After the text related to the target store is input into the text recognition model, the text recognition model may output candidate second brands whose text is similar to the text related to the target store and a confidence for each candidate second brand.

Optionally, in the embodiment of the present disclosure, for each type of text mentioned in the foregoing embodiment (i.e., each text library), a text recognition model may be trained correspondingly for recognizing different types of text.

S502, taking the confidence degrees of the candidate second brands as the similarity, and selecting a second preset number of second brands with the highest similarity from the candidate second brands.

S502 may refer to S302 and is not described in detail again.

In this embodiment, a text recognition model is introduced, and a second preset number of second brands is selected in combination with the confidence output by the text recognition model, so that the accuracy of the similarity calculation result can be improved, and the mining recall rate of the POI information is further improved.
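When one text recognition model is trained per text type, S501 and S502 can be applied type by type. The following sketch uses stub functions in place of trained models; `comment_model`, `network_model`, their fixed outputs and the sample texts are all hypothetical placeholders rather than anything specified in the disclosure.

```python
def top_brands_per_type(texts_by_type, models_by_type, preset_number):
    """Select the top brands separately for each text type (hypothetical helper)."""
    result = {}
    for text_type, text in texts_by_type.items():
        # Each model returns a mapping of candidate brand -> confidence.
        confidences = models_by_type[text_type](text)
        ranked = sorted(confidences.items(),
                        key=lambda item: item[1], reverse=True)
        result[text_type] = dict(ranked[:preset_number])
    return result


# Stub models standing in for trained text recognition models.
def comment_model(text):
    return {"brand 1": 0.9, "brand 3": 0.7, "brand 4": 0.2}


def network_model(text):
    return {"brand 2": 0.8, "brand 3": 0.7, "brand 5": 0.1}


second_brands = top_brands_per_type(
    texts_by_type={"comment": "sample comment text",
                   "network": "sample network text"},
    models_by_type={"comment": comment_model, "network": network_model},
    preset_number=2,
)
```

With a second preset number of 2, the stubs yield brand 1 and brand 3 for the comment type and brand 2 and brand 3 for the network type.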

In an exemplary embodiment, the embodiment of the present disclosure further provides a POI information mining apparatus, which may be used to implement the POI information mining method according to the foregoing embodiment. Fig. 6 is a schematic composition diagram of a POI information mining apparatus according to an embodiment of the present disclosure. As shown in fig. 6, the apparatus may include: an acquisition unit 601, an image processing unit 602, a text processing unit 603, a fusion unit 604, and a recognition unit 605.

The acquisition unit 601 is configured to acquire POI information of a target store, where the POI information includes a signboard image of the target store and text related to the target store.

An image processing unit 602, configured to determine, according to a preset signboard image library, a first preset number of first brands with a highest similarity between the signboard images and the signboard images of the target store, and determine a first score of each first brand according to the similarity between the signboard images of the first brands and the signboard images of the target store, where the signboard image library includes at least two brand signboard images.

The text processing unit 603 is configured to determine, according to a preset text library, a second preset number of second brands with the highest similarity between the text and the text related to the target store, and determine a second score of each second brand according to the similarity between the text related to the second brand and the text related to the target store, where the text library includes at least two brands of related text.

A fusing unit 604 for determining a fused score for each of the first brand and the second brand based on the first score and the second score.

The recognition unit 605 is configured to determine a target brand corresponding to the POI information of the target store from the first brand and the second brand according to the fusion score of each of the first brand and the second brand.
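The cooperation of the five units can be sketched as a small Python class. This is only an illustrative arrangement: the class `PoiMiningApparatus`, its field names, the stub callables and the store identifier are assumptions, and picking the brand with the highest fusion score in unit 605 is assumed for the sake of the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Scores = Dict[str, float]


@dataclass
class PoiMiningApparatus:
    acquire: Callable[[str], dict]          # stands in for unit 601
    score_image: Callable[[bytes], Scores]  # stands in for unit 602
    score_text: Callable[[str], Scores]     # stands in for unit 603
    first_weight: float = 0.5
    second_weight: float = 0.5

    def fuse(self, first: Scores, second: Scores) -> Scores:
        """Unit 604: weighted sum; a missing score counts as 0."""
        brands = set(first) | set(second)
        return {b: self.first_weight * first.get(b, 0.0)
                   + self.second_weight * second.get(b, 0.0)
                for b in brands}

    def identify(self, fused: Scores) -> str:
        """Unit 605: assumed to pick the brand with the highest fusion score."""
        return max(fused, key=fused.get)

    def mine(self, store_id: str) -> str:
        poi = self.acquire(store_id)
        first = self.score_image(poi["signboard_image"])
        second = self.score_text(poi["text"])
        return self.identify(self.fuse(first, second))


# Stub units returning fixed scores, for illustration only.
apparatus = PoiMiningApparatus(
    acquire=lambda store_id: {"signboard_image": b"", "text": "placeholder"},
    score_image=lambda image: {"brand 1": 0.8, "brand 2": 0.7},
    score_text=lambda text: {"brand 1": 0.9, "brand 3": 0.7},
)
target = apparatus.mine("store-001")
```

Passing the scoring steps in as callables keeps the image and text processing independent of the fusion logic, so either can be replaced without touching the other.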

Optionally, the fusing unit 604 is specifically configured to sum the first score and the second score corresponding to each of the first brand and the second brand with the weight occupied by the first score as the first weight and the weight occupied by the second score as the second weight, so as to obtain a fused score of each of the first brand and the second brand.

Optionally, there are at least two text libraries, and each text library contains one type of text.

The text processing unit 603 is specifically configured to determine, according to each text library, a second preset number of second brands with the highest similarity between the text and the text related to the target store, so as to obtain a second preset number of second brands for each of at least two types.

The text processing unit 603 is further specifically configured to determine, for each type of second brand, a second score of each second brand according to a similarity between the text related to the second brand and the text related to the target store.

Optionally, when the first score and the second score are summed for each of the first brand and the second brand, the second scores for the different types of second brands may be weighted differently.

Optionally, the types include at least two of: comment type, recommendation type, network type.

Optionally, the image processing unit 602 is specifically configured to: determine a similarity score of each first brand according to the similarity between the signboard image of the first brand and the signboard image of the target store; determine a consistency score of each first brand according to the number of first brands belonging to the same brand among the first preset number of first brands and the first preset number; and determine a first score of the first brand according to the similarity score and the consistency score of the first brand.

Optionally, the image processing unit 602 is specifically configured to input a signboard image of the target store into a preset image recognition model, and determine candidate first brands of which the signboard images are similar to the signboard image of the target store and a confidence level of each candidate first brand through the image recognition model, where the image recognition model is obtained by training using a signboard image library; and taking the confidence degree of the candidate first brands as the similarity, and selecting a first preset number of first brands with highest similarity from the candidate first brands.

Optionally, the text processing unit 603 is specifically configured to: determine a similarity score of each second brand according to the similarity between the text related to the second brand and the text related to the target store; determine a consistency score of each second brand according to the number of second brands belonging to the same brand among the second preset number of second brands and the second preset number; and determine a second score of the second brand according to the similarity score and the consistency score of the second brand.

Optionally, the text processing unit 603 is specifically configured to input a text related to the target store into a preset text recognition model, and determine candidate second brands of the text similar to the text related to the target store and a confidence level of each candidate second brand through the text recognition model, where the text recognition model is obtained by training a text library; and taking the confidence degrees of the candidate second brands as the similarity, and selecting a second preset number of second brands with the highest similarity from the candidate second brands.

In the technical solution of the present disclosure, the acquisition, storage, and application of users' personal information all comply with relevant laws and regulations and do not violate public order and good customs.

The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.

In an exemplary embodiment, an electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method according to the above embodiments. The electronic device may be the computer or the server described above.

In an exemplary embodiment, the readable storage medium may be a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method according to the above embodiments.

In an exemplary embodiment, the computer program product comprises a computer program which, when being executed by a processor, carries out the method according to the above embodiments.

FIG. 7 illustrates a schematic block diagram of an example electronic device 700 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processors, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 7, the electronic device 700 comprises a computing unit 701, which may perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the device 700 can also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.

A number of components in the electronic device 700 are connected to the I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, or the like; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, optical disk, or the like; and a communication unit 709 such as a network card, a modem, a wireless communication transceiver, etc. The communication unit 709 allows the electronic device 700 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.

The computing unit 701 may be any of a variety of general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 701 executes the respective methods and processes described above, such as the POI information mining method. For example, in some embodiments, the POI information mining method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 700 via the ROM 702 and/or the communication unit 709. When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the POI information mining method described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the POI information mining method by any other suitable means (e.g., by means of firmware).

Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user may provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combining a blockchain.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel or sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.

The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

Claims

1. A POI information mining method, the method comprising:

obtaining POI information of a target store, wherein the POI information comprises a signboard image of the target store and text related to the target store;

determining a first preset number of first brands with highest similarity between the signboard images and the signboard images of the target store according to a preset signboard image library, and determining a first score of each first brand according to the similarity between the signboard images of the first brands and the signboard images of the target store, wherein the signboard image library comprises at least two brand signboard images;

determining a second preset number of second brands with highest similarity between texts and texts related to the target store according to a preset text library, and determining a second score of each second brand according to the similarity between the texts related to the second brands and the texts related to the target store, wherein the text library comprises at least two texts related to the brands;

determining a fused score for each of the first brand and the second brand based on the first score and the second score;

and determining a target brand corresponding to the POI information of the target store from the first brand and the second brand according to the fusion score of each brand of the first brand and the second brand.

2. The method of claim 1, the determining a fused score for each of the first brand and the second brand from the first score and the second score, comprising:

summing the first score and the second score corresponding to each brand in the first brand and the second brand, with the first score weighted by a first weight and the second score weighted by a second weight, to obtain a fused score of each brand in the first brand and the second brand.

3. The method of claim 2, wherein there are at least two text libraries, and each of the text libraries correspondingly comprises one type of text;

the determining, according to a preset text library, a second preset number of second brands with highest similarity between the text and the text related to the target store comprises:

determining, according to each text library respectively, a second preset number of second brands with highest similarity between the text and the text related to the target store, to obtain a second preset number of second brands for each of at least two types;

determining a second score for each of the second brands based on a similarity between the second brand-related text and the target store-related text, comprising:

for each type of the second brand, determining a second score for each of the second brands based on a similarity between the second brand-related text and the target store-related text.

4. The method of claim 3, wherein, when the first score and the second score corresponding to each of the first brand and the second brand are summed, the second scores corresponding to different types of the second brand are weighted differently.

5. The method of claim 3 or 4, the types comprising at least two of: comment type, recommendation type, network type.

6. The method of any of claims 1-5, the determining a first score for each of the first brands based on a similarity between the first brand's signboard images and the target store's signboard images, comprising:

determining a similarity score for each of the first brands based on a similarity between the first brand's signboard images and the target store's signboard images;

determining the consistency score of each first brand according to the number of first brands belonging to the same brand in the first preset number of first brands and the first preset number;

determining a first score for the first brand based on the similarity score and the consistency score for the first brand.

7. The method of claim 6, wherein determining a first preset number of first brands with highest similarity between a signboard image and a signboard image of the target store according to a preset signboard image library comprises:

inputting the signboard image of the target store into a preset image recognition model, and determining candidate first brands with signboard images similar to the signboard image of the target store and the confidence coefficient of each candidate first brand through the image recognition model, wherein the image recognition model is obtained by training through the signboard image library;

and taking the confidence degree of the candidate first brands as similarity, and selecting a first preset number of first brands with highest similarity from the candidate first brands.

8. The method of any of claims 1-7, wherein determining a second score for each of the second brands based on a similarity between the second brand-related text and the target store-related text, comprises:

determining a similarity score for each of the second brands based on a similarity between the text relating to the second brand and the text relating to the target store;

determining the consistency score of each second brand according to the number of second brands belonging to the same brand in the second preset number of second brands and the second preset number;

determining a second score for the second brand based on the similarity score and the consistency score for the second brand.

9. The method of claim 8, wherein determining a second predetermined number of second brands with highest similarity between text and text associated with the target store according to a predetermined library of texts, comprises:

inputting the text related to the target store into a preset text recognition model, and determining candidate second brands with texts similar to the text related to the target store and the confidence coefficient of each candidate second brand through the text recognition model, wherein the text recognition model is obtained by adopting the text library for training;

and taking the confidence degrees of the candidate second brands as similarity, and selecting a second preset number of second brands with highest similarity from the candidate second brands.

10. A POI information mining apparatus, the apparatus comprising:

an acquisition unit, configured to acquire POI information of a target store, wherein the POI information comprises a signboard image of the target store and a text related to the target store;

an image processing unit, configured to determine, according to a preset signboard image library, a first preset number of first brands whose signboard images have the highest similarity to the signboard image of the target store, and to determine a first score of each of the first brands according to the similarity between the signboard image of the first brand and the signboard image of the target store, wherein the signboard image library includes at least two brand signboard images;

a text processing unit, configured to determine, according to a preset text library, a second preset number of second brands whose related text has the highest similarity to the text related to the target store, and to determine a second score of each of the second brands according to the similarity between the text related to the second brand and the text related to the target store, wherein the text library comprises at least two texts related to brands;

a fusion unit, configured to determine a fusion score for each of the first brand and the second brand according to the first score and the second score;

an identifying unit, configured to determine, according to a fusion score of each of the first brand and the second brand, a target brand corresponding to the POI information of the target store from the first brand and the second brand.

11. The apparatus of claim 10, wherein the fusion unit is specifically configured to perform a weighted summation of the first score and the second score of each of the first brand and the second brand, according to a first weight assigned to the first score and a second weight assigned to the second score, to obtain a fusion score of each of the first brand and the second brand.
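The weighted summation and the subsequent choice of a target brand can be sketched as follows. The weights 0.6 and 0.4, the function names, and the convention that a brand missing from one modality contributes a score of 0 there are all assumptions for illustration; the claims do not specify them, nor do they state that the highest fusion score is the selection rule.

```python
from typing import Dict


def fuse_scores(first: Dict[str, float], second: Dict[str, float],
                w1: float, w2: float) -> Dict[str, float]:
    """Weighted sum of image-side (first) and text-side (second) scores per brand.

    Assumption: a brand absent from one side is scored 0 on that side.
    """
    brands = set(first) | set(second)
    return {b: w1 * first.get(b, 0.0) + w2 * second.get(b, 0.0) for b in brands}


def pick_target_brand(fused: Dict[str, float]) -> str:
    """Assumed selection rule: the brand with the highest fusion score."""
    return max(fused, key=fused.get)


# Hypothetical per-brand scores from the image and text branches.
first_scores = {"BrandA": 0.45, "BrandC": 0.15}
second_scores = {"BrandA": 0.30, "BrandB": 0.50}
fused = fuse_scores(first_scores, second_scores, w1=0.6, w2=0.4)
target = pick_target_brand(fused)
```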

12. The apparatus of claim 11, wherein there are at least two text libraries, each text library containing one type of text;

the text processing unit is specifically configured to determine, according to each of the text libraries, a second preset number of second brands with a highest similarity between the text and the text related to the target store, and obtain at least two types of second preset number of second brands;

the text processing unit is specifically further configured to determine, for each type of the second brand, a second score of each of the second brands according to a similarity between the text related to the second brand and the text related to the target store.

13. The apparatus of claim 12, wherein, in the weighted summation of the first score and the second score of each of the first brand and the second brand, the second scores of different types of second brands are assigned different second weights.
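Where several text libraries are used, the fusion can carry one second weight per text type. The sketch below assumes two hypothetical text types, `"name"` and `"signboard_text"`, with illustrative weights; neither the type names nor the weight values come from the claims.

```python
from typing import Dict


def fuse_multi_type(first: Dict[str, float],
                    second_by_type: Dict[str, Dict[str, float]],
                    w1: float,
                    w2_by_type: Dict[str, float]) -> Dict[str, float]:
    """Weighted sum of one image score and several text scores, one weight per text type."""
    brands = set(first)
    for scores in second_by_type.values():
        brands |= set(scores)

    fused: Dict[str, float] = {}
    for brand in brands:
        total = w1 * first.get(brand, 0.0)  # assumption: missing score counts as 0
        for text_type, scores in second_by_type.items():
            total += w2_by_type[text_type] * scores.get(brand, 0.0)
        fused[brand] = total
    return fused


# Hypothetical scores: one image branch and two text branches.
first_scores = {"BrandA": 0.5}
second_scores_by_type = {
    "name": {"BrandA": 0.4, "BrandB": 0.6},
    "signboard_text": {"BrandB": 0.2},
}
fused_multi = fuse_multi_type(
    first_scores,
    second_scores_by_type,
    w1=0.5,
    w2_by_type={"name": 0.3, "signboard_text": 0.2},
)
```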

14. The apparatus according to any of claims 10 to 13, wherein the image processing unit is specifically configured to determine a similarity score for each of the first brands based on a similarity between the first brand's signboard images and the target store's signboard images; determining the consistency score of each first brand according to the number of first brands belonging to the same brand in the first preset number of first brands and the first preset number; determining a first score for the first brand based on the similarity score and the consistency score for the first brand.

15. The apparatus according to claim 14, wherein the image processing unit is specifically configured to input the signboard image of the target store into a preset image recognition model, and determine candidate first brands with similar signboard images to the signboard image of the target store and a confidence level of each candidate first brand through the image recognition model, where the image recognition model is trained by using the signboard image library; and taking the confidence degree of the candidate first brands as similarity, and selecting a first preset number of first brands with highest similarity from the candidate first brands.

16. The apparatus according to any of claims 10 to 15, wherein the text processing unit is specifically configured to determine a similarity score for each of the second brands based on a similarity between the text related to the second brand and the text related to the target store; determining the consistency score of each second brand according to the number of the second brands belonging to the same brand in the second preset number of second brands and the second preset number; determining a second score for the second brand based on the similarity score and the consistency score for the second brand.

17. The apparatus according to claim 16, wherein the text processing unit is specifically configured to input the text related to the target store into a preset text recognition model, and determine candidate second brands with texts similar to the text related to the target store and a confidence level of each candidate second brand through the text recognition model, where the text recognition model is trained by using the text library; and taking the confidence degrees of the candidate second brands as similarity, and selecting a second preset number of second brands with highest similarity from the candidate second brands.

18. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor;

wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.

19. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-9.

20. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-9.