CN111931077A - Data processing method and device, electronic equipment and storage medium

Info

Publication number: CN111931077A
Application number: CN202010615622.0A
Authority: CN
Inventors: 张雷; 段航; 杨凯; 苏哲; 胡渭
Original assignee: Hanhai Information Technology Shanghai Co Ltd
Current assignee: Hanhai Information Technology Shanghai Co Ltd
Priority date: 2020-06-30
Filing date: 2020-06-30
Publication date: 2020-11-13
Anticipated expiration: 2040-06-30
Also published as: CN111931077B

Abstract

The invention discloses a data processing method and a data processing device. The method comprises: determining, based on the order information of each of a plurality of historical orders, the POI (point of interest) name corresponding to each historical order and the third positioning position to which each historical order belongs, wherein the order information comprises a first positioning position at the time the order is placed and a second positioning position at the time the order is handed over; identifying target historical orders that have the same POI name and whose third positioning positions satisfy a preset condition, the preset condition comprising that the distance between the third positioning positions of any two target historical orders is smaller than a first preset threshold value; and determining the POI coordinates corresponding to the same POI name based on the first positioning positions and the second positioning positions of the target historical orders with the same POI name. The method can improve the accuracy and the coverage rate of the mined POI coordinates and reduce the limitations of mining.

Description

Data processing method and device, electronic equipment and storage medium

Technical Field

Embodiments of the present invention relate to the field of data processing technologies, and in particular, to a data processing method and apparatus, an electronic device, and a computer-readable storage medium.

Background

A POI (Point of Interest) reflects the demand for personalized services that emerges once geographic information systems have developed to a certain stage. POI information mainly includes the name, category, coordinates, classification, and the like. Comprehensive POI information is a prerequisite for an enriched navigation map: up-to-date POIs can inform a user of road branches and of details about surrounding buildings, and make it easy to search for any place the user needs during navigation, so that the most convenient and unobstructed route can be selected during path planning. POI coordinates are therefore particularly important.

In the related art, POI coordinates are mainly mined by geocoding. Geocoding involves a correspondence between addresses or place names and coordinates, so the coordinates of a POI can be obtained from the geocoding through the address attribute of the POI.

However, geocoding must be supported by fairly complete map data, and the accuracy of existing self-built geocoding is low, so POI coordinates obtained in this way have low accuracy. In addition, the address information of some POIs is of poor quality, which makes it difficult to obtain POI coordinates from geocoding by POI address, so the scheme also suffers from low coverage. Furthermore, POIs with similar addresses (for example, two shops on the same street) tend to receive the same POI coordinates through geocoding, so the scheme is also considerably limited.

Therefore, the scheme for mining the POI coordinates in the related art generally has the problems of low accuracy, low coverage and large limitation of the POI coordinates.

Disclosure of Invention

The embodiment of the invention provides a data processing method, which aims to solve the problems of low POI coordinate accuracy, low coverage rate and large limitation existing in a scheme for mining POI coordinates in related technologies.

In order to solve the above problem, in a first aspect, an embodiment of the present invention provides a data processing method, including:

respectively determining the POI (point of interest) name corresponding to each historical order and the third positioning position to which each historical order belongs based on order information of each historical order in a plurality of historical orders;

the order information comprises a first positioning position when orders are placed and a second positioning position when orders are handed over;

identifying a target history order which has the same POI name and the third positioning position meeting a preset condition from the plurality of history orders, wherein the third positioning position meeting the preset condition comprises that the distance between the third positioning positions of any two target history orders is smaller than a first preset threshold value;

and determining POI coordinates corresponding to the same POI name based on the first positioning position and the second positioning position of the target history order with the same POI name.

In a second aspect, an embodiment of the present invention provides a data processing apparatus, including:

the first determining module is used for respectively determining a POI (point of interest) name corresponding to each historical order and a third positioning position to which each historical order belongs based on order information of each historical order in a plurality of historical orders;

the first identification module is used for identifying, from the plurality of historical orders, target historical orders which have the same POI name and whose third positioning positions meet a preset condition, wherein the third positioning positions meeting the preset condition comprises that the distance between the third positioning positions of any two target historical orders is smaller than a first preset threshold value;

a second determining module, configured to determine, based on the first location position and the second location position of the target history order with a same POI name, POI coordinates corresponding to the same POI name.

In a third aspect, an embodiment of the present invention further discloses an electronic device, which includes a memory, a processor, and a computer program that is stored in the memory and can be run on the processor, and when the processor executes the computer program, the data processing method according to the embodiment of the present invention is implemented.

In a fourth aspect, the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the steps of the data processing method disclosed in the present invention.

In the embodiment of the invention, target historical orders that correspond to the same POI name and whose third positioning positions are pairwise close to one another are determined from the historical orders, so that the determined target historical orders point to the same POI. The POI coordinates of that POI are then determined using the first positioning positions (when the orders were placed) and the second positioning positions (when the orders were handed over) of the multiple target historical orders. Because the first and second positioning positions are relatively accurate, and the mined POI coordinates combine the first and second positioning positions of multiple target historical orders, the accuracy of the POI coordinates mined for the POI name is improved.

In addition, historical orders cover a wide range of geographic positions. As long as historical orders exist at a position, the POI coordinates of that position can be mined by the technical scheme of the embodiment of the invention, which improves the mining coverage of POI coordinates.

Furthermore, because the method mines the POI coordinates of a POI name by combining the first and second positioning positions of multiple target historical orders with the same POI name, two nearby geographic positions with different POI names are not mined to the same POI coordinates. Instead, different POI coordinates can be mined, based on the POI names, for two geographic positions with similar addresses, which reduces the limitations of the mined POI coordinates.

Moreover, because order information is highly timely, POI coordinates can be mined more promptly based on the order information of the historical orders.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive labor.

FIG. 1 is a flow chart of the steps of a data processing method of one embodiment of the present invention;

FIG. 2 is a map schematic of one embodiment of the present invention;

FIG. 3 is a block diagram of a data processing apparatus according to an embodiment of the present invention;

FIG. 4 schematically shows a block diagram of a computing processing device for performing a method according to the present disclosure; and

FIG. 5 schematically shows a storage unit for holding or carrying program code implementing a method according to the present disclosure.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

An embodiment of the present invention provides a data processing method, as shown in fig. 1, the method may include the following steps:

Step 101, respectively determining a POI name corresponding to each historical order and a third positioning position to which each historical order belongs based on order information of each historical order in a plurality of historical orders;

Optionally, the order information further includes an order address.

The order type of the historical order may be any order whose order information includes an order address, a first positioning position when the order is placed, a second positioning position when the order is handed over, and the like, such as a take-away order, a taxi order, or an express delivery order.

For example, for a take-away order, the order address may be the delivery address set by the ordering user, the first positioning position may be the ordering user's positioning position when placing the order, and the second positioning position may be the ordering user's positioning position when receiving the goods delivered by the delivery person.

For another example, for a taxi order, the order address may be the boarding address set by the ordering user, the first positioning position may be the ordering user's positioning position when placing the order, and the second positioning position may be the ordering user's positioning position when getting into the vehicle.

In a possible implementation manner, when the order information of a historical order includes an order address but does not include the first positioning position and the second positioning position, the historical order may be associated with the positioning coordinates of its user when placing the order and the positioning coordinates of its user when handing over the order, so as to generate an association relationship between the historical order and this set of positioning coordinates.

In this step, the POI name of each historical order and the positioning position to which the historical order belongs are mined, mainly based on the order information of the historical order.

As for the positioning position to which a historical order belongs: although the order information may include at least one positioning position, a third positioning position needs to be mined for each historical order in order to express the position of that order.

Step 102, identifying a target history order which has the same POI name and the third positioning position meeting a preset condition from the plurality of history orders;

After the process of step 101, a POI name and a third positioning position have been mined for each historical order.

Since the number of the history orders is multiple, a target history order corresponding to the same POI name and having the third location position satisfying the preset condition needs to be mined from the multiple history orders.

The number of target history orders mined here is plural.

The third positioning position meeting the preset condition comprises that the distance between the third positioning positions of any two target historical orders is smaller than a first preset threshold value.

That is to say, this step mainly mines, from the plurality of historical orders, a plurality of target historical orders that point to the same POI: their POI names are the same, and the third positioning positions of any two of them are relatively close to each other, meaning that the distance between the third positioning positions of any two target historical orders is smaller than the first preset threshold value.
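The grouping in step 102 can be sketched as follows. This is only an illustrative sketch, not the method of the embodiment: the greedy grouping strategy, the haversine distance, and the names `haversine_m`, `target_order_groups`, `poi_name`, and `third` are assumptions introduced here, since the text specifies only the pairwise-distance condition.

```python
import math
from collections import defaultdict

def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * math.asin(math.sqrt(h))

def target_order_groups(orders, threshold_m):
    """Group orders that share a POI name and are pairwise closer than threshold_m.

    orders: list of dicts with hypothetical keys 'poi_name' and 'third',
    where 'third' is the (lat, lon) third positioning position.
    Returns a list of (poi_name, [orders]) with at least two orders per group.
    """
    by_name = defaultdict(list)
    for order in orders:
        by_name[order["poi_name"]].append(order)

    result = []
    for name, same_name in by_name.items():
        groups = []
        for order in same_name:
            # Greedy assumption: join the first group in which the order is
            # closer than the threshold to every existing member.
            for group in groups:
                if all(haversine_m(order["third"], m["third"]) < threshold_m
                       for m in group):
                    group.append(order)
                    break
            else:
                groups.append([order])
        result.extend((name, g) for g in groups if len(g) > 1)
    return result
```

Because every member of a group is checked against all other members before it is added, the distance between the third positioning positions of any two orders in a returned group is below the threshold, which is the preset condition described above.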

Step 103, determining POI coordinates corresponding to the same POI name based on the first location position and the second location position of the target history order with the same POI name.

The POI names extracted for the target historical orders are the same, and each target historical order has a first positioning position and a second positioning position. If the first positioning position and the second positioning position of one order are regarded as a positioning position combination, the target historical orders with the same POI name correspond to multiple positioning position combinations, and the POI coordinates corresponding to the POI name can be mined from these combinations.

The first positioning position can be a longitude and latitude coordinate, the second positioning position can be a longitude and latitude coordinate, and correspondingly, the POI coordinate can also be a longitude and latitude coordinate.

Optionally, the order information further includes an order address, and when step 101 is executed, it can be implemented through S201 and S202:

S201, respectively determining a POI name corresponding to each historical order based on the order address of each historical order in a plurality of historical orders;

The POI name corresponding to each historical order can be extracted from the order address by performing word segmentation processing on the order address of each historical order.

S202, determining a third positioning position to which each historical order belongs based on the first positioning position and the second positioning position of each historical order.

When a third positioning location to which a history order belongs is determined, any one of the first positioning location and the second positioning location of the history order can be used as the third positioning location. An intermediate positioning position between the first positioning position and the second positioning position of the historical order can also be used as a third positioning position to which the historical order belongs.

The location to which the historical order belongs, determined based on the order placement location and the order hand-off location of the historical order, may be more accurate.
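The options described for S202 can be sketched as a small helper. The function name `third_position` and its `mode` parameter are hypothetical, and the arithmetic mean of latitude and longitude is only one possible reading of "intermediate positioning position"; it is a reasonable approximation when the two points are close together and do not straddle the 180° meridian.

```python
def third_position(first, second, mode="midpoint"):
    """Derive the third positioning position from the first and second ones.

    first, second: (lat, lon) tuples in degrees.
    mode: 'first', 'second', or 'midpoint' (hypothetical option names).
    """
    if mode == "first":
        return first
    if mode == "second":
        return second
    if mode == "midpoint":
        # Simple average; assumes the two positions are near each other.
        return ((first[0] + second[0]) / 2, (first[1] + second[1]) / 2)
    raise ValueError(f"unknown mode: {mode!r}")
```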

The execution sequence of S201 and S202 is not limited in the present invention.

In the embodiment of the invention, the POI name corresponding to each historical order can be respectively determined based on the order address of each historical order in a plurality of historical orders, and the accuracy of the mined POI name can be improved because the order addresses of the historical orders are accurate; in addition, the third positioning position to which each historical order belongs can be determined based on the first positioning position and the second positioning position of each historical order, and since the order placing positioning and the handover positioning of each historical order are accurate, the positioning to which the historical order belongs, which is determined based on the first positioning position and the second positioning position, is objective and accurate.

Optionally, when the above S201 is executed, a word segmentation may be performed on the order address of each history order in the plurality of history orders to obtain a word segmentation result, and a POI name corresponding to each history order may be extracted from the word segmentation result.

The order information of each historical order can include an order address, a first positioning position when the order is placed and a second positioning position when the order is handed over. Of course, the order information may also include information such as order placing time, payment method, order remarks, etc.

The order address of each historical order is segmented to obtain a word segmentation result. Segmentation splits the order address into a plurality of word segments and yields an attribute label for each word segment. Based on these attribute labels, the POI name, i.e., the word segments likely to denote a POI, can be extracted from the word segments obtained. A POI may be a shop, a mall, a bus stop, an office building, a park, a residential community, etc.

In an alternative embodiment, a word segmentation model for structured word segmentation of order addresses may be pre-trained. In the training process, a large amount of first sample data is obtained, each item of which comprises a sample word segment and the labeled attribute label of that sample word segment. A word segmentation model to be trained is then trained on the first sample data with a machine learning algorithm: the sample word segment is taken as the input of the model, a loss value is calculated from the output of the model and the labeled attribute label of the sample word segment, training is deemed finished when the loss value falls within a preset range, and the trained model is taken as the word segmentation model. In implementation, the word segmentation model may adopt a model structure such as BiLSTM (Bidirectional Long Short-Term Memory), CRF (Conditional Random Field), and the like.

Optionally, when the order address is segmented to obtain a word segmentation result, the order address may be input into a pre-trained word segmentation model to obtain the word segments and their attribute labels output by the model, and the word segments together with their attribute labels are used as the word segmentation result. The word segmentation model is trained on a plurality of first sample data, each of which comprises a sample word segment and the labeled attribute label of that sample word segment.

For example, suppose an order address is "6th Floor, Block A, Building A, Zhongguancun Road, Haidian District, Beijing". After structured word segmentation, the word segments "Beijing", "Haidian District", "Zhongguancun Road", "Building A", "Block A", and "6th Floor" can be obtained. The attribute label of the word segment "Beijing" is "city", that of "Haidian District" is "region", that of "Zhongguancun Road" is "street", that of "Building A" is "POI", that of "Block A" is "building", and that of "6th Floor" is "floor".

In the embodiment of the invention, the word segmentation model is obtained by training based on a large number of sample word segmentations and the labeled attribute labels of the sample word segmentations, so that the word segmentation model can be used for segmenting the order address more accurately and more quickly.

Optionally, the order information further includes a hand-selected address type, and in the POI name extraction process, the POI name extraction may be performed according to the hand-selected address type.

In an alternative embodiment, the correspondence between the address type and the attribute tag may be preset.

For example, when the address type is an office building, the corresponding attribute tags may be "POI" and "building"; when the address type is a residential community, the corresponding attribute tags may be "POI" and "floor number"; when the address type is a shop, the corresponding attribute tag may be "POI"; when the address type is a mall, the corresponding attribute tag may be "POI"; when the address type is a park, the corresponding attribute tag may be "POI"; when the address type is a bus stop, the corresponding attribute tag may be "POI", and so on.

When the POI name corresponding to each historical order is extracted from the word segmentation result, the target attribute tags corresponding to the hand-selected address type of each historical order can be queried from the preset correspondence between address types and attribute tags; the word segments whose attribute labels are the target attribute tags are then extracted from the word segmentation result and taken as the POI name corresponding to that historical order.
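The lookup-and-extract procedure above can be sketched as follows. The mapping is a partial, illustrative copy of the correspondences listed earlier; the names `ADDRESS_TYPE_TO_TAGS` and `extract_poi_name`, the fallback to the "POI" tag for unknown address types, and joining the extracted word segments with a space are all assumptions made for this sketch.

```python
# Partial, illustrative correspondence between address types and attribute tags.
ADDRESS_TYPE_TO_TAGS = {
    "office building": ("POI", "building"),
    "shop": ("POI",),
    "mall": ("POI",),
}

def extract_poi_name(segments, address_type):
    """Extract a POI name from a word segmentation result.

    segments: list of (word_segment, attribute_label) pairs in address order,
              e.g. the output of a word segmentation model.
    address_type: the hand-selected address type of the order.
    Returns the joined word segments, or None if no segment matches.
    """
    # Assumption: unknown address types fall back to the plain "POI" tag.
    target_tags = ADDRESS_TYPE_TO_TAGS.get(address_type, ("POI",))
    words = [word for word, label in segments if label in target_tags]
    return " ".join(words) if words else None
```

With this sketch, the same segmented address can yield a coarser or finer POI name depending on the address type: an office building keeps both the "POI" and the "building" segments, whereas a shop keeps only the "POI" segment.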

For example, in practical applications, the order address may take one of three forms: a hand-selected POI form, a hand-selected POI + handwritten content form, and a handwritten content form. In the hand-selected POI form, the user only selects the address by hand; the order address may then be composed in the following three ways: POI (i.e., the extracted POI name), POI + residential community/floor/room number, or POI + supplementary information (e.g., remark information). In the hand-selected POI + handwritten content form, the user selects part of the address by hand and handwrites the rest; the order address may then be composed in the following two ways: POI + building + others, or POI + subdescriptions + others. In the handwritten content form, the user enters the whole address by handwriting; the order address may then be composed in the following two ways: POI + true appeal, or POI + other information.

In the embodiment of the invention, the POI names corresponding to different address types may contain word segments with different attribute tags. The correspondence between address types and attribute tags is therefore set according to the actual situation, and POI names are extracted according to this correspondence, which makes the extraction process simpler and the extraction result more accurate.

Optionally, in step 103, abnormal positioning positions among the first positioning positions and the second positioning positions corresponding to the same POI name may first be filtered out, and the POI coordinates corresponding to the same POI name are then determined based on the filtered first positioning positions and the filtered second positioning positions corresponding to the same POI name. This may specifically include S301 to S304:

S301, respectively converting the first positioning position and the second positioning position of each target history order with the same POI name into geographic position indexes, and generating an association relation between the positioning positions and the geographic position indexes;

The first positioning position and the second positioning position can be longitude and latitude coordinates, so that the longitude and latitude coordinates can be converted into a geographic position index.

When converting longitude and latitude coordinates into a geographic position index, a geographic index calculation method may be adopted, including but not limited to algorithms such as GeoHash, H3, and S2.

A geographic position index encodes longitude and latitude coordinates into a short string of letters and digits; the string can be used for indexing and represents a coordinate point or an area on the map. Points that are close to each other on the map can be converted into geographic position indexes with the same prefix (e.g., nearby points 1 and 2 on the map have the geographic position indexes abc123 and abc124 respectively, which share the prefix abc12).

Moreover, a geographic position index can represent geographic coordinates at any precision, provided its string is long enough: the longer the string, the higher the precision and the smaller the region it denotes. Accordingly, when the codes (i.e., strings) of geographic position indexes are used to judge how near two places are, the longer the matching prefix of the two codes, the closer the two places are.

In one example, a GeoHash algorithm may be used to convert each first location and each second location of each target history order with the same POI name into a hash index, i.e., to encode the spatial location as a string.

In this way, each first location position of each target history order is converted into a hash index, and similarly, each second location position of each target history order is also converted into a hash index.

Thus, an association between the positioning location and the geographical location index is generated for the same POI name.

Because different positioning positions (e.g., first positioning positions, second positioning positions, or a first and a second positioning position) of different target historical orders may be close to each other, different positioning positions may be associated with the same geographic position index; the same positioning position, however, is never associated with different geographic position indexes.
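The conversion in S301 can be illustrated with a minimal GeoHash encoder. This is a sketch of the standard public GeoHash encoding, not code from the embodiment; the function names `geohash_encode` and `common_prefix_len` and the default precision of 7 characters are assumptions chosen for illustration.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # GeoHash alphabet (no a, i, l, o)

def geohash_encode(lat, lon, precision=7):
    """Encode a (lat, lon) pair in degrees as a GeoHash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)

def common_prefix_len(a, b):
    """Length of the shared prefix of two index strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

# Usage: geohash_encode(57.64911, 10.40744, 5) returns "u4pru".
```

Each positioning position maps to exactly one string at a given precision, so the same position can never be associated with two different indexes, while several nearby positions can fall into the same cell. Note that two points on opposite sides of a cell boundary may share only a short prefix even though they are close, so a long shared prefix implies proximity but not the reverse.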

In one example, the index conversion described above generates the following associations for 5 target history orders with the same POI name (e.g., "XX building 11"), i.e., for a total of 10 positioning positions (coordinates) to be converted:

index 1 associates coordinate 1, coordinate 2, coordinate 3, coordinate 4, and coordinate 5;

index 2 associates coordinate 6, coordinate 7, coordinate 8;

index 3 associates coordinates 9 and 10.

S302, identifying a target geographical position index associated with the most positioning positions in the plurality of geographical position indexes corresponding to the same POI name;

In the above example, the POI name "XX building 11" corresponds to the above three indexes, and the target index associated with the largest number of coordinates, i.e., index 1, can be identified.

S303, filtering the positioning position which is not associated with the target geographical position index in the first positioning position and the second positioning position corresponding to the same POI name;

In the above example, the coordinates 6, 7, 8, 9, and 10 that are not associated with the index 1 may be filtered out of the 10 coordinates (i.e., coordinates 1 to 10) corresponding to the POI name "XX building 11", so that the POI name "XX building 11" corresponds to only coordinates 1, 2, 3, 4, and 5.

S304, determining POI coordinates corresponding to the same POI name based on the filtered first positioning position and the filtered second positioning position corresponding to the same POI name.

In the above example, the POI coordinates of the POI name may be determined based on the filtered coordinates 1, 2, 3, 4, and 5 corresponding to the POI name "XX building 11".

Coordinates 6 to 10 are the abnormal positioning positions that were filtered out.
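Steps S302 and S303 in the example above can be sketched as follows. The function name `filter_by_dominant_index` and the `index_of` callback are hypothetical; the tie-breaking rule (the first index encountered wins) is an assumption, as the text does not say how ties are resolved.

```python
from collections import defaultdict

def filter_by_dominant_index(positions, index_of):
    """Keep only the positions associated with the most populated index.

    positions: iterable of positioning positions (any hashable or plain value).
    index_of:  callable mapping a position to its geographic position index.
    Returns (target_index, kept_positions); (None, []) for empty input.
    """
    buckets = defaultdict(list)
    for pos in positions:
        buckets[index_of(pos)].append(pos)
    if not buckets:
        return None, []
    # S302: the target index is the one associated with the most positions.
    target = max(buckets, key=lambda idx: len(buckets[idx]))
    # S303: positions not associated with the target index are dropped.
    return target, buckets[target]

# The ten coordinates of the example, labelled 1..10, and their indexes.
example_index = {1: "index 1", 2: "index 1", 3: "index 1", 4: "index 1",
                 5: "index 1", 6: "index 2", 7: "index 2", 8: "index 2",
                 9: "index 3", 10: "index 3"}
target, kept = filter_by_dominant_index(range(1, 11), example_index.get)
```

Here `target` is "index 1" and `kept` contains coordinates 1 to 5, matching the example; in practice `index_of` would be a geographic index function such as a GeoHash encoder at a chosen precision.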

In some application scenarios, the filtered abnormal positioning positions may include either of the following types. The first type is the first positioning position of an order placed from a different place, i.e., the first positioning position at order placement and the order address, or the first positioning position and the second positioning position at handover, lie at two completely different geographic positions (for example, in different cities, in different counties of the same city, or in different buildings of the same city); in this case the filtered abnormal positioning position is the first positioning position at order placement. The second type is the second positioning position of an abnormal handover (for example, the second positioning position at which a rider hands over the order is far from the order address); in this case the filtered abnormal positioning position is the second positioning position at handover.

In the embodiment of the invention, the first positioning positions and the second positioning positions are each converted into geographic position indexes. Because geographic position indexes make it easy to judge the distance between geographic positions, abnormal first and/or second positioning positions can be accurately filtered out based on the indexes. The POI coordinates corresponding to the POI name are then mined from the filtered first and second positioning positions corresponding to the same POI name, which avoids the error that abnormal first and/or second positioning positions would introduce and further improves the accuracy of the mined POI coordinates.

Optionally, step 103 or S304 described above may be carried out by performing step A1, step A2, and step A3 in order:

step A1, performing density clustering on the first positioning position and the second positioning position of the target historical order with the same POI name to obtain at least one first clustering cluster;

The objects of the density clustering may be the first positioning positions and second positioning positions corresponding to the same POI name, either before or after the filtering described above.

One historical order has one order address, and the order addresses may be the same or different from one historical order to another. A POI name can be extracted from an order address of a historical order, and the historical order corresponds to a first positioning position and a second positioning position. For the same POI name, it may correspond to multiple historical orders, and thus, the same POI name may correspond to multiple first location positions and multiple second location positions.

In an implementation, Density-Based Spatial Clustering of Applications with Noise (DBSCAN) may be used to perform density clustering on the first positioning positions and second positioning positions of the target historical orders having the same POI name (that is, on multiple first positioning positions and multiple second positioning positions).

In one example, as shown in fig. 2, DBSCAN clustering in this step is performed over a range 11, and each dot in the range 11 (for example, dot 12) is a first or second positioning position that has the same POI name (for example, "XX building 11") and needs to be clustered after the filtering described above (for example, the three dots 13 outside the range 11 in fig. 2 are filtered abnormal positioning positions).

DBSCAN is a density-based clustering algorithm. Unlike partitioning and hierarchical clustering methods, it defines a cluster as the maximal set of density-connected points, can divide regions of sufficiently high density into clusters, and can find clusters of arbitrary shape in a noisy spatial database.

Several definitions in DBSCAN are as follows:

ε-neighborhood: the region within radius ε of a given object is called the ε-neighborhood of that object.

Core object: an object is a core object if the number of sample points within its ε-neighborhood is greater than or equal to MinPts.

Directly density-reachable: for sample set D, if sample point q is within the ε-neighborhood of p and p is a core object, then object q is directly density-reachable from object p.

Density-reachable: for sample set D, given a chain of sample points p1, p2, ..., pn with p = p1 and q = pn, object q is density-reachable from object p provided that each object pi is directly density-reachable from pi-1.

Density-connected: if there is a point o in sample set D such that object p and object q are both density-reachable from o, then p and q are density-connected.

The DBSCAN clustering process is described generally as follows:

for a given neighborhood distance ε and minimum number of neighborhood sample points MinPts:

(1) traversing all samples to find the set of all core objects, that is, the samples whose ε-neighborhood contains at least MinPts samples;

(2) randomly selecting a core object and finding all samples that are density-reachable from it to generate a cluster;

(3) removing the core objects contained in the cluster found in (2) from the set of remaining core objects;

(4) repeating steps (2) to (3) on the updated core object set until all core objects have been traversed or removed.

Corresponding to the embodiment of the present invention, the first positioning positions and second positioning positions of the target historical orders having the same POI name form the sample set. Each first positioning position is a sample and each second positioning position is also a sample; the sample set need not distinguish first positioning positions from second positioning positions.
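The DBSCAN definitions above can be made concrete with a short sketch. It uses the common formulation that grows one cluster at a time from an unvisited core object, which yields the same clusters as the core-object-set procedure described above. It is not the embodiment's implementation: the function names are hypothetical, and the samples are assumed to be planar coordinates (for example, positions projected to meters) so that Euclidean distance is meaningful.

```python
import math

def region_query(points, i, eps):
    """Indices of all points in the eps-neighborhood of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:          # not a core object
            labels[i] = NOISE
            continue
        cluster += 1                          # grow a new cluster from this core object
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:            # border point reachable from a core object
                labels[j] = cluster
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:   # j is itself a core object: keep expanding
                queue.extend(j_neighbors)
    return labels

# Made-up planar samples: two dense groups and one isolated position.
samples = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10), (11, 10), (10, 11), (50, 50)]
labels = dbscan(samples, eps=1.5, min_pts=3)
```

With these inputs the first four samples form one cluster, the next three form another, and the isolated sample is marked as noise.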

Step A2, selecting the first cluster with the largest magnitude, and performing K-Means clustering on the selected first cluster to obtain at least one second cluster;

The first cluster with the largest magnitude is selected from the first clusters obtained by density clustering. The largest magnitude means that the cluster contains the largest number of sample points. K-Means clustering is then performed on the selected first cluster.

K-Means is a distance-based clustering algorithm. Distance is used as the measure of similarity, that is, the closer two objects are, the more similar they are. The algorithm regards a cluster as a group of objects that lie close together, so its final goal is to obtain compact and well-separated clusters.

The K-Means clustering process is roughly as follows:

(1) K samples are randomly selected from all samples as the initial centroids.

(2) For each remaining sample, the distance to every centroid is measured, and the sample is assigned to the class of the nearest centroid.

(3) The centroid of each class obtained is recalculated.

(4) Steps (2) to (3) are iterated until each new centroid equals the original centroid, or the distance between the new centroid and the original centroid is smaller than a specified threshold, at which point the algorithm ends.

Corresponding to the embodiment of the invention, the selected first clustering cluster with the largest magnitude forms a sample set, wherein one first positioning position is a sample, and one second positioning position is also a sample.
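The four K-Means steps above can be sketched as follows. This is an illustrative sketch rather than the embodiment's implementation: the names `k_means` and `centroid_of`, the fixed random seed, the `tol` and `max_iter` parameters, and the planar sample coordinates are all assumptions made for the example.

```python
import math
import random

def centroid_of(points):
    """Arithmetic mean of a non-empty list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def k_means(points, k, tol=1e-6, max_iter=100, seed=0):
    """Return (centroids, labels) for k clusters."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)                 # step (1): random initial centroids
    labels = [0] * len(points)
    for _ in range(max_iter):
        # step (2): assign every sample to its nearest centroid
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # step (3): recompute centroids (an empty class keeps its old centroid)
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            new_centroids.append(centroid_of(members) if members else centroids[c])
        # step (4): stop once no centroid moves by more than tol
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:
            break
    return centroids, labels

# Made-up planar samples standing in for the largest first cluster.
samples = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = k_means(samples, k=2)
```

The text does not state how K is chosen, so the value 2 here is purely illustrative.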

Step A3, selecting the second cluster with the largest magnitude, and taking the centroid of the selected second cluster as the POI coordinates corresponding to the same POI name.

The second cluster with the largest magnitude is selected from the second clusters obtained by K-Means clustering. The largest magnitude means that the cluster contains the largest number of sample points. The centroid of that second cluster is taken as the POI coordinates corresponding to the same POI name.

In one example, as shown in fig. 2, the centroid is a dot 12 within the range 11, i.e., the coordinates of the dot 12 are the POI coordinates corresponding to the POI name (e.g., "XX building 11").
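Selecting the cluster with the most sample points and taking its centroid can be sketched in a few lines. The function name `largest_cluster_centroid`, the label convention (-1 for noise), and the sample values are assumptions made for illustration only.

```python
from collections import Counter

def largest_cluster_centroid(points, labels, noise_label=-1):
    """Centroid of the cluster with the most sample points, or None if there is no cluster."""
    counts = Counter(label for label in labels if label != noise_label)
    if not counts:
        return None
    biggest, _ = counts.most_common(1)[0]
    members = [p for p, label in zip(points, labels) if label == biggest]
    n = len(members)
    return (sum(p[0] for p in members) / n, sum(p[1] for p in members) / n)

# Made-up (latitude, longitude) samples with their second-cluster labels.
points = [(31.2300, 121.4700), (31.2302, 121.4702), (31.2304, 121.4704), (31.2400, 121.4800)]
labels = [0, 0, 0, 1]
poi_coordinate = largest_cluster_centroid(points, labels)
```

Averaging latitude and longitude directly is a reasonable approximation only over the short distances involved in a single cluster.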

For a given POI name, if the POI coordinates were determined only from the first positioning position and second positioning position of the historical order to which the POI name belongs, the accuracy of the resulting POI coordinates could be low, because the first or second positioning position of that order may be inaccurate. Therefore, in the embodiment of the present invention, the first and second positioning positions of all historical orders having the same POI name are processed together: the multiple first positioning positions and multiple second positioning positions corresponding to the same POI name are clustered to determine POI coordinates with higher confidence, which weakens the influence of inaccurate positioning positions on the resulting POI coordinates. In addition, combining density clustering with K-Means clustering can make up for the shortcomings of a single clustering method and further improve the accuracy of the clustering result.

Optionally, after step 103, the method according to the embodiment of the present invention may further include:

Step 104, acquiring candidate POI information whose POI coordinates are to be corrected, wherein the candidate POI information comprises candidate POI names, candidate POI coordinates and candidate POI categories;

In some application scenarios, a large amount of high-value POI information exists, but some of it has a coordinate problem (for example, the POI information contains no POI coordinates, or the POI coordinates it contains are inaccurate), so the existing POI information with the coordinate problem cannot be used online. In the embodiment of the present invention, the coordinates of the candidate POI information with the coordinate problem can be updated, so that the existing POI information can be used by various applications.

Therefore, the candidate POI information here is at least one POI information having a coordinate problem (i.e., POI coordinates to be corrected).

For POI information in which no POI coordinate exists in the existing POI information, an initial POI coordinate may be generated based on an address of the POI information, where the initial POI coordinate is a to-be-corrected POI coordinate of the POI information.

Step 105, identifying target POI information in which the candidate POI name is the same as the same POI name and a distance between the candidate POI coordinate and a POI coordinate corresponding to the same POI name is greater than a second preset threshold, wherein the second preset threshold is a threshold matched with the candidate POI category in the target POI information;

wherein the candidate POI information is a large amount of existing POI information requiring correction of POI coordinates.

Through steps 101 to 103, correspondences between POI names and POI coordinates are obtained. Because the POI names are mined from the historical orders, different historical orders may yield different POI names, so step 103 may produce multiple correspondences, for example, POI name 1 corresponds to POI coordinate 1, and POI name 2 corresponds to POI coordinate 2.

Taking the POI name 1 as an example, this step needs to determine which POI coordinates in the candidate POI information (i.e., the target POI information) are corrected by using the POI coordinates 1.

The specific method is that the candidate POI names of the candidate POI information and the POI name 1 may be compared (for example, the texts are compared one by one, or semantic similarity matching is performed), and the target POI information having the candidate POI name identical to the POI name 1 and the distance between the candidate POI coordinate and the POI coordinate 1 larger than a second preset threshold is found from the candidate POI information.

The criterion for POI names being the same may be that the POI names are identical, for example, "Building A" and "Building A" are the same POI name; or that the POI names are identical after name normalization (unifying letter case, unifying digits and numerals written as text, and so on), for example, "Block a of Mansion A" and "Block A of Mansion A" are the same POI name; and so on.
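As a rough illustration of name normalization before comparison, the sketch below unifies letter case, full-width and half-width characters, and whitespace. The function name `normalize_poi_name` is hypothetical, and converting numerals written as text into digits, which the text also mentions, is not covered here.

```python
import unicodedata

def normalize_poi_name(name):
    """Fold width variants and letter case, then drop all whitespace."""
    folded = unicodedata.normalize("NFKC", name).casefold()
    return "".join(folded.split())

# Two spellings that differ only in case, character width and spacing.
same = normalize_poi_name("Ａ Mansion Block a") == normalize_poi_name("a mansion block A")
different = normalize_poi_name("A Mansion") == normalize_poi_name("B Mansion")
```

Here `same` evaluates to true and `different` to false; semantic similarity matching, the other option the text names, would need a separate model.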

Therefore, in this step, the target POI information whose name is also POI name 1 and whose coordinate is farther from the POI coordinate 1 can be found from the candidate POI information.

In addition, when a second preset threshold value for comparing with the distance between the POI coordinates is determined, the threshold value corresponding to the candidate POI category in the target POI information may be determined according to the preset correspondence between the POI category and the threshold value, and the threshold value is used as the second preset threshold value for comparison.

The reason is that different categories of buildings are distributed with different densities. For example, when the address category is bus station, the distance between different bus stations generally does not exceed 1 kilometer, so the bus station category corresponds to a threshold of 1 kilometer;

if the address category is chain supermarket, chain fast food shop, or the like, the distance between different branches with the same store name does not exceed 2 kilometers, so chain supermarkets and chain fast food shops correspond to a threshold of 2 kilometers;

as another example, when the address category is office building, the distance between different office buildings with the same name is at least 10 kilometers, so the office building category corresponds to a threshold of 10 kilometers.

Step 106, updating the candidate POI coordinates in the target POI information to the POI coordinates corresponding to the same POI name.

Since the candidate POI coordinates in the target POI information are farther from the newly generated high-confidence POI coordinates 1, it is indicated that the original POI coordinates in the target POI information may have coordinate errors, and therefore, the candidate POI coordinates in the target POI information can be corrected in this step and specifically updated to the POI coordinates 1.
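Steps 104 to 106 can be sketched together as follows. The thresholds of 1, 2 and 10 kilometers come from the examples above; everything else is an assumption for illustration, including the dictionary layout of a candidate, the category keys, the exact-match name comparison, and the use of the haversine great-circle distance.

```python
import math

EARTH_RADIUS_KM = 6371.0

# Category thresholds taken from the examples in the text; the keys are hypothetical.
CATEGORY_THRESHOLD_KM = {"bus_station": 1.0, "chain_store": 2.0, "office_building": 10.0}

def haversine_km(a, b):
    """Great-circle distance in kilometers between two (latitude, longitude) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = math.sin((lat2 - lat1) / 2) ** 2 \
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def correct_candidates(candidates, mined):
    """Overwrite candidate coordinates that lie farther than the category threshold
    from the mined coordinates of the same POI name; return the corrected names."""
    corrected = []
    for poi in candidates:
        target = mined.get(poi["name"])
        threshold = CATEGORY_THRESHOLD_KM.get(poi["category"])
        if target is None or threshold is None:
            continue                       # no mined coordinates or unknown category
        if haversine_km(poi["coord"], target) > threshold:
            poi["coord"] = target          # step 106: update to the mined POI coordinates
            corrected.append(poi["name"])
    return corrected

mined = {"XX Building": (31.2302, 121.4702), "YY Station": (31.0, 121.0)}
candidates = [
    {"name": "XX Building", "coord": (31.40, 121.47), "category": "office_building"},
    {"name": "YY Station", "coord": (31.001, 121.0), "category": "bus_station"},
]
corrected = correct_candidates(candidates, mined)
```

In this made-up data the first candidate is roughly 19 kilometers from the mined coordinates and is corrected, while the second is about 0.1 kilometers away and is left unchanged.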

In the embodiment of the invention, after the POI coordinates corresponding to a certain POI name are mined, candidate POI information having the same name can be found in the existing candidate POI information by comparing POI names, and the candidate POI coordinates in that same-name candidate POI information are compared by distance with the newly mined POI coordinates. If the distance is large, the original POI coordinates in the same-name candidate POI information may be wrong, and they can be updated with the more accurate POI coordinates determined by the embodiment of the invention, so that the existing POI information no longer has a coordinate problem and a large amount of high-value POI information that lacked usable coordinates can go online with corrected POI coordinates. In addition, when evaluating whether the POI coordinates in the original POI information are wrong, the distance between the two POI coordinates is evaluated against the second preset threshold corresponding to the POI category in the target POI information, so that the criterion for judging whether POI coordinates are wrong is more reasonable and POI information with wrong coordinates can be accurately located.

The present embodiment discloses a data processing apparatus, as shown in fig. 3, the apparatus includes:

the first determining module 31 is configured to determine, based on order information of each historical order in a plurality of historical orders, a point of interest (POI) name corresponding to each historical order and a third positioning location to which each historical order belongs;

a first identifying module 32, configured to identify, in the multiple historical orders, a target historical order in which the POI names are the same and the third positioning location meets a preset condition, where the third positioning location meeting the preset condition includes that a distance between third positioning locations of any two of the target historical orders is smaller than a first preset threshold;

a second determining module 33, configured to determine, based on the first location position and the second location position of the target history order with a same POI name, POI coordinates corresponding to the same POI name.

Optionally, the order information further includes an order address, and the first determining module 31 includes:

the first determining submodule is used for respectively determining a POI name corresponding to each historical order based on the order address of each historical order in a plurality of historical orders;

a second determining submodule, configured to determine, based on the first positioning location and the second positioning location of each historical order, a third positioning location to which each historical order belongs.

Optionally, the second determining module 33 includes:

a conversion sub-module, configured to convert the first positioning location and the second positioning location of each target history order with the same POI name into geographic location indexes, respectively, and generate an association relationship between a positioning location and a geographic location index;

a first identification submodule, configured to identify, in the multiple geographic position indexes corresponding to the same POI name, a target geographic position index associated with the largest number of positioning positions;

a filtering sub-module, configured to filter, in the first positioning location and the second positioning location corresponding to the same POI name, a positioning location that is not associated with the target geographic location index;

and the third determining submodule is used for determining the POI coordinates corresponding to the same POI name based on the filtered first positioning position and the filtered second positioning position corresponding to the same POI name.

Optionally, the second determining module 33 includes:

the first clustering sub-module is used for carrying out density clustering on the first positioning position and the second positioning position of the target historical order with the same POI name to obtain at least one first clustering cluster;

the second clustering submodule is used for selecting the first clustering cluster with the maximum magnitude and carrying out K-Means clustering on the selected first clustering cluster to obtain at least one second clustering cluster;

and the second identification submodule is used for selecting the second clustering cluster with the maximum magnitude and taking the centroid of the selected second clustering cluster as the POI coordinate corresponding to the same POI name.

Optionally, the apparatus further comprises:

the acquisition module is used for acquiring candidate POI information of the POI coordinates to be corrected, wherein the candidate POI information comprises a candidate POI name, candidate POI coordinates and candidate POI categories;

the second identification module is used for identifying target POI information in the candidate POI information, wherein the candidate POI name is the same as the same POI name, and the distance between the candidate POI coordinates and POI coordinates corresponding to the same POI name is greater than a second preset threshold value, and the second preset threshold value is a threshold value matched with the candidate POI category in the target POI information;

and the updating module is used for updating the candidate POI coordinates in the target POI information into POI coordinates corresponding to the same POI name.

Optionally, the first determining sub-module is configured to perform word segmentation on the order address of each historical order in the multiple historical orders to obtain word segmentation results, and extract, from the word segmentation results, a POI name corresponding to each historical order.

Optionally, the first determining sub-module includes:

the input unit is used for inputting the order address into a pre-trained word segmentation model to obtain each word segmentation output by the word segmentation model and the attribute label of each word segmentation, and the word segmentation and the attribute label of each word segmentation are used as word segmentation results;

the word segmentation model is obtained through training according to a plurality of first sample data, and the first sample data comprise sample word segmentation and labeling attribute labels of the sample word segmentation.

Optionally, the first determining sub-module includes:

the query unit is used for querying a target attribute label corresponding to the hand-selected address type of each historical order from a preset corresponding relation between the address type and the attribute label;

and the determining unit is used for extracting, from the word segmentation results, the segmented words whose attribute labels are the target attribute label, and taking the extracted segmented words as the POI name corresponding to each historical order.
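The query unit and determining unit can be pictured with a small sketch. The mapping from address type to attribute label, the label names, and the function `extract_poi_name` are all hypothetical; the word segmentation model itself is not shown, and its output is assumed to be a list of (segmented word, attribute label) pairs.

```python
# Hypothetical preset correspondence between address type and target attribute label.
ADDRESS_TYPE_TO_LABEL = {"office": "building", "home": "community"}

def extract_poi_name(segments, address_type):
    """Return the segmented words carrying the target attribute label, or None if there are none."""
    target_label = ADDRESS_TYPE_TO_LABEL.get(address_type)
    if target_label is None:
        return None
    words = [word for word, label in segments if label == target_label]
    return " ".join(words) if words else None

# Assumed output of a word segmentation model for one order address.
segments = [
    ("Shanghai", "city"),
    ("Changning District", "district"),
    ("XX Building", "building"),
    ("Floor 11", "floor"),
]
poi_name = extract_poi_name(segments, "office")
```

For this input the extracted POI name is "XX Building"; an address type without a preset label yields no POI name.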

The data processing apparatus disclosed in the embodiments of the present invention is configured to implement each step of the data processing method described in each of the above embodiments of the present invention, and for specific implementation of each module of the apparatus, reference is made to the corresponding step, which is not described herein again.

With the data processing apparatus disclosed in the embodiment of the invention, target historical orders that correspond to the same POI name and whose third positioning positions are close to one another (for any two of the orders) are identified from the historical orders, so that the identified target historical orders point to the same POI. The POI coordinates of that POI are then determined from the first positioning positions at order placement and the second positioning positions at handover of the multiple target historical orders pointing to the same POI. Because the first and second positioning positions are relatively accurate, and because the mined POI coordinates combine the first and second positioning positions of multiple target historical orders, the accuracy of the POI coordinates mined for the POI name can be further improved. In addition, because historical orders cover a wide geographic area, the POI coordinates of a position can be mined by the technical solution of the embodiment of the invention as long as a historical order exists at that position, which improves the mining coverage of POI coordinates. Furthermore, because the POI coordinates of a POI name are mined by combining the first and second positioning positions of multiple target historical orders having the same POI name, two nearby geographic locations that are in fact different are not constrained to be mined to the same POI coordinates; on the contrary, different POI coordinates can be mined, based on their POI names, for two geographic locations with nearby addresses, which reduces the limitations on the mined POI coordinates. Finally, because order information is more timely, POI coordinates can be mined more promptly from the order information of historical orders.

Correspondingly, the invention also discloses an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the computer program to realize the data processing method according to any one of the above embodiments of the invention. The electronic device can be a PC, a mobile terminal, a personal digital assistant, a tablet computer and the like.

The invention also discloses a computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the steps of the data processing method according to any of the above-mentioned embodiments of the invention.

The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the device embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.

The data processing method and apparatus provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the present invention, and the description of the above embodiments is only intended to help in understanding the method and core idea of the present invention. Meanwhile, a person skilled in the art may, following the idea of the present invention, vary the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Various component embodiments of the disclosure may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some or all of the components in a computing processing device according to embodiments of the present disclosure. The present disclosure may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present disclosure may be stored on a computer-readable medium or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.

For example, FIG. 4 illustrates a computing processing device that may implement methods in accordance with the present disclosure. The computing processing device conventionally includes a processor 1010 and a computer program product or computer-readable medium in the form of a memory 1020. The memory 1020 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read only memory), an EPROM, a hard disk, or a ROM. The memory 1020 has a storage space 1030 for program code 1031 for performing any of the method steps of the above-described method. For example, the storage space 1030 for program code may include respective program code 1031 for implementing various steps in the above method, respectively. The program code can be read from or written to one or more computer program products. These computer program products comprise a program code carrier such as a hard disk, a Compact Disc (CD), a memory card or a floppy disk. Such a computer program product is typically a portable or fixed storage unit as described with reference to fig. 5. The memory unit may have memory segments, memory spaces, etc. arranged similarly to memory 1020 in the computing processing device of fig. 4. The program code may be compressed, for example, in a suitable form. Typically, the memory unit comprises computer readable code 1031', i.e. code that can be read by a processor, such as 1010, for example, which when executed by a computing processing device causes the computing processing device to perform the steps of the method described above.

Reference herein to "one embodiment," "an embodiment," or "one or more embodiments" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Moreover, it is noted that instances of the phrase "in one embodiment" do not necessarily all refer to the same embodiment.

In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the disclosure may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.

In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The disclosure may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solutions of the present disclosure, not to limit them; although the present disclosure has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present disclosure.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.

Claims

1. A data processing method, comprising:

2. The method according to claim 1, wherein the order information further includes an order address, and the determining, based on the order information of each historical order in the plurality of historical orders, a point of interest (POI) name corresponding to each historical order and a third location to which each historical order belongs respectively comprises:

respectively determining a POI name corresponding to each historical order based on the order address of each historical order in a plurality of historical orders;

and determining a third positioning position to which each historical order belongs based on the first positioning position and the second positioning position of each historical order.

3. The method of claim 1, wherein determining the POI coordinates corresponding to the same POI name based on the first and second locations of the target history order having the same POI name comprises:

respectively converting the first positioning position and the second positioning position of each target historical order with the same POI name into geographic position indexes to generate an association relation between the positioning positions and the geographic position indexes;

identifying a target geographical position index associated with the largest number of positioning positions in a plurality of geographical position indexes corresponding to the same POI name;

filtering the positioning position which is not associated with the target geographic position index in the first positioning position and the second positioning position corresponding to the same POI name;

and determining POI coordinates corresponding to the same POI name based on the filtered first positioning position and the filtered second positioning position corresponding to the same POI name.

4. The method of claim 1, wherein determining the POI coordinates corresponding to the same POI name based on the first positioning position and the second positioning position of the target historical orders having the same POI name comprises:

performing density clustering on the first positioning positions and the second positioning positions of the target historical orders with the same POI name to obtain at least one first cluster;

selecting the first cluster with the largest size, and performing K-Means clustering on the selected first cluster to obtain at least one second cluster;

and selecting the second cluster with the largest size, and taking the centroid of the selected second cluster as the POI coordinates corresponding to the same POI name.
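The two-stage clustering of claim 4 can be sketched as follows. This is an illustration under stated assumptions, not the applicant's implementation: DBSCAN is used as the density clustering algorithm (the claim does not name one), "largest" is read as "containing the most points", coordinates are treated as planar so that plain Euclidean distance applies, and `eps`, `min_pts`, `k`, `iters` and `seed` are hypothetical parameters chosen for the example.

```python
import math
import random

def centroid(pts):
    """Arithmetic mean of a non-empty list of (x, y) points."""
    return (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN; returns clusters as lists of points, with noise dropped."""
    labels = [None] * len(points)  # None = unvisited, -1 = noise

    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    next_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        labels[i] = next_id
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = next_id  # noise reachable from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = next_id
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point, keep expanding
                queue.extend(reachable)
        next_id += 1
    clusters = {}
    for p, label in zip(points, labels):
        if label != -1:
            clusters.setdefault(label, []).append(p)
    return list(clusters.values())

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-Means; returns the non-empty groups of points."""
    centers = random.Random(seed).sample(points, k)
    groups = [points]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        centers = [centroid(g) if g else centers[c] for c, g in enumerate(groups)]
    return [g for g in groups if g]

def poi_coordinate(points, eps=0.0005, min_pts=3, k=2):
    """Density clustering, then K-Means on the largest cluster, then centroid."""
    first_clusters = dbscan(points, eps, min_pts)
    if not first_clusters:
        return None  # every position was classified as noise
    largest_first = max(first_clusters, key=len)
    second_clusters = kmeans(largest_first, min(k, len(largest_first)))
    return centroid(max(second_clusters, key=len))
```

The density step discards isolated positioning errors, and the K-Means step then narrows the surviving cluster to its most populated part before the centroid is taken. With real order volumes a spatially indexed clustering library would replace the quadratic neighbor search used here.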

5. The method of claim 1, wherein after determining the POI coordinates corresponding to the same POI name based on the first positioning position and the second positioning position of the target historical orders having the same POI name, the method further comprises:

acquiring candidate POI information of a POI coordinate to be corrected, wherein the candidate POI information comprises a candidate POI name, a candidate POI coordinate and a candidate POI category;

identifying target POI information in the candidate POI information, wherein, in the target POI information, the candidate POI name is identical to the same POI name and the distance between the candidate POI coordinate and the POI coordinates corresponding to the same POI name is greater than a second preset threshold, the second preset threshold being a threshold matched with the candidate POI category in the target POI information;

and updating the candidate POI coordinate in the target POI information to the POI coordinates corresponding to the same POI name.
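The correction step of claim 5 can be sketched in Python as below. The category names, the threshold values in `THRESHOLD_M`, the fallback `DEFAULT_THRESHOLD_M`, the `CandidatePoi` record and the use of the haversine great-circle distance are all assumptions made for illustration; the claim only requires that the second preset threshold be matched to the candidate POI category.

```python
import math
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class CandidatePoi:
    name: str
    lat: float
    lon: float
    category: str

# Hypothetical category-specific thresholds in metres (second preset threshold).
THRESHOLD_M = {"restaurant": 50.0, "residential_area": 300.0}
DEFAULT_THRESHOLD_M = 100.0  # assumed fallback for categories not listed

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres on a sphere of radius 6,371 km."""
    r = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def correct_candidates(candidates, poi_name, mined_lat, mined_lon):
    """Return the candidates, with target entries moved to the mined coordinate."""
    corrected = []
    for c in candidates:
        limit = THRESHOLD_M.get(c.category, DEFAULT_THRESHOLD_M)
        is_target = (
            c.name == poi_name
            and haversine_m(c.lat, c.lon, mined_lat, mined_lon) > limit
        )
        # Only target POI information is updated; everything else is kept as-is.
        corrected.append(replace(c, lat=mined_lat, lon=mined_lon) if is_target else c)
    return corrected
```

Making the threshold depend on the category lets a large POI such as a residential area tolerate a bigger offset than a small storefront before its stored coordinate is overwritten.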

6. A data processing apparatus, comprising:

7. The apparatus of claim 6, wherein the order information further comprises an order address, and wherein the first determining module comprises:

8. The apparatus of claim 6, wherein the second determining module comprises:

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the data processing method of any one of claims 1 to 5 when executing the computer program.

10. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, carries out the steps of the data processing method of any one of claims 1 to 5.