CN111831928B - POI (Point of interest) ordering method and device - Google Patents


Info

Publication number
CN111831928B
CN111831928B · Application CN201910873935.3A
Authority
CN
China
Prior art keywords
information
sample
poi
piece
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910873935.3A
Other languages
Chinese (zh)
Other versions
CN111831928A (en)
Inventor
郑万吉
陈欢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Didi Infinity Technology and Development Co Ltd
Original Assignee
Beijing Didi Infinity Technology and Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Didi Infinity Technology and Development Co Ltd filed Critical Beijing Didi Infinity Technology and Development Co Ltd
Priority to CN201910873935.3A priority Critical patent/CN111831928B/en
Publication of CN111831928A publication Critical patent/CN111831928A/en
Application granted granted Critical
Publication of CN111831928B publication Critical patent/CN111831928B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9537Spatial or temporal dependent retrieval, e.g. spatiotemporal queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a POI (point of interest) ranking method and device. The method comprises the following steps: obtaining geographic position information of a client and search information input by a user through the client; searching based on the search information to obtain multiple pieces of candidate POI information corresponding to the search information; determining the value score of each piece of candidate POI information based on the geographic position information, the candidate POI information, and a pre-trained POI value determination model; and sorting the pieces of candidate POI information according to their value scores. Because the value score of each piece of candidate POI information is determined based on the geographic position information of the client, and the candidate POI information is ranked by value score, the accuracy of POI ranking is improved.

Description

POI (Point of interest) ordering method and device
Technical Field
The application relates to the technical field of information retrieval, in particular to a POI (point of interest) ordering method and device.
Background
Information retrieval plays an important role in many fields. In many application scenarios, multiple predicted search results are recalled and presented to a user based on the characters the user has entered so far. For example, when a user searches for points of interest (Point of Interest, POI) through the client of an online ride-hailing platform, the user inputs characters one by one in the human-machine interaction interface of the client, and the client recalls search results according to the currently entered characters. Because massive data are involved, the number of recalled POIs is excessive; the recalled POIs therefore need to be ranked according to a preset ranking strategy, and the top-N POIs are displayed to the user on the human-machine interaction interface according to the ranking.
Current POI ranking strategies are typically based only on the relevance of POIs to the search information, which can result in low ranking accuracy.
Disclosure of Invention
Accordingly, the present application is directed to a POI ordering method and apparatus that can improve the accuracy of POI ranking for users.
In a first aspect, an embodiment of the present application provides a method for ordering POI, where the method includes:
obtaining geographic position information of a client and retrieval information input by a user through the client;
Searching based on the search information to obtain a plurality of pieces of candidate POI information corresponding to the search information;
Determining a value score of each piece of candidate POI information based on the geographic position information, the candidate POI information and a POI value determining model obtained in advance;
and sorting the pieces of candidate POI information according to the value scores of the pieces of candidate POI information.
In an alternative embodiment, the POI value determination model is trained in the following manner:
Acquiring a plurality of pieces of sample retrieval information formed by at least one input character string, sample geographic position information corresponding to each piece of sample retrieval information, sample POI sets corresponding to each input character string in each piece of sample retrieval information respectively, and operation behaviors corresponding to each input character string in each piece of sample retrieval information respectively; wherein the sample POI set comprises a plurality of sample POIs;
For each piece of sample retrieval information, according to each input character string of the sample retrieval information and sample geographic position information corresponding to the sample retrieval information, constructing sample state information corresponding to each input character string of the sample retrieval information;
constructing the sample action information set under each piece of sample state information based on the POI sets respectively corresponding to the input character strings in each piece of sample retrieval information;
based on the operation behavior information corresponding to each input character string in each piece of sample retrieval information, constructing sample reward information corresponding to each sample action information in the sample action information set under each sample state information;
and training to obtain the POI value determination model based on the sample state information, the sample action information sets under the sample state information, and the sample reward information corresponding to each sample action information in the sample action information sets under the sample state information.
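The construction of training samples described above can be sketched as follows. This is an illustrative interpretation, not the patent's concrete data layout: the `State` fields (input prefix plus region number), the click-derived reward of 1.0, and all names are assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    prefix: str      # the character string entered so far (sample retrieval information)
    region_id: int   # sample geographic position information (region number)

def build_samples(session):
    """session: list of (prefix, region_id, candidate_pois, clicked_poi) tuples,
    one per input character string. Returns (state, action, reward) samples."""
    samples = []
    for prefix, region_id, candidates, clicked in session:
        state = State(prefix, region_id)
        for poi in candidates:  # each candidate POI is one sample action
            # operation behavior -> reward: 1.0 if the user selected this POI
            reward = 1.0 if poi == clicked else 0.0
            samples.append((state, poi, reward))
    return samples

session = [
    ("b",  42, ["Bank A", "Beach B"], None),
    ("be", 42, ["Beach B", "Bell C"], "Beach B"),  # user selected this POI
]
samples = build_samples(session)
```

Each input prefix yields one state; the POIs recalled for that prefix form its action set, and only the clicked POI receives a positive reward.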
In an optional implementation manner, the training to obtain the POI value determining model based on the sample state information, the sample action information set under each sample state information, and the sample reward information corresponding to each sample action information in the sample action information set under each sample state information includes:
For any sample retrieval information, starting from the last sample state information of the sample retrieval information, taking each sample state information as the current sample state information in turn, and executing the following iterative process:
Determining a cumulative award corresponding to the current sample state information based on sample award information corresponding to each sample action information in the sample action information set under the current sample state information;
Determining current count information corresponding to the current sample state information;
taking each sample action information in the sample action information set corresponding to the current sample state information in turn as the current sample action information, and, for each piece of current sample action information, training a value function according to the current count and the cumulative reward corresponding to the current sample state information;
the value function trained on all sample retrieval information is determined as the POI value determination model.
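One way to read the iterative process above is as tabular Monte-Carlo value estimation: walk each session backwards from the last state, accumulate rewards, and update each (state, action) value as a running average using a visit count. The patent does not specify the exact update rule, so the incremental-mean update below is an assumption.

```python
from collections import defaultdict

def train_value_function(sessions):
    """sessions: list of episodes; each episode is a list of
    (state, {action: reward}) pairs, ordered by input time."""
    q = defaultdict(float)  # (state, action) -> value estimate
    n = defaultdict(int)    # (state, action) -> visit count
    for episode in sessions:
        g = 0.0  # cumulative reward
        # start from the last sample state information, as in the patent
        for state, action_rewards in reversed(episode):
            g += sum(action_rewards.values())
            for action in action_rewards:
                n[(state, action)] += 1
                # incremental mean: Q += (G - Q) / count
                q[(state, action)] += (g - q[(state, action)]) / n[(state, action)]
    return q

episode = [("b", {"POI-1": 0.0}), ("be", {"POI-1": 1.0})]
q = train_value_function([episode])
```

Because the walk runs backwards, an earlier prefix state inherits the reward earned later in the same session, which is what lets the model score partial inputs.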
In an alternative embodiment, the determining the value score of each piece of candidate POI information based on the geographic location information, the candidate POI information, and a POI value determination model obtained in advance includes:
constructing state information corresponding to the search information based on the search information and the geographic position information;
Constructing action information corresponding to each piece of candidate POI information based on the candidate POI information corresponding to the search information;
and, for each piece of candidate POI information, inputting the action information corresponding to that piece of candidate POI information together with the state information, as independent variables, into the POI value determination model to obtain the value score corresponding to each piece of candidate POI information.
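At serving time, the trained model is queried once per candidate with the (state, action) pair as input. A minimal sketch, assuming the model takes the lookup-table form of a trained value function (the default score of 0.0 for unseen pairs is an assumption):

```python
def score_candidates(q, state, candidate_pois):
    """Return each candidate POI with its value score under the given state."""
    return {poi: q.get((state, poi), 0.0) for poi in candidate_pois}

# q stands in for the pre-trained POI value determination model
q = {("be", "Beach B"): 0.9, ("be", "Bell C"): 0.1}
scores = score_candidates(q, "be", ["Beach B", "Bell C", "Berlin D"])
# unseen (state, action) pairs fall back to the default score of 0.0
```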
In an alternative embodiment, before the ranking of each piece of candidate POI information according to the value score of each piece of candidate POI information, the method further includes:
determining a ranking score of each piece of candidate POI information based on a pre-trained learning-to-rank (LTR) model;
The ranking of the candidate POI information according to the value score of the candidate POI information includes:
Determining the total score of each piece of candidate POI information according to the value score of each piece of candidate POI information and the sorting score of each piece of candidate POI information;
And sorting each piece of the candidate POI information based on the total score.
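The excerpt leaves the combination of value score and LTR ranking score open; a weighted sum is one plausible form. The weight `alpha` and all names below are assumptions.

```python
def total_scores(value_scores, ltr_scores, alpha=0.5):
    """Combine the value score and the LTR ranking score per candidate POI.
    alpha is an assumed mixing weight; the patent does not fix the rule."""
    return {
        poi: alpha * value_scores[poi] + (1 - alpha) * ltr_scores[poi]
        for poi in value_scores
    }

value_scores = {"Beach B": 0.9, "Bell C": 0.1}
ltr_scores   = {"Beach B": 0.2, "Bell C": 0.8}
totals = total_scores(value_scores, ltr_scores)
ranked = sorted(totals, key=totals.get, reverse=True)  # sort by total score
```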
In an alternative embodiment, the obtaining the geographic location information of the client includes:
Obtaining geographic position coordinates of the client;
and determining target area range information to which the geographic position coordinates belong based on the geographic position coordinates and a plurality of pieces of area range information which are determined in advance, and determining the target area range information as the geographic position information.
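One simple realization of "a plurality of pre-determined area ranges" is a regular latitude/longitude grid, with the cell number serving as the geographic position information. The grid origin, cell size, and width below are illustrative assumptions.

```python
def region_id(lat, lng, cell_deg=0.01, lat0=39.4, lng0=115.7, cols=200):
    """Map a geographic coordinate to the number of the grid cell it falls in.
    cell_deg is the cell size in degrees; lat0/lng0 the grid origin;
    cols the grid width -- all assumed values, not from the patent."""
    row = int((lat - lat0) / cell_deg)
    col = int((lng - lng0) / cell_deg)
    return row * cols + col

rid = region_id(39.9087, 116.3975)  # a coordinate near central Beijing
```

Bucketing coordinates into regions makes the states in the value model discrete: two nearby users share the same geographic position information and therefore the same learned scores.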
In an alternative embodiment, the retrieving, based on the retrieving information, obtains a plurality of pieces of candidate POI information corresponding to the retrieving information, including:
word segmentation processing is carried out on the search information, and at least one search keyword corresponding to the search information is obtained;
and searching based on each search keyword to obtain a plurality of pieces of candidate POI information corresponding to the search information.
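The two steps above (segment, then retrieve per keyword) can be sketched with a toy inverted index. The whitespace tokenizer and index contents are stand-ins; a production system would use a real Chinese segmenter, which the patent does not name.

```python
def search_candidates(query, tokenize, inverted_index):
    """Segment the search information, then union the POI postings of each keyword."""
    keywords = tokenize(query)
    candidates = set()
    for kw in keywords:
        candidates |= inverted_index.get(kw, set())
    return keywords, sorted(candidates)

index = {"beijing": {"Beijing West Station", "Beijing Zoo"},
         "zoo": {"Beijing Zoo", "City Zoo"}}
tokenize = lambda q: q.lower().split()  # stand-in for a real word segmenter
kws, cands = search_candidates("Beijing Zoo", tokenize, index)
```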
In an alternative embodiment, the POI ranking method further comprises:
And determining and displaying a preset number of target POI information from the candidate POI information based on the ranking.
In a second aspect, an embodiment of the present application provides a device for ordering POI, where the device includes:
the first acquisition module is used for acquiring geographic position information of the client and search information input by a user through the client;
the second acquisition module is used for searching based on the search information and acquiring a plurality of pieces of candidate POI information corresponding to the search information;
The value score determining module is used for determining the value score of each piece of candidate POI information based on the geographic position information, the candidate POI information and a POI value determining model obtained in advance;
And the ranking module is used for ranking the pieces of the candidate POI information according to the value scores of the pieces of the candidate POI information.
In an alternative embodiment, the method further comprises: the model training module is used for training to obtain the POI value determining model by adopting the following modes:
Acquiring a plurality of pieces of sample retrieval information formed by at least one input character string, sample geographic position information corresponding to each piece of sample retrieval information, sample POI sets corresponding to each input character string in each piece of sample retrieval information respectively, and operation behaviors corresponding to each input character string in each piece of sample retrieval information respectively; wherein the sample POI set comprises a plurality of sample POIs;
For each piece of sample retrieval information, according to each input character string of the sample retrieval information and sample geographic position information corresponding to the sample retrieval information, constructing sample state information corresponding to each input character string of the sample retrieval information;
Constructing sample action information sets under the state information based on POI sets respectively corresponding to the input character strings in each piece of sample retrieval information;
based on the operation behavior information corresponding to each input character string in each piece of sample retrieval information, constructing sample reward information corresponding to each sample action information in the sample action information set under each sample state information;
and training to obtain the POI value determination model based on the sample state information, the sample action information sets under the sample state information, and the sample reward information corresponding to each sample action information in the sample action information sets under the sample state information.
In an optional implementation manner, the model training module is configured to train to obtain the POI value determining model based on the sample state information, a sample action information set under each sample state information, and sample reward information corresponding to each sample action information in the sample action information set under each sample state information in the following manner:
For any sample retrieval information, starting from the last sample state information of the sample retrieval information, taking each sample state information as the current sample state information in turn, and executing the following iterative process:
Determining a cumulative award corresponding to the current sample state information based on sample award information corresponding to each sample action information in the sample action information set under the current sample state information;
Determining current count information corresponding to the current sample state information;
taking each sample action information in the sample action information set corresponding to the current sample state information in turn as the current sample action information, and, for each piece of current sample action information, training a value function according to the current count and the cumulative reward corresponding to the current sample state information;
the value function trained on all sample retrieval information is determined as the POI value determination model.
In an alternative embodiment, the value score determining module is specifically configured to determine a value score of each piece of candidate POI information based on the geographic location information, the candidate POI information, and a POI value determining model obtained in advance in the following manner:
constructing state information corresponding to the search information based on the search information and the geographic position information;
Constructing action information corresponding to each piece of candidate POI information based on the candidate POI information corresponding to the search information;
and, for each piece of candidate POI information, inputting the action information corresponding to that piece of candidate POI information together with the state information, as independent variables, into the POI value determination model to obtain the value score corresponding to each piece of candidate POI information.
In an alternative embodiment, the device further comprises: a ranking score determining module, configured to determine a ranking score of each piece of candidate POI information based on a pre-trained learning-to-rank (LTR) model;
The ordering module is specifically configured to: determining the total score of each piece of candidate POI information according to the value score of each piece of candidate POI information and the sorting score of each piece of candidate POI information;
And sorting each piece of the candidate POI information based on the total score.
In an alternative embodiment, the first obtaining module is configured to obtain geographic location information of the client in the following manner:
Obtaining geographic position coordinates of the client;
and determining target area range information to which the geographic position coordinates belong based on the geographic position coordinates and a plurality of pieces of area range information which are determined in advance, and determining the target area range information as the geographic position information.
In an alternative embodiment, the second obtaining module is specifically configured to:
word segmentation processing is carried out on the search information, and at least one search keyword corresponding to the search information is obtained;
and searching based on each search keyword to obtain a plurality of pieces of candidate POI information corresponding to the search information.
In an alternative embodiment, the device further comprises: a display module, configured to determine and display a preset number of target POI information from the candidate POI information based on the sorting.
In a third aspect, an embodiment of the present application further provides a computer apparatus, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory in communication via the bus when the computer device is running, the machine-readable instructions when executed by the processor performing the steps of the first aspect, or any of the possible implementation manners of the first aspect.
In a fourth aspect, embodiments of the present application also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the first aspect, or any of the possible implementation manners of the first aspect.
According to the method and the device, a search is performed based on the search information input by the user through the client to obtain multiple pieces of candidate POI information corresponding to the search information; the value score of each piece of candidate POI information is then determined based on the geographic position information of the client, the candidate POI information, and a pre-trained POI value determination model; and the candidate POIs are ranked based on their value scores, so that the accuracy of POI ranking is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 shows a schematic architecture diagram of a service system according to an embodiment of the present application;
fig. 2 shows a flowchart of a POI sorting method according to an embodiment of the present application;
fig. 3 is a flowchart of a specific method for training to obtain a POI value determination model in the POI sorting method provided by the embodiment of the application;
FIG. 4 illustrates a state transition diagram provided by an embodiment of the present application;
FIG. 5 illustrates an example of state transitions provided by an embodiment of the present application;
FIG. 6 is a flowchart of a specific method for determining a value score of each piece of candidate POI information in the POI ranking method according to the embodiment of the application;
Fig. 7 is a schematic structural diagram of a POI sorting device according to an embodiment of the present application;
fig. 8 shows a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described with reference to the accompanying drawings in the embodiments of the present application, and it should be understood that the drawings in the present application are for the purpose of illustration and description only and are not intended to limit the scope of the present application. In addition, it should be understood that the schematic drawings are not drawn to scale. A flowchart, as used in this disclosure, illustrates operations implemented according to some embodiments of the present application. It should be understood that the operations of the flow diagrams may be implemented out of order and that steps without logical context may be performed in reverse order or concurrently. Moreover, one or more other operations may be added to or removed from the flow diagrams by those skilled in the art under the direction of the present disclosure.
In addition, the described embodiments are only some, but not all, embodiments of the application. The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present application.
In order to enable those skilled in the art to use the present disclosure, the following embodiments are presented in connection with the specific application scenario of online ride-hailing. It will be apparent to those having ordinary skill in the art that the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the application. While the present application is described primarily in the context of ranking candidate POIs retrieved from user-entered search information on an online ride-hailing platform, it should be understood that this is but one exemplary embodiment. The embodiments of the application can also be used in other fields, for example, POI position queries in map software.
It should be noted that the term "comprising" will be used in embodiments of the application to indicate the presence of the features stated hereafter, but not to exclude the addition of other features.
One aspect of the application relates to a system for POI ranking. The system can perform a search based on search information input by a user through a client to acquire multiple pieces of candidate POI information corresponding to the search information; then determine the value score of each piece of candidate POI information based on the geographic position information of the client, the candidate POI information, and a pre-trained POI value determination model; sort the pieces of candidate POI information based on their value scores; and, based on the sorting, determine and display a preset number of target POI information from the candidates, improving the accuracy of POI ranking.
It is noted that, prior to the present application, POI ranking strategies were typically based on the relevance of POIs to the search information. In reality, however, users entering identical search information may intend different destination POIs; when the POI information is presented according to such a relevance-based ranking, the top-ranked POI is therefore often not the POI the user wants, and the accuracy of POI ranking is relatively low.
Fig. 1 is a schematic architecture diagram of a service system 100 for ranking POIs according to an embodiment of the present application. For example, the service system 100 may be an online transport service platform for transport services such as taxi, chauffeur service, express, carpool, bus service, driver hire, or shuttle service, or any combination thereof. The POI ranking service system 100 may include one or more of a server 110, a network 120, a client 130, and a database 140.
In some embodiments, the body of execution of the POI ranking method may be either the server 110 or the client 130.
In some embodiments, if the POI ordering method is performed in the server 110, the server 110 may include a processor. The processor may process information and/or data related to the service request to perform one or more of the functions described in the present application. For example, the processor may perform a POI search based on search information obtained from the client 130, rank the candidate POI information obtained by the search, and, based on the ranking result, control the client to display a preset number of target POI information to the user. In some embodiments, a processor may include one or more processing cores (e.g., a single-core processor or a multi-core processor). By way of example only, the processor may include a Central Processing Unit (CPU), an Application-Specific Integrated Circuit (ASIC), an Application-Specific Instruction-set Processor (ASIP), a Graphics Processing Unit (GPU), a Physics Processing Unit (PPU), a Digital Signal Processor (DSP), a Field-Programmable Gate Array (FPGA), a Programmable Logic Device (PLD), a controller, a microcontroller unit, a Reduced Instruction Set Computer (RISC), a microprocessor, or the like, or any combination thereof.
In some embodiments, if the POI ranking method is performed in the client 130, the client 130 may send the search information input by the user to the server 110, obtain the multiple pieces of candidate POI information returned by the server 110, determine the value score of each piece of candidate POI information, rank the pieces of candidate POI information based on their value scores, and display a preset number of target POI information to the user based on the ranking result. The device type corresponding to the client 130 may be a mobile device, for example a smart home device, a wearable device, a smart mobile device, a virtual reality device, or an augmented reality device, and may also be a tablet computer, a laptop computer, or a built-in device of a motor vehicle.
In some embodiments, database 140 may be connected to network 120 to communicate with one or more components (e.g., server 110, client 130, etc.) in service system 100. One or more components in service system 100 may access data or instructions stored in database 140 via network 120. In some embodiments, database 140 may be directly connected to one or more components in service system 100, or database 140 may be part of server 110.
The POI ranking method provided by the embodiment of the present application is described in detail below with reference to the description of the service system 100 shown in fig. 1, and taking the execution body as the server 110 as an example.
Referring to fig. 2, a flow chart of a POI sorting method according to an embodiment of the present application is shown, and a specific implementation process includes S201 to S204.
S201: obtaining geographic position information of a client and retrieval information input by a user through the client;
S202: searching based on the search information to obtain a plurality of pieces of candidate POI information corresponding to the search information;
S203: determining a value score of each piece of candidate POI information based on the geographic position information, the candidate POI information and a POI value determining model obtained in advance;
s204: and sorting the pieces of candidate POI information according to the value scores of the pieces of candidate POI information.
The following is a description of the above steps, respectively.
I: in S201, a certain large area is divided into a plurality of area ranges in advance, and area range information corresponding to each area range is generated; for example, a city is divided into a plurality of regional areas, a province is divided into a plurality of regional areas, or a country is divided into a plurality of regional areas; the size of the area range can be set according to actual needs. When the area ranges are divided for different areas, the areas may be divided separately, for example, for adjacent a area and B area, the area ranges may be divided for the a area and the B area respectively; for example, when dividing the area range for the a area, the adjacent a area and B area may be divided into a plurality of area ranges by intersecting the a area and the B area, and dividing the partial area adjacent to the a area from the a area and the B area as the division target.
When the geographic position information of the client is acquired, the geographic position coordinate of the client is acquired firstly based on a positioning technology. And then determining target area range information to which the geographic position coordinates belong based on the geographic position coordinates and a plurality of pieces of area range information which are determined in advance, and determining the target area range information as the geographic position information.
Illustratively, the geographic location information is a number corresponding to the range of the target area.
The positioning techniques used in the present application may be based on the Global Positioning System (GPS), the Global Navigation Satellite System (GLONASS), the COMPASS navigation system, the Galileo positioning system, the Quasi-Zenith Satellite System (QZSS), Wireless Fidelity (WiFi) positioning techniques, or the like, or any combination thereof. One or more of the above positioning systems may be used interchangeably in the present application.
The retrieval information is generally input by a user through a man-machine interaction interface of the client; in some cases, it may also be obtained through other means. Taking the case where the user inputs retrieval information through the man-machine interaction interface as an example: if the execution subject of the POI sorting method is a server, the client, after receiving the retrieval information input by the user, sends it to the server based on its connection with the server; after receiving the retrieval information, the server performs a POI search based on it, acquires a plurality of pieces of candidate POI information corresponding to it, and sorts the candidate POI information. If the execution subject of the POI sorting method is a client, the client likewise sends the retrieval information to the server after receiving it; the server performs a POI search based on the retrieval information, acquires a plurality of pieces of candidate POI information, and sends them to the client; after receiving the plurality of pieces of candidate POI information, the client sorts them.
II: In the step S202, when searching is performed based on the search information to obtain a plurality of pieces of candidate POI information corresponding to the search information, word segmentation processing may first be performed on the search information to obtain at least one search keyword corresponding to the search information; searching is then performed based on each search keyword to obtain a plurality of pieces of candidate POI information corresponding to the search information.
For example, the search information may be segmented according to different word segmentation granularities. Specifically, in order of word segmentation granularity from large to small, one granularity is selected as the current word segmentation granularity; the search information is segmented at the current granularity to obtain at least one search keyword corresponding to it, and searching is performed based on those keywords to obtain a plurality of pieces of candidate POI information corresponding to the current word segmentation granularity.
If the number of pieces of candidate POI information currently obtained does not reach a preset quantity threshold, the next word segmentation granularity is taken as the current word segmentation granularity, and the procedure returns to the step of segmenting the search information based on the current word segmentation granularity and obtaining the corresponding candidate POI information.
Alternatively, if the number of pieces of candidate POI information currently obtained does not reach the preset quantity threshold, the next word segmentation granularity may be taken as the current one, and the search keywords corresponding to the previous granularity may be re-segmented based on the current granularity to obtain a plurality of search keywords corresponding to the current word segmentation granularity.
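The coarse-to-fine retrieval loop described above can be sketched as follows; the `search` function, the segmenters, and the toy index are hypothetical stand-ins for the real word-segmentation and retrieval back end:

```python
def retrieve_candidates(query, segmenters, search, threshold):
    """Try each word-segmentation granularity from coarsest to finest,
    stopping as soon as enough candidate POIs are recalled."""
    candidates = []
    for segment in segmenters:            # ordered large -> small granularity
        keywords = segment(query)
        candidates = [poi for kw in keywords for poi in search(kw)]
        if len(candidates) >= threshold:  # quantity threshold reached: stop
            break
    return candidates

# Toy index: the full query recalls too few POIs, a coarser keyword enough.
index = {"M city south station": ["POI_A"],
         "M city": ["POI_A", "POI_B", "POI_C"]}
search = lambda kw: index.get(kw, [])
whole = lambda q: [q]       # largest granularity: the whole string
prefix = lambda q: [q[:6]]  # hypothetical finer re-segmentation
result = retrieve_candidates("M city south station", [whole, prefix], search, 3)
print(result)  # ['POI_A', 'POI_B', 'POI_C']
```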
III: in S203 described above, the POI value determination model is a relationship model for characterizing the interrelation among the geographical location information, the candidate POI information, and the value score.
Specifically, referring to fig. 3, an embodiment of the present application provides a specific method for training to obtain a POI value determining model, including:
S301: acquiring a plurality of pieces of sample retrieval information formed by at least one input character string, sample geographic position information corresponding to each piece of sample retrieval information, sample POI sets corresponding to each input character string in each piece of sample retrieval information respectively, and operation behaviors corresponding to each input character string in each piece of sample retrieval information respectively; wherein the sample POI set comprises a plurality of sample POIs.
Specifically, an input character string is the character string composed of the characters the user has already entered at a given moment while inputting search information. The user may input the search information all at once; for example, if the search information to be input is "M city south station", the four words "M city south station" are input to the client in one step; the client sends "M city south station" to the server for searching and obtains a plurality of pieces of POI information matching it; at this time, the input character string includes "M city south station".
The user may input the search information a plurality of times when inputting the search information. For example, the search information to be input by the user is "M city south station"; the user inputs M city for the first time, and the client sends the M city to the server for searching to acquire a plurality of pieces of POI information corresponding to the M city; the user inputs the south station again, the client sends the south station of M city together to the server for searching, and a plurality of pieces of POI information corresponding to the south station of M city are obtained; at this time, the input character string includes "M city" and "M city south station".
Meanwhile, the server stores the input character strings of the search information and POI information corresponding to the input character strings, which is obtained by searching based on the input character strings.
When the sample search information needs to be acquired, the stored search information can be directly used as the sample search information, and the sample POI set corresponding to each input character string in the sample search information can be formed by using the POI information corresponding to each input character string forming the sample search information.
The server receives the search information sent by the client and also receives the geographic position coordinates of the client sent by the client; and the geographic position coordinates are associated with the search information and stored; when sample geographic position information corresponding to the sample retrieval information needs to be obtained, the sample geographic position information can be determined for each sample retrieval information according to geographic position coordinates of the client terminal which are stored in association with the sample retrieval information and a plurality of area range information which are determined in advance.
The operation behavior corresponding to each input character string in each piece of sample retrieval information refers to the specific operation the user performs, such as click selection, exiting the client, deleting all or part of the characters already input, or exiting retrieval information input, when the server causes the POI information retrieved based on the sample retrieval information to be displayed to the user on the client in order. The client sends the operation behavior to the server; the server stores the operation behavior in association with the retrieval information; when the operation behaviors corresponding to the input character strings in each piece of sample retrieval information need to be acquired, they are simply obtained based on this association.
Illustratively, when the action is a click selection, the server saves the POI of the user's click selection in addition to the click action itself.
Note here that, for the search string that has not been operated by the user, the operation behavior corresponding to the search string is null, and the search string may not be stored.
In view of S301 described above, the method for determining a POI value determining model provided by the embodiment of the present application further includes:
S302: and constructing sample state information corresponding to each input character string of the sample retrieval information according to each input character string of the sample retrieval information and sample geographic position information corresponding to the sample retrieval information aiming at each sample retrieval information.
Example 1: the input strings of a piece of sample retrieval information include: "M", "M city", "M city south", and "M city south station"; and the sample geographic position information corresponding to the sample retrieval information comprises loc1; the corresponding sample state information S1 to S4 are respectively:
S1: (M, loc1);
S2: (M city, loc1);
S3: (M city south, loc1);
S4: (M city south station, loc1).
Example 2: if the input strings of the sample retrieval information include "M city south" and "M city south station", and the sample geographic position information corresponding to the sample retrieval information comprises loc2, the corresponding sample state information S1 to S2 are respectively:
S1: (M city south, loc2);
S2: (M city south station, loc2).
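The construction of sample state information in S302 amounts to pairing each successively entered string with the sample geographic position information, as in Examples 1 and 2; a minimal sketch:

```python
def build_states(input_strings, loc):
    """Each state pairs one input character string with the client's
    area-range information (cf. Examples 1 and 2 above)."""
    return [(s, loc) for s in input_strings]

states = build_states(["M", "M city", "M city south", "M city south station"],
                      "loc1")
print(states[0])   # ('M', 'loc1')
print(states[-1])  # ('M city south station', 'loc1')
```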
In view of S301 described above, the method for determining a POI value determining model provided by the embodiment of the present application further includes:
S303: and constructing sample action information sets under the state information based on POI sets respectively corresponding to the input character strings in each piece of sample retrieval information.
Here, a sample action information set is the ordered set of POI information corresponding to an input character string, obtained by retrieval based on that input character string.
For example, in Example 1 above, if the sample POI set corresponding to the input string "M" includes (POI11, POI12, …, POI1m), then the sample action information set corresponding to the state information S1 for the input string "M" is: (POI11, POI12, …, POI1m).
If the sample POI set corresponding to the input string "M city" includes (POI21, POI22, …, POI2n), then the sample action information set corresponding to the state information S2 for the input string "M city" is: (POI21, POI22, …, POI2n).
In view of S301 described above, the method for determining a POI value determining model provided by the embodiment of the present application further includes:
S304: and constructing sample rewarding information corresponding to each sample action information in the sample action information set under each sample state information based on the operation behavior information corresponding to each input character string in each piece of sample retrieval information.
Here, the bonus information can be used to characterize the likelihood that a user makes a selection of certain sample motion information in the set of sample motion information.
In the training of the POI value determination model, the operation behaviors corresponding to the input strings in each piece of sample retrieval information have already been obtained, so the probability that the user selects a given piece of sample action information in the sample action information set is already determined: the probability is 1 for the sample action information the user click-selected, and 0 for the sample action information the user did not select. Accordingly, when constructing the sample reward information corresponding to each piece of sample action information in the sample action information set under each piece of sample state information, taking Example 1 above as an example: the sample action information sets corresponding to the sample state information S1, S2, S3, and S4 are:
(POI11, POI12, …, POI1m), (POI21, POI22, …, POI2n), (POI31, POI32, …, POI3s), (POI41, POI42, …, POI4q).
If the operation behavior information corresponding to the sample strings "M", "M city", and "M city south" is null, and the operation behavior information corresponding to the sample string "M city south station" is "click select" with the selected POI being "POI43", then:
The sample reward information corresponding to each piece of sample action information in the sample action set under the sample state information S1 corresponding to "M" is 0;
the sample reward information corresponding to each piece of sample action information in the sample action set under the sample state information S2 corresponding to "M city" is 0;
the sample reward information corresponding to each piece of sample action information in the sample action set under the sample state information S3 corresponding to "M city south" is 0;
the sample reward information corresponding to the sample action information "POI43" in the sample action set under the sample state information S4 corresponding to "M city south station" is 1, and the sample reward information corresponding to every other piece of sample action information under S4 is 0.
If the operation behavior information corresponding to the sample strings "M", "M city", and "M city south" is null, and the operation behavior information corresponding to the sample string "M city south station" is "exit retrieval information input", then:
the sample reward information corresponding to each piece of sample action information in the sample action set under the sample state information S1 corresponding to "M" is 0;
the sample reward information corresponding to each piece of sample action information in the sample action set under the sample state information S2 corresponding to "M city" is 0;
the sample reward information corresponding to each piece of sample action information in the sample action set under the sample state information S3 corresponding to "M city south" is 0;
the sample reward information corresponding to each piece of sample action information in the sample action set under the sample state information S4 corresponding to "M city south station" is 0.
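The reward construction in S304 therefore reduces to: the click-selected POI (if any) gets reward 1, and every other sample action gets 0. A sketch, using data shaped like Example 1 (the POI names are placeholders):

```python
def build_rewards(action_sets, clicked_poi_by_state):
    """clicked_poi_by_state maps a state index to the POI the user
    click-selected there; a missing entry means the operation behavior
    was null (or exit), so every reward in that state is 0."""
    return [[1 if poi == clicked_poi_by_state.get(i) else 0
             for poi in actions]
            for i, actions in enumerate(action_sets)]

# Example 1: the user clicks "POI43" only in the last state S4.
action_sets = [["POI11", "POI12"], ["POI21"], ["POI31"],
               ["POI41", "POI42", "POI43"]]
rewards = build_rewards(action_sets, {3: "POI43"})
print(rewards)  # [[0, 0], [0], [0], [0, 0, 1]]
```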
The execution order of S302 to S304 is not limited.
S305: and training to obtain the POI value determination model based on the sample state information, the sample action information sets under the sample state information and the sample rewarding information corresponding to each sample action information in the sample action information sets under the sample state information.
Here, the process of training the POI value determination model may be as follows:
the following reinforcement learning process is iteratively performed:
Each piece of sample retrieval information is taken in turn as the current sample retrieval information; the current sample state information corresponding to it and the sample action information sets under that sample state information are taken as independent variables, and the sample reward information corresponding to each piece of current sample action information in the current sample action information set under the current sample state information is taken as the dependent variable, to train the POI value determination model.
Specifically, reinforcement learning is a general learning framework for sequential decision-making: the machine selects actions, interacts with the environment, and obtains corresponding rewards from the environment, thereby gradually learning to select appropriate actions. The Agent perceives the environment by interacting with it and selects Actions so as to obtain the maximum cumulative reward; the interaction interface between the Agent and the environment comprises actions (Action, i.e., the sample action information), rewards (Reward, i.e., the sample reward information), and states (State, i.e., the sample state information).
The reinforcement learning process can be viewed as a Markov decision process, whose purpose is to find an optimal policy. Reinforcement learning can be modeled simply by an <A, S, R, P> quadruple: A denotes the actions issued by the Agent, S (State) is the environment state perceived by the Agent, R (Reward) is a real value representing reward or penalty, and P describes the transition dynamics of the environment the Agent interacts with.
The relationships among the <A, S, R, P> elements are as follows:
Action space: A, i.e., all actions a constitute the action space.
State space: S, i.e., all states s constitute the state space.
Reward: S × A → R; in the current state s, after action a is executed, the state becomes s' and the reward r corresponding to action a is obtained.
Transition: S × A → S'; in the current state s, after action a is performed, the state becomes s'.
P: the probability matrix of state transitions, which gives the probability of each possible next state after a certain action is taken.
The above-mentioned policy refers to a mapping from states to actions; a policy is commonly denoted by the symbol π and is a distribution over the action set given a state s, namely:
π(a|s) = P[A_t = a | S_t = s];
the purpose of learning is to select appropriate actions so as to maximize the discounted future cumulative reward (the return):
G_t = R_{t+1} + γR_{t+2} + γ^2 R_{t+3} + …
where γ is a discount factor; γ balances the importance of the immediate reward against long-term rewards and also keeps the sum from diverging. Since the policy π is stochastic, the cumulative reward is a random variable; to evaluate the value of a state s, a quantity describing that value must be defined, and the expected cumulative reward serves this purpose. When the agent follows the policy, the cumulative reward obeys a distribution, and its expected value at state s is defined as the state-value function, which is the POI value determination model in the embodiment of the present application:
v_π(s) = E_π(G_t | S_t = s)
Accordingly, the state-action value function is:
q_π(s, a) = E_π(G_t | S_t = s, A_t = a)
From the expression of the value function, a state-based recurrence can be derived; for the state-value function v_π(s):
v_π(s) = E_π(R_{t+1} + γR_{t+2} + γ^2 R_{t+3} + … | S_t = s)
 = E_π(R_{t+1} + γ(R_{t+2} + γR_{t+3} + …) | S_t = s)
 = E_π(R_{t+1} + γG_{t+1} | S_t = s)
 = E_π(R_{t+1} + γv_π(S_{t+1}) | S_t = s)
That is, the value at state S_t at time t and the value at state S_{t+1} at time t+1 satisfy the recurrence relation:
v_π(s) = E_π(R_{t+1} + γv_π(S_{t+1}) | S_t = s);
This recursion is called the Bellman equation; it expresses the value of a state as the reward of that state plus the value of the subsequent state decayed by a certain ratio.
Similarly, the Bellman equation for the action-value function q_π(s, a) can be derived:
q_π(s, a) = E_π(G_t | S_t = s, A_t = a) = E_π(R_{t+1} + γ q_π(S_{t+1}, A_{t+1}) | S_t = s, A_t = a);
Using the Bellman equation above, the action-value function q_π(s, a) can be expressed in terms of the state-value function v_π(s), namely:
q_π(s, a) = R_s^a + γ Σ_{s'} P^a_{ss'} v_π(s')
It can be seen that the state-action value is the sum of two parts: the first part is the immediate reward, and the second part is, for each possible next state of the environment, its occurrence probability multiplied by the value of that next state, summed and then discounted.
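The Bellman identity derived above can be checked numerically on a toy Markov reward process; the transition matrix and rewards below are arbitrary illustrations, not data from the application:

```python
import numpy as np

gamma = 0.9
# Hypothetical 3-state chain; state 2 is absorbing with zero reward.
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])
R = np.array([1.0, 2.0, 0.0])  # expected immediate reward from each state

# Exact state values solve the linear system v = R + gamma * P @ v.
v = np.linalg.solve(np.eye(3) - gamma * P, R)
print(v)  # [2.8 2.  0. ]

# The Bellman equation holds exactly at every state.
assert np.allclose(v, R + gamma * P @ v)
```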
The reinforcement learning process is an iterative process of finding an optimal policy. Finding an optimal policy can be done by finding an optimal state-value function or an optimal action-value function, i.e., finding the optimal action-value function q_π(s, a) across the different policies, such that it is the largest among the action-value functions generated under all policies.
When training the POI value determination model, the Agent is the search engine, an Action is the presentation of an ordered list, and the State is the character string input by the user; the reward Reward is 1 when the user click-selects a sample POI displayed based on the input character string, and 0 otherwise. As shown in fig. 4, fig. 4 is a state transition diagram provided by an embodiment of the present application; each node in fig. 4 is represented by an input string and the corresponding geographic position information. For example, the state information corresponding to the first node includes (query0, loc1); the state information corresponding to the second node includes (query1, loc1); and so on. That is, each node represents a state and corresponds to one piece of state information; edges between nodes represent transition relations between input strings, with the edge weights taken as transition probabilities; and each node corresponds to a plurality of sample POIs obtained by retrieval. For example, for a node whose corresponding sample POIs comprise (POI1, POI2, POI3, …, POIn), the sample reward information corresponding to these POIs is R1, R2, R3, …, Rn, respectively.
Therefore, for each string the user inputs, a POI ranking result can be obtained through the above process. For example, the user first inputs the string "M city"; a ranking result is obtained through the above process and displayed to the user. If the POI the user wants is not in that ranking result, the user continues by typing "south station"; the input string is now "M city south station", and a new ranking result is again obtained through the above process and displayed. Throughout the input process, the corresponding POI ranking result can be obtained in real time based on the user's input, so that the user can find the desired target POI among the POI ranking results.
Referring to fig. 5, if the geographic position information of the client used by the user is loc1, a plurality of POIs are recalled when the input string is "M", and a plurality of POIs are recalled when the input string is "M city"; "M, loc1" -> "M city, loc1" can therefore be represented as a transition between two states. During training, the input string and the geographic position information of the client are taken as the state, the different POI ranking lists obtained by the user in each state are taken as the actions, and the reward information of the user for each POI is calculated, thereby training the POI value determination model until the optimal model parameters are found, such that the cumulative reward of the POIs ranked at the front of the POI ranking list is maximal; at that point the training of the POI value determination model is complete.
In the implementation process, the POI value determining model is trained through a large amount of sample data, so that the optimal POI value determining model can be obtained in the training process, and further, when the POI value determining model is used, a better output result can be obtained through the POI value determining model.
In the POI value determination model training process, the following process may be employed:
For any sample retrieval information, starting from the last sample state information of the sample retrieval information, taking each sample state information as the current sample state information in turn, and executing the following iterative process:
determining a cumulative reward corresponding to the current sample state information based on the sample reward information corresponding to each piece of sample action information in the sample action information set under the current sample state information;
determining a current count corresponding to the current sample state information;
and taking each piece of sample action information in the sample action information set corresponding to the current state in turn as the current sample action information, and training a value function, for each piece of current sample action information, according to the current count and the cumulative reward corresponding to the current sample state information.
The value function trained on all the sample retrieval information is determined as the POI value determination model.
The process is as follows:
initializing all states S and initializing a value function Q;
Each piece of sample retrieval information is taken in turn as the current sample retrieval information. Starting from the last state information of the current sample retrieval information, the cumulative reward and the value function under each piece of state information are calculated from the reward information actually obtained under that state information, so that a corresponding value function is computed for each state-action pair and its reward.
For any piece of sample retrieval information, the sample state information, sample action information sets, and sample reward information corresponding to it are expressed as: S_0, A_0, R_1, …, S_{T-1}, A_{T-1}, R_T.
Here S_0 represents the state information of the 0th state; A_0 represents the sample action information set of the 0th state; R_1 represents the set of sample reward information corresponding to each piece of sample action information in the sample action information set of the 0th state; …; S_{T-1} represents the state information of the (T-1)-th state; A_{T-1} represents the sample action information set of the (T-1)-th state; and R_T represents the set of sample reward information corresponding to each piece of sample action information in the sample action information set of the (T-1)-th state.
The cumulative reward G is initialized to 0; the count weight W is set to 1; and the count C is initialized to 0.
Starting from the last state of the sample retrieval information, each state is taken as the current state in turn, i.e., let t = T-1, T-2, …, 0, and the following iterative process is executed:
G = γ·G + R_{t+1}; G is the cumulative reward corresponding to the current state information;
C(S_t, A_t) = C(S_t, A_t) + W; where C(S_t, A_t) represents the current count.
Each piece of sample action information in the sample action information set corresponding to the current state is taken in turn as the current sample action information, and for any current sample action information poi_k:
W, C(S_t, A_t), and G are all known, so the value function Q(S_t, poi_k) can be solved, completing one training step of the value function.
Training the value function repeatedly with a plurality of pieces of sample retrieval information yields the POI value determination model.
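The backward sweep above resembles an every-visit Monte-Carlo evaluation with a running count; the patent leaves the exact Q update implicit, so the incremental-mean update below (Q ← Q + (W/C)(G − Q)) is one plausible reading, not the claimed method:

```python
from collections import defaultdict

def train_value_function(episodes, gamma=0.9, W=1.0):
    """episodes: list of [(state, action, reward), ...] in time order.
    Sweep each episode backwards, accumulating G = gamma*G + R and
    updating the count C and value function Q per (state, action)."""
    Q = defaultdict(float)
    C = defaultdict(float)
    for episode in episodes:
        G = 0.0
        for state, action, reward in reversed(episode):
            G = gamma * G + reward
            key = (state, action)
            C[key] += W
            Q[key] += (W / C[key]) * (G - Q[key])  # incremental mean of returns
    return Q

# One sample query session: the click (reward 1) occurs at the final state.
Q = train_value_function([[("M", "POI11", 0), ("M city", "POI21", 1)]])
print(Q[("M city", "POI21")])  # 1.0
print(Q[("M", "POI11")])       # 0.9  (discounted credit for the later click)
```

Note how the earlier state receives discounted credit for the eventual click, which is exactly what lets the model rank POIs well even for short prefixes.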
After the POI value determining model is obtained, the value score of each piece of candidate POI information can be obtained based on the POI value determining model.
Specifically, referring to fig. 6, the embodiment of the present application further provides a specific method for determining a value score of each piece of candidate POI information, including:
s601: and constructing state information corresponding to the search information based on the search information and the geographic position information.
For example, if the search information is "M city", and the geographical location information is loc3, the state information corresponding to the search information is (M city, loc 3).
S602: and constructing action information respectively corresponding to each piece of candidate POI information based on the candidate POI information corresponding to the retrieval information.
The action information corresponding to each piece of candidate POI information is the candidate POI information.
S603: for each piece of candidate POI information, the action information corresponding to that candidate POI information and the state information are input as arguments into the POI value determination model, obtaining the value score corresponding to each piece of candidate POI information.
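S601 to S603 can be sketched end to end: build the state once from the retrieval information and geographic position information, then score each candidate with the trained model (represented here by a hypothetical callable):

```python
def score_candidates(value_model, query, loc, candidate_pois):
    """Return a value score per candidate POI for the state (query, loc)."""
    state = (query, loc)  # S601: state from retrieval info + location
    # S602/S603: each candidate POI is the action; the model scores it.
    return {poi: value_model(state, poi) for poi in candidate_pois}

# Stand-in for a trained POI value determination model.
toy_model = lambda state, poi: {"POI_A": 0.9, "POI_B": 0.3}.get(poi, 0.0)
scores = score_candidates(toy_model, "M city", "loc3", ["POI_A", "POI_B"])
print(scores)  # {'POI_A': 0.9, 'POI_B': 0.3}
```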
IV: in S204, the candidate POI information may be ranked directly according to the value score of the candidate POI information.
In another embodiment, the candidate POI information may also be ranked in combination with the value score and other factors of the candidate POI information.
For example: before ranking each piece of candidate POI information according to the magnitude of the value score of each piece of candidate POI information, the method further comprises: a ranking score for each piece of candidate POI information is determined based on a pre-trained Learning To Rank (LTR) model.
The LTR model is a supervised learning-to-rank method. Before the plurality of pieces of candidate POI information are sorted, the scoring function in the LTR model is used to score the degree of correlation between each piece of candidate POI information and the retrieval information, thereby obtaining the ranking score corresponding to each piece of candidate POI information.
The LTR model is also trained before scoring the correlation between each candidate POI information and the retrieved information using the LTR model. In the training process, a training set is required to be obtained, an LTR learning method is selected, a loss function is determined, and optimization is performed with the minimum loss function as a target, so that relevant parameters of an LTR model can be obtained. In the actual use stage, a plurality of candidate POI information to be ranked is input into the trained LTR model, and a ranking score corresponding to each candidate POI information can be obtained.
The LTR learning method is classified into a single document method, a document pair method, and a document list method. In this embodiment, a document list method is taken as an example to describe the method, which takes all search result lists corresponding to each query as a training sample, trains according to the training sample to obtain an optimal scoring function, and scores each document corresponding to a new query.
The training process for the LTR model includes: acquiring a plurality of first training retrieval information and a plurality of first training candidate POI information obtained based on each first training retrieval information, wherein each first training candidate POI information is marked with a corresponding sorting score, then taking the plurality of first training retrieval information and the plurality of first training candidate POI information as the input of a sorting model, taking the sorting score corresponding to each first training candidate POI information as the output of the sorting model, training a scoring function of the sorting model, and obtaining a trained sorting model when the scoring function meets the training completion requirement.
It will be appreciated that the training process is described above, and when the scoring function is closest to the optimal function, it means that the scoring function meets the training completion requirements, and the training is completed. Therefore, when the method is used, the scoring function obtained in the training process can be directly used for scoring the correlation degree between each candidate POI information and the retrieval information, and the ranking score corresponding to each candidate POI information is obtained, namely the ranking score represents the correlation degree between the candidate POI information and the retrieval information.
After determining the ranking score of each piece of candidate POI information, determining the total score of each piece of candidate POI information according to the value score of each piece of candidate POI information and the ranking score of each piece of candidate POI information; the pieces of candidate POI information are then ranked based on the total score.
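The application does not specify how the value score and the LTR ranking score are combined into a total score; a weighted sum is one simple possibility (the weight `alpha` and the scores below are illustrative assumptions):

```python
def rank_by_total_score(value_scores, ltr_scores, alpha=0.5):
    """Hypothetical combination: total = alpha*value + (1-alpha)*ltr,
    then sort candidate POIs by total score, descending."""
    total = {poi: alpha * value_scores[poi] + (1 - alpha) * ltr_scores.get(poi, 0.0)
             for poi in value_scores}
    return sorted(total, key=total.get, reverse=True)

ranked = rank_by_total_score({"POI_A": 0.9, "POI_B": 0.3},
                             {"POI_A": 0.2, "POI_B": 0.6})
print(ranked)  # ['POI_A', 'POI_B']
```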
In another embodiment of the present application, after the candidate POIs are ranked, a preset number of pieces of target POI information are determined from the candidate POI information based on the ranking and displayed.
Here, if the execution subject of the method is a server, the server sends the determined target POI information to the client, and the client, after receiving it, displays the target POI information to the user on the human-machine interaction interface in the determined order.
If the execution subject of the method is a client, the client, after determining the target POI information, displays it to the user on the human-machine interaction interface in the determined order.
In the present application, a search is performed based on the retrieval information input by the user through the client to obtain a plurality of pieces of candidate POI information corresponding to that retrieval information; the value score of each piece of candidate POI information is then determined based on the geographic location information of the client, the candidate POI information, and a POI value determination model obtained in advance; and the candidate POIs are ranked based on their value scores. The accuracy of POI ranking is thereby improved.
Based on the same inventive concept, an embodiment of the present application further provides a POI ranking apparatus corresponding to the POI ranking method. Since the principle by which the apparatus solves the problem is similar to that of the POI ranking method in the embodiments of the present application, the implementation of the apparatus may refer to the implementation of the method, and repeated description is omitted.
Embodiment Two
Referring to fig. 7, a schematic diagram of a POI ranking apparatus according to the second embodiment of the present application is shown. The apparatus includes: a first acquisition module 71, a second acquisition module 72, a value score determination module 73, and a ranking module 74; wherein:
A first obtaining module 71, configured to obtain geographic location information of a client, and search information input by a user through the client;
a second obtaining module 72, configured to perform searching based on the search information, and obtain a plurality of pieces of candidate POI information corresponding to the search information;
A value score determining module 73, configured to determine a value score of each piece of candidate POI information based on the geographic location information, the candidate POI information, and a POI value determining model obtained in advance;
the ranking module 74 is configured to rank each piece of the candidate POI information according to the value score of each piece of the candidate POI information.
In the present application, a search is performed based on the retrieval information input by the user through the client to obtain a plurality of pieces of candidate POI information corresponding to that retrieval information; the value score of each piece of candidate POI information is then determined based on the geographic location information of the client, the candidate POI information, and a POI value determination model obtained in advance; and the candidate POIs are ranked based on their value scores. The accuracy of POI ranking is thereby improved.
In a possible implementation, the apparatus further comprises a model training module 75, configured to train the POI value determination model in the following manner:
Acquiring a plurality of pieces of sample retrieval information each formed of at least one input character string, sample geographic location information corresponding to each piece of sample retrieval information, a sample POI set corresponding to each input character string in each piece of sample retrieval information, and an operation behavior corresponding to each input character string in each piece of sample retrieval information; wherein each sample POI set comprises a plurality of sample POIs;
for each piece of sample retrieval information, constructing the sample state information corresponding to each input character string of that sample retrieval information according to the input character string and the sample geographic location information corresponding to the sample retrieval information;
constructing the sample action information set under each piece of sample state information based on the sample POI sets respectively corresponding to the input character strings in each piece of sample retrieval information;
constructing, based on the operation behavior information corresponding to each input character string in each piece of sample retrieval information, the sample reward information corresponding to each piece of sample action information in the sample action information set under each piece of sample state information;
and training the POI value determination model based on the sample state information, the sample action information set under each piece of sample state information, and the sample reward information corresponding to each piece of sample action information in those sets.
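The construction steps above can be sketched as follows for a single logged search session. The field names, the session layout, and the 0/1 click reward are assumptions; the embodiment only states that rewards are built from the operation behavior associated with each input character string.

```python
# Hypothetical sketch of turning one logged search session into
# (state, action set, reward) triples as described above.
def build_samples(session):
    """session: list of dicts, one per input string the user typed."""
    samples = []
    for step in session:
        state = (step["input"], step["region"])          # sample state info
        actions = list(step["shown_pois"])               # sample action set
        rewards = {poi: (1.0 if poi == step.get("clicked") else 0.0)
                   for poi in actions}                   # sample reward info
        samples.append((state, actions, rewards))
    return samples

# Illustrative session: the user typed "cof", then "coffee", then clicked.
session = [
    {"input": "cof", "region": "area_12", "shown_pois": ["POI_A", "POI_B"]},
    {"input": "coffee", "region": "area_12",
     "shown_pois": ["POI_A", "POI_C"], "clicked": "POI_A"},
]
triples = build_samples(session)
assert triples[0][2] == {"POI_A": 0.0, "POI_B": 0.0}
assert triples[1][2]["POI_A"] == 1.0
```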
In a possible implementation manner, the model training module 75 is configured to train to obtain the POI value determining model based on the sample state information, the sample action information set under each sample state information, and the sample reward information corresponding to each sample action information in the sample action information set under each sample state information in the following manner:
For any piece of sample retrieval information, starting from the last sample state information of that sample retrieval information, take each piece of sample state information in turn as the current sample state information and execute the following iterative process:
determining the cumulative reward corresponding to the current sample state information based on the sample reward information corresponding to each piece of sample action information in the sample action information set under the current sample state information;
determining the current count information corresponding to the current sample state information;
taking each piece of sample action information in the sample action information set under the current sample state information in turn as the current sample action information and, for each piece of current sample action information, training the value function according to the current count and the cumulative reward corresponding to the current sample state information;
The value function trained on all pieces of sample retrieval information is determined as the POI value determination model.
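The backward iteration above can be sketched as a tabular every-visit Monte Carlo update: each session is walked from its last state backwards so the cumulative reward (the return) flows to earlier states, a visit count is kept per state, and the value of every (state, action) pair under the state is moved towards the return as a running mean. The lookup-table representation, the discount factor, and the running-mean update rule are assumptions made for illustration.

```python
from collections import defaultdict

def train_value_function(sessions, gamma=1.0):
    """Hypothetical tabular Monte Carlo fit of the value function."""
    q = defaultdict(float)      # value of each (state, action) pair
    counts = defaultdict(int)   # current count info per state
    for session in sessions:    # one session = one piece of retrieval info
        g = 0.0
        for state, actions, rewards in reversed(session):
            g = gamma * g + sum(rewards.values())   # cumulative reward
            counts[state] += 1
            n = counts[state]
            for action in actions:                  # running-mean update
                q[(state, action)] += (g - q[(state, action)]) / n
    return q

# Illustrative session: a click on the final input string earns reward 1.
session = [
    (("cof", "area_12"), ["POI_A"], {"POI_A": 0.0}),
    (("coffee", "area_12"), ["POI_A"], {"POI_A": 1.0}),
]
q = train_value_function([session])
# The earlier state inherits the later reward through the return g.
assert q[(("cof", "area_12"), "POI_A")] == 1.0
```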
In a possible implementation manner, the value score determining module 73 is specifically configured to determine a value score of each piece of candidate POI information based on the geographic location information, the candidate POI information, and a POI value determining model obtained in advance in the following manner:
constructing state information corresponding to the search information based on the search information and the geographic position information;
Constructing action information corresponding to each piece of candidate POI information based on the candidate POI information corresponding to the search information;
and, for each piece of candidate POI information, inputting the action information corresponding to that piece of candidate POI information together with the state information, as independent variables, into the POI value determination model, to obtain the value score corresponding to each piece of candidate POI information.
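At query time, this module's behavior can be sketched as the lookup below: the state is built from the search information and the region, an action is built per candidate POI, and each (state, action) pair is fed to the value model. A small tabular model is assumed here purely for illustration.

```python
# Hypothetical query-time scoring against a tabular value model.
def value_scores(q, query, region, candidate_pois):
    state = (query, region)                  # state info for the live query
    return {poi: q.get((state, poi), 0.0)    # unseen pairs default to 0
            for poi in candidate_pois}

# Illustrative trained model: one (state, action) pair with a known value.
q = {(("coffee", "area_12"), "POI_A"): 1.0}
scores = value_scores(q, "coffee", "area_12", ["POI_A", "POI_B"])
assert scores == {"POI_A": 1.0, "POI_B": 0.0}
```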
In a possible implementation, the apparatus further comprises a ranking score determining module 76, configured to determine a ranking score for each piece of the candidate POI information based on a pre-trained learning-to-rank (LTR) model;
The ranking module is specifically configured to: determine the total score of each piece of candidate POI information according to the value score and the ranking score of each piece of candidate POI information;
and rank each piece of the candidate POI information based on the total score.
In a possible implementation manner, the first obtaining module 71 is configured to obtain the geographic location information of the client in the following manner:
Obtaining geographic position coordinates of the client;
and determining target area range information to which the geographic position coordinates belong based on the geographic position coordinates and a plurality of pieces of area range information which are determined in advance, and determining the target area range information as the geographic position information.
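A minimal sketch of mapping the coordinate to one of several predetermined area ranges, assuming a fixed longitude/latitude grid; the embodiment does not specify how the area ranges are defined, so the grid and cell size are illustrative.

```python
# Hypothetical grid bucketing of a coordinate into an area-range identifier.
def region_of(lon, lat, cell=0.01):
    """Map a (lon, lat) pair to the identifier of its grid cell."""
    return f"cell_{int(lon // cell)}_{int(lat // cell)}"

# Two nearby clients fall into the same area range; a distant one does not.
assert region_of(116.404, 39.915) == region_of(116.4041, 39.9151)
assert region_of(116.404, 39.915) != region_of(121.47, 31.23)
```

Using the coarse area-range identifier, rather than the raw coordinates, as the geographic location information lets logged sessions from nearby clients share the same state.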
In a possible implementation manner, the second obtaining module 72 is specifically configured to:
performing word segmentation processing on the search information to obtain at least one search keyword corresponding to the search information;
and searching based on each search keyword to obtain a plurality of pieces of candidate POI information corresponding to the search information.
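A sketch of this two-step retrieval: segment the query into keywords, then collect every candidate that matches any keyword. The whitespace split stands in for a real word segmenter and the substring match for a real index; both are assumptions.

```python
# Hypothetical keyword retrieval over a list of POI names.
def search(query, poi_names):
    keywords = query.lower().split()         # word-segmentation stand-in
    return [name for name in poi_names
            if any(kw in name.lower() for kw in keywords)]

pois = ["Starlight Coffee House", "City Coffee Bar", "North Station"]
candidates = search("coffee house", pois)
assert candidates == ["Starlight Coffee House", "City Coffee Bar"]
```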
In a possible implementation, the apparatus further comprises a display module 77, configured to determine and display a preset number of pieces of target POI information from the candidate POI information based on the ranking.
The process flow of each module in the apparatus and the interaction flow between the modules may be described with reference to the related descriptions in the above method embodiments, which are not described in detail herein.
The embodiment of the present application further provides a computer device 80. As shown in fig. 8, a schematic structural diagram of the computer device 80, the device includes: a processor 81, a memory 82, and a bus 83. The memory 82 stores machine-readable instructions executable by the processor 81 (for example, the execution instructions corresponding to the first acquisition module 71, the second acquisition module 72, the value score determination module 73, and the ranking module 74 in the apparatus of fig. 7). When the computer device 80 runs, the processor 81 communicates with the memory 82 through the bus 83, and the machine-readable instructions, when executed by the processor 81, perform the following processes:
obtaining geographic position information of a client and retrieval information input by a user through the client;
Searching based on the search information to obtain a plurality of pieces of candidate POI information corresponding to the search information;
Determining a value score of each piece of candidate POI information based on the geographic position information, the candidate POI information and a POI value determining model obtained in advance;
ranking each piece of the candidate POI information according to the value score of each piece of the candidate POI information.
In a possible implementation, in the instructions executed by the processor 81, the POI value determination model is trained in the following manner:
Acquiring a plurality of pieces of sample retrieval information each formed of at least one input character string, sample geographic location information corresponding to each piece of sample retrieval information, a sample POI set corresponding to each input character string in each piece of sample retrieval information, and an operation behavior corresponding to each input character string in each piece of sample retrieval information; wherein each sample POI set comprises a plurality of sample POIs;
for each piece of sample retrieval information, constructing the sample state information corresponding to each input character string of that sample retrieval information according to the input character string and the sample geographic location information corresponding to the sample retrieval information;
constructing the sample action information set under each piece of sample state information based on the sample POI sets respectively corresponding to the input character strings in each piece of sample retrieval information;
constructing, based on the operation behavior information corresponding to each input character string in each piece of sample retrieval information, the sample reward information corresponding to each piece of sample action information in the sample action information set under each piece of sample state information;
and training the POI value determination model based on the sample state information, the sample action information set under each piece of sample state information, and the sample reward information corresponding to each piece of sample action information in those sets.
In a possible implementation manner, in the instructions executed by the processor 81, training to obtain the POI value determining model based on the sample state information, the sample action information set under each sample state information, and the sample reward information corresponding to each sample action information in the sample action information set under each sample state information includes:
For any piece of sample retrieval information, starting from the last sample state information of that sample retrieval information, take each piece of sample state information in turn as the current sample state information and execute the following iterative process:
determining the cumulative reward corresponding to the current sample state information based on the sample reward information corresponding to each piece of sample action information in the sample action information set under the current sample state information;
determining the current count information corresponding to the current sample state information;
taking each piece of sample action information in the sample action information set under the current sample state information in turn as the current sample action information and, for each piece of current sample action information, training the value function according to the current count and the cumulative reward corresponding to the current sample state information;
The value function trained on all pieces of sample retrieval information is determined as the POI value determination model.
In a possible implementation manner, in the instructions executed by the processor 81, the determining a value score of each piece of candidate POI information based on the geographic location information, the candidate POI information, and a POI value determination model obtained in advance includes:
constructing state information corresponding to the search information based on the search information and the geographic position information;
Constructing action information corresponding to each piece of candidate POI information based on the candidate POI information corresponding to the search information;
and, for each piece of candidate POI information, inputting the action information corresponding to that piece of candidate POI information together with the state information, as independent variables, into the POI value determination model, to obtain the value score corresponding to each piece of candidate POI information.
In a possible implementation manner, the instructions executed by the processor 81 further include, before the ranking of each piece of candidate POI information according to the magnitude of the value score of each piece of candidate POI information:
determining a ranking score of each piece of candidate POI information based on a pre-trained learning-to-rank (LTR) model;
The ranking of the candidate POI information according to the value score of the candidate POI information includes:
Determining the total score of each piece of candidate POI information according to the value score of each piece of candidate POI information and the sorting score of each piece of candidate POI information;
And sorting each piece of the candidate POI information based on the total score.
In a possible implementation manner, the obtaining the geographic location information of the client in the instructions executed by the processor 81 includes:
Obtaining geographic position coordinates of the client;
and determining target area range information to which the geographic position coordinates belong based on the geographic position coordinates and a plurality of pieces of area range information which are determined in advance, and determining the target area range information as the geographic position information.
In a possible implementation manner, in the instructions executed by the processor 81, the searching based on the search information to obtain a plurality of pieces of candidate POI information corresponding to the search information includes:
performing word segmentation processing on the search information to obtain at least one search keyword corresponding to the search information;
and searching based on each search keyword to obtain a plurality of pieces of candidate POI information corresponding to the search information.
In a possible implementation manner, in the instructions executed by the processor 81, the POI sorting method further includes:
And determining and displaying a preset number of target POI information from the candidate POI information based on the ranking.
The embodiments of the present application further provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the POI ranking method described above.
Specifically, the storage medium may be a general-purpose storage medium, such as a removable disk or a hard disk. When the computer program on the storage medium is run, the above POI ranking method can be executed, which solves the problem of low accuracy when POIs are ranked based on relevance alone in the prior art, and achieves the effect of improving the accuracy of the POI ranking presented to users.
It will be clear to those skilled in the art that, for convenience and brevity of description, the specific working procedures of the system and apparatus described above may refer to the corresponding procedures in the method embodiments and are not repeated in this disclosure. In the several embodiments provided in the present application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. The apparatus embodiments described above are merely illustrative; for example, the division into modules is merely a division by logical function, and there may be other divisions in actual implementation: multiple modules or components may be combined or integrated into another system, or some features may be omitted or not executed. Further, the couplings, direct couplings, or communication connections shown or discussed may be implemented through certain communication interfaces, and the indirect couplings or communication connections between apparatuses or modules may be electrical, mechanical, or in other forms.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer readable storage medium executable by a processor. Based on this understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a usb disk, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk, etc.
The foregoing is merely a specific implementation of the present application, and the protection scope of the present application is not limited thereto; any variation or substitution readily conceivable by a person skilled in the art within the technical scope disclosed herein shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (18)

1. A method for ordering POI, the method comprising:
obtaining geographic position information of a client and retrieval information input by a user through the client;
Searching based on the search information to obtain a plurality of pieces of candidate POI information corresponding to the search information;
Determining a value score of each piece of candidate POI information based on the geographic position information, the candidate POI information and a POI value determining model obtained in advance;
ranking each piece of candidate POI information according to the value score of each piece of candidate POI information,
Wherein the POI value determination model is trained by iteratively performing the reinforcement learning process of:
sequentially taking each piece of sample retrieval information formed of at least one input character string as the current sample retrieval information, taking the current sample state information corresponding to the current sample retrieval information and the sample action information set under each piece of sample state information in the current sample retrieval information as independent variables, and taking the sample reward information corresponding to each piece of current sample action information in the current sample action information set under the current sample state information as the dependent variable, to train the POI value determination model.
2. The POI ranking method according to claim 1, wherein the POI value determination model is trained by:
Acquiring a plurality of pieces of sample retrieval information each formed of at least one input character string, sample geographic location information corresponding to each piece of sample retrieval information, a sample POI set corresponding to each input character string in each piece of sample retrieval information, and an operation behavior corresponding to each input character string in each piece of sample retrieval information; wherein each sample POI set comprises a plurality of sample POIs;
for each piece of sample retrieval information, constructing the sample state information corresponding to each input character string of that sample retrieval information according to the input character string and the sample geographic location information corresponding to the sample retrieval information;
constructing the sample action information set under each piece of sample state information based on the sample POI sets respectively corresponding to the input character strings in each piece of sample retrieval information;
constructing, based on the operation behavior information corresponding to each input character string in each piece of sample retrieval information, the sample reward information corresponding to each piece of sample action information in the sample action information set under each piece of sample state information;
and training the POI value determination model based on the sample state information, the sample action information set under each piece of sample state information, and the sample reward information corresponding to each piece of sample action information in those sets.
3. The POI ranking method according to claim 2, wherein the training to obtain the POI value determination model based on the sample state information, the sample action information set under each sample state information, and the sample reward information corresponding to each sample action information in the sample action information set under each sample state information, respectively, comprises:
For any piece of sample retrieval information, starting from the last sample state information of that sample retrieval information, take each piece of sample state information in turn as the current sample state information and execute the following iterative process:
determining the cumulative reward corresponding to the current sample state information based on the sample reward information corresponding to each piece of sample action information in the sample action information set under the current sample state information;
determining the current count information corresponding to the current sample state information;
taking each piece of sample action information in the sample action information set under the current sample state information in turn as the current sample action information and, for each piece of current sample action information, training the value function according to the current count and the cumulative reward corresponding to the current sample state information;
The value function trained on all pieces of sample retrieval information is determined as the POI value determination model.
4. The POI ranking method as defined in claim 1, wherein the determining the value score of each piece of candidate POI information based on the geographical location information, the candidate POI information, and a pre-obtained POI value determination model comprises:
constructing state information corresponding to the search information based on the search information and the geographic position information;
Constructing action information corresponding to each piece of candidate POI information based on the candidate POI information corresponding to the search information;
and, for each piece of candidate POI information, inputting the action information corresponding to that piece of candidate POI information together with the state information, as independent variables, into the POI value determination model, to obtain the value score corresponding to each piece of candidate POI information.
5. The POI ranking method as defined in claim 1, wherein before ranking each piece of the candidate POI information according to the magnitude of the value score of each piece of the candidate POI information, further comprising:
determining a ranking score of each piece of candidate POI information based on a pre-trained learning-to-rank (LTR) model;
The ranking of the candidate POI information according to the value score of the candidate POI information includes:
Determining the total score of each piece of candidate POI information according to the value score of each piece of candidate POI information and the sorting score of each piece of candidate POI information;
And sorting each piece of the candidate POI information based on the total score.
6. The POI ranking method according to claim 1, wherein the obtaining geographic location information of the client comprises:
Obtaining geographic position coordinates of the client;
and determining target area range information to which the geographic position coordinates belong based on the geographic position coordinates and a plurality of pieces of area range information which are determined in advance, and determining the target area range information as the geographic position information.
7. The POI ranking method according to claim 1, wherein the retrieving based on the retrieval information to obtain a plurality of pieces of candidate POI information corresponding to the retrieval information comprises:
performing word segmentation processing on the search information to obtain at least one search keyword corresponding to the search information;
and searching based on each search keyword to obtain a plurality of pieces of candidate POI information corresponding to the search information.
8. The POI ranking method according to claim 1, further comprising:
And determining and displaying a preset number of target POI information from the candidate POI information based on the ranking.
9. A POI ranking apparatus, comprising:
the first acquisition module is used for acquiring geographic position information of the client and search information input by a user through the client;
the second acquisition module is used for searching based on the search information and acquiring a plurality of pieces of candidate POI information corresponding to the search information;
The value score determining module is used for determining the value score of each piece of candidate POI information based on the geographic position information, the candidate POI information and a POI value determining model obtained in advance;
the ranking module is used for ranking the candidate POI information according to the value score of the candidate POI information; and
A model training module configured to train the POI value determination model by iteratively performing a reinforcement learning process of:
sequentially taking each piece of sample retrieval information formed of at least one input character string as the current sample retrieval information, taking the current sample state information corresponding to the current sample retrieval information and the sample action information set under each piece of sample state information in the current sample retrieval information as independent variables, and taking the sample reward information corresponding to each piece of current sample action information in the current sample action information set under the current sample state information as the dependent variable, to train the POI value determination model.
10. The POI ranking apparatus of claim 9, wherein the model training module is configured to train the POI value determination model in the following manner:
Acquiring a plurality of pieces of sample retrieval information each formed of at least one input character string, sample geographic location information corresponding to each piece of sample retrieval information, a sample POI set corresponding to each input character string in each piece of sample retrieval information, and an operation behavior corresponding to each input character string in each piece of sample retrieval information; wherein each sample POI set comprises a plurality of sample POIs;
for each piece of sample retrieval information, constructing the sample state information corresponding to each input character string of that sample retrieval information according to the input character string and the sample geographic location information corresponding to the sample retrieval information;
constructing the sample action information set under each piece of sample state information based on the sample POI sets respectively corresponding to the input character strings in each piece of sample retrieval information;
constructing, based on the operation behavior information corresponding to each input character string in each piece of sample retrieval information, the sample reward information corresponding to each piece of sample action information in the sample action information set under each piece of sample state information;
and training the POI value determination model based on the sample state information, the sample action information set under each piece of sample state information, and the sample reward information corresponding to each piece of sample action information in those sets.
11. The POI ranking apparatus according to claim 10, wherein the model training module is configured to train the POI value determination model, based on the sample state information, the sample action information set under each piece of sample state information, and the sample reward information corresponding to each piece of sample action information in that set, in the following manner:
for any piece of sample retrieval information, starting from the last piece of sample state information of the sample retrieval information, taking each piece of sample state information in turn as the current sample state information, and executing the following iterative process:
determining a cumulative reward corresponding to the current sample state information based on the sample reward information corresponding to each piece of sample action information in the sample action information set under the current sample state information;
determining current count information corresponding to the current sample state information;
taking each piece of sample action information in the sample action information set under the current sample state information in turn as the current sample action information, and for each piece of current sample action information, training a value function according to the current count information and the cumulative reward corresponding to the current sample state information;
and determining the value function trained on all pieces of sample retrieval information as the POI value determination model.
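A minimal tabular sketch of this backward iteration, under stated assumptions: the discount factor, the `max()` aggregation for the state's cumulative reward, and the count-averaged incremental update are choices the claim leaves open, not terms fixed by the patent.

```python
from collections import defaultdict

def train_episode(episode, q_values, counts, gamma=0.9):
    """episode: list of (state, actions, rewards) in visit order;
    q_values / counts: dicts keyed by (state, action) pairs.
    gamma is an assumed discount factor."""
    later_return = 0.0
    for state, actions, rewards in reversed(episode):
        # cumulative reward corresponding to the current sample state;
        # aggregating with max() over the actions' rewards is an assumption
        cumulative = max(rewards.values()) + gamma * later_return
        for action in actions:
            key = (state, action)
            counts[key] += 1                    # current count information
            # count-averaged update toward this action's observed return
            target = rewards[action] + gamma * later_return
            q_values[key] += (target - q_values[key]) / counts[key]
        later_return = cumulative
    return q_values, counts
```

Walking the states from last to first lets each state's update see the discounted return of everything that followed it, which is what makes the single reversed pass sufficient.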
12. The POI ranking apparatus according to claim 9, wherein the value score determining module is specifically configured to determine the value score of each piece of candidate POI information, based on the geographic position information, the candidate POI information, and a pre-trained POI value determination model, in the following manner:
constructing state information corresponding to the search information based on the search information and the geographic position information;
constructing action information corresponding to each piece of candidate POI information based on the candidate POI information corresponding to the search information;
and for each piece of candidate POI information, inputting the action information corresponding to that candidate POI information together with the state information, as independent variables, into the POI value determination model to obtain the value score corresponding to each piece of candidate POI information.
13. The POI ranking apparatus according to claim 9, further comprising: a ranking score determining module configured to determine a ranking score of each piece of candidate POI information based on a pre-trained learning-to-rank (LTR) model;
wherein the ranking module is specifically configured to: determine a total score of each piece of candidate POI information according to the value score and the ranking score of that candidate POI information;
and rank each piece of candidate POI information based on the total score.
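The combination step might look like the sketch below. The linear blend and its weight `alpha` are assumptions; the claim only states that the two scores are combined into a total score used for ranking.

```python
def rank_candidates(candidates, value_scores, ltr_scores, alpha=0.5):
    """candidates: list of POI ids; value_scores / ltr_scores: dicts
    keyed by POI id. alpha is an assumed blending weight."""
    # total score per candidate: weighted sum of value and LTR scores
    total = {poi: alpha * value_scores[poi] + (1 - alpha) * ltr_scores[poi]
             for poi in candidates}
    # rank candidates by descending total score
    return sorted(candidates, key=lambda poi: total[poi], reverse=True)
```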
14. The POI ranking apparatus according to claim 9, wherein the first obtaining module is configured to obtain the geographic position information of the client in the following manner:
obtaining the geographic position coordinates of the client;
and determining, based on the geographic position coordinates and a plurality of pieces of pre-determined area range information, the target area range information to which the geographic position coordinates belong, and determining the target area range information as the geographic position information.
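A sketch of this coordinate-to-region mapping: snap the client's raw coordinates to one of several pre-determined regions and use the region, rather than the coordinates themselves, as the geographic position information. Representing each region as an axis-aligned bounding box is an assumption; the patent does not fix the region shape.

```python
def find_region(lat, lng, regions):
    """regions: dict mapping region id -> (min_lat, min_lng, max_lat, max_lng),
    i.e. the plurality of pieces of pre-determined area range information."""
    for region_id, (min_lat, min_lng, max_lat, max_lng) in regions.items():
        if min_lat <= lat <= max_lat and min_lng <= lng <= max_lng:
            return region_id    # target area range information
    return None                 # coordinates fall outside every known region
```

Bucketing coordinates this way lets the model generalize over nearby users instead of memorizing exact latitude/longitude pairs.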
15. The POI ranking apparatus according to claim 9, wherein the second obtaining module is specifically configured to:
perform word segmentation on the search information to obtain at least one search keyword corresponding to the search information;
and search based on each search keyword to obtain a plurality of pieces of candidate POI information corresponding to the search information.
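The retrieval step above can be sketched with an inverted index. The whitespace tokenizer below stands in for a real Chinese word-segmentation step, and the index layout is an assumption for illustration only.

```python
def search_candidates(query, inverted_index):
    """inverted_index: dict mapping keyword -> set of POI ids."""
    keywords = query.lower().split()   # word-segmentation stand-in
    candidates = set()
    for kw in keywords:
        # union the postings of every search keyword
        candidates |= inverted_index.get(kw, set())
    return sorted(candidates)          # candidate POI information
```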
16. The POI ranking apparatus according to claim 9, further comprising: a display module configured to determine, based on the ranking, a preset number of pieces of target POI information from the candidate POI information and to display them.
17. A computer device, comprising: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating over the bus when the computer device is running, the processor executing the machine-readable instructions to perform the steps of the method of any one of claims 1 to 8.
18. A computer-readable storage medium, characterized in that it has stored thereon a computer program which, when executed by a processor, performs the steps of the method according to any of claims 1 to 8.
CN201910873935.3A 2019-09-17 2019-09-17 POI (Point of interest) ordering method and device Active CN111831928B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910873935.3A CN111831928B (en) 2019-09-17 2019-09-17 POI (Point of interest) ordering method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910873935.3A CN111831928B (en) 2019-09-17 2019-09-17 POI (Point of interest) ordering method and device

Publications (2)

Publication Number Publication Date
CN111831928A CN111831928A (en) 2020-10-27
CN111831928B true CN111831928B (en) 2024-06-18

Family

ID=72911613

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910873935.3A Active CN111831928B (en) 2019-09-17 2019-09-17 POI (Point of interest) ordering method and device

Country Status (1)

Country Link
CN (1) CN111831928B (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102982153B (en) * 2012-11-29 2016-03-23 北京亿赞普网络技术有限公司 A kind of information retrieval method and device thereof
WO2018218413A1 (en) * 2017-05-27 2018-12-06 Beijing Didi Infinity Technology And Development Co., Ltd. System and method for providing information for an on-demand service
CN107463704B (en) * 2017-08-16 2021-05-07 北京百度网讯科技有限公司 Search method and device based on artificial intelligence
CN107730310A (en) * 2017-09-30 2018-02-23 平安科技(深圳)有限公司 Electronic installation, the method and storage medium for building Retail networks Rating Model
CN108763293A (en) * 2018-04-17 2018-11-06 平安科技(深圳)有限公司 Point of interest querying method, device and computer equipment based on semantic understanding

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Collection selection method for distributed information retrieval based on LDA topic model; He Xufeng, Chen Ling, Chen Gencai, Qian Kun, Wu Yong, Wang Jingchang; Journal of Chinese Information Processing (03); full text *
Research on POI search engine optimization based on user feedback; Pan Mingyuan, Fang Jinyun, Zhang Lisheng; Computer Engineering and Applications (32); full text *

Also Published As

Publication number Publication date
CN111831928A (en) 2020-10-27

Similar Documents

Publication Publication Date Title
CN113508378B (en) Training method, recommendation method, device and computer readable medium for recommendation model
CN110674419B (en) Geographic information retrieval method and device, electronic equipment and readable storage medium
US11946753B2 (en) Generating digital event recommendation sequences utilizing a dynamic user preference interface
CN112868036B (en) System and method for location recommendation
CN110520871A (en) Training machine learning model
US11720796B2 (en) Neural episodic control
JP2010086150A (en) Regional information retrieving device, method for controlling regional information retrieving device, regional information retrieving system and method for controlling regional information retrieval system
CN109409612A (en) A kind of paths planning method, server and computer storage medium
KR20170030379A (en) Method and system for personalized travel curation service
CN111553279B (en) Method, device, equipment and storage medium for learning and identifying characterization of interest points
CN110717010B (en) Text processing method and system
CN111831897A (en) Travel destination recommendation method and device, electronic equipment and storage medium
CN108647273B (en) Friend-making recommendation method and device
CN112991008B (en) Position recommendation method and device and electronic equipment
CN113158038A (en) Interest point recommendation method and system based on STA-TCN neural network framework
CN111859174A (en) Method and system for determining recommended boarding point
CN111831929B (en) Method and device for acquiring POI information
CN110457706B (en) Point-of-interest name selection model training method, using method, device and storage medium
CN110427574B (en) Route similarity determination method, device, equipment and medium
CN115456707A (en) Method and device for providing commodity recommendation information and electronic equipment
US9898876B2 (en) Method and apparatus for vehicle usage recording
CN112243487A (en) System and method for on-demand services
CN111831928B (en) POI (Point of interest) ordering method and device
CN111831898B (en) Ordering method, ordering device, electronic equipment and readable storage medium
CN108229572B (en) Parameter optimization method and computing equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant