CN111831765A - Data processing method and device, electronic equipment and readable storage medium - Google Patents
- Publication number
- CN111831765A (application CN202010162248.3A)
- Authority
- CN
- China
- Prior art keywords
- grid
- user
- target
- route
- information
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/29—Geographical information databases
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Tourism & Hospitality (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Educational Administration (AREA)
- General Health & Medical Sciences (AREA)
- Development Economics (AREA)
- Data Mining & Analysis (AREA)
- Remote Sensing (AREA)
- Health & Medical Sciences (AREA)
- Economics (AREA)
- General Engineering & Computer Science (AREA)
- Human Resources & Organizations (AREA)
- Marketing (AREA)
- Primary Health Care (AREA)
- Strategic Management (AREA)
- General Business, Economics & Management (AREA)
- Navigation (AREA)
Abstract
The embodiments of the invention disclose a data processing method and apparatus, an electronic device, and a computer-readable storage medium.

The method proceeds as follows:
- Behavior trajectory information of a plurality of users in a predetermined area is acquired. The predetermined area comprises a plurality of pre-divided grids.
- A route sentence is determined for each user from that user's behavior trajectory information. The route sentence consists of the identifiers of the departure grid and the destination grid of at least one riding behavior of the user.
- The route sentences of the users are input into a sentence vector determination model for training, to obtain feature vectors of the plurality of grids in the predetermined area.

The grid feature vectors obtained in this way better represent the potential relationships among the grids. Target grid prediction and task execution object selection based on these feature vectors can therefore be made more efficient and accurate.
Description
Technical Field
The present invention relates to the field of internet technologies, and in particular, to a data processing method, an apparatus, an electronic device, and a readable storage medium.
Background
In shared transportation services, a platform benefits from processing users' riding information reasonably and effectively. One example is mining, from a user's riding information, other physical locations that are potentially related to the locations in the user's trajectory. Such processing helps the shared transportation platform operate in a more optimized and safer way and provide better service to users.
Disclosure of Invention
In view of this, embodiments of the present invention provide a data processing method and apparatus, an electronic device, and a readable storage medium. With them, the feature vector of each grid better reflects the potential relationships among the grids, and target grid prediction and task execution object selection based on these feature vectors are more efficient and accurate.
In a first aspect, an embodiment of the present invention provides a data processing method, where the method includes:
acquiring behavior trajectory information of a plurality of users in a predetermined area, wherein the behavior trajectory information of a user comprises a departure grid and a destination grid of at least one riding behavior, and the predetermined area comprises a plurality of pre-divided grids;
determining a route sentence of each user according to the behavior trajectory information of that user, wherein the route sentence of a user consists of the identifiers of the departure grid and the destination grid of at least one riding behavior of the user; and
inputting the route sentences of the users into a sentence vector determination model for training, to obtain feature vectors of the plurality of grids in the predetermined area.
Optionally, determining a route sentence of each user according to the behavior trajectory information of that user includes:
determining, as the route sentence of the user, the sentence formed by the identifiers of every departure grid and every destination grid in the user's behavior trajectory information within a predetermined time.
Optionally, determining a route sentence of each user according to the behavior trajectory information of that user includes:
determining a route sentence of the user according to the continuous riding behavior of the user,
wherein the route sentence of the user consists of the identifiers of the departure grids and destination grids of the continuous riding behavior.
Optionally, the route sentence further includes a preset sentence-start marker and a preset sentence-end marker.
Optionally, the sentence vector determination model is a Word2Vec model.
Optionally, the plurality of grids of the predetermined area are obtained by dividing the area at equal longitude and latitude intervals.
Optionally, the plurality of grids of the predetermined area are obtained by dividing the area according to place categories or geographic information, the geographic information including streets and/or road segments.
Optionally, the method further includes:
pushing target grids to a target user according to the feature vectors of the grids in the predetermined area.
Optionally, pushing target grids to the target user according to the feature vectors of the grids in the predetermined area includes:
determining a grid candidate set according to the current input information of the target user, wherein the current input information of the target user comprises distance range information and/or partial information of a target address;
acquiring a feature vector of a grid where the target user is located and a feature vector of each candidate grid in the grid candidate set;
calculating the correlation degree between the feature vector of the grid where the target user is located and the feature vector of each candidate grid;
determining at least one target grid according to the correlation calculation result;
pushing the at least one target grid.
Optionally, the method further includes:
determining a task execution object corresponding to a target task according to the feature vectors of the grids in the predetermined area.
Optionally, determining the task execution object corresponding to the target task according to the feature vectors of the grids in the predetermined area includes:
acquiring target task information, wherein the target task information comprises the identifiers of a departure grid and a destination grid;
acquiring the feature vector of the departure grid and the feature vector of the destination grid of the target task;
determining the type of the target task according to the feature vector of the departure grid, the feature vector of the destination grid, and additional information of the target task, wherein the additional information comprises at least one of the issuing time of the target task and evaluation information of historical tasks corresponding to the departure grid and/or the destination grid of the target task; and
distributing the target task to a corresponding task execution object according to the type of the target task.
In a second aspect, an embodiment of the present invention provides a data processing apparatus, where the apparatus includes:
a trajectory information acquisition unit configured to acquire behavior trajectory information of a plurality of users in a predetermined area, wherein the behavior trajectory information of a user comprises a departure grid and a destination grid of at least one riding behavior, and the predetermined area comprises a plurality of pre-divided grids;
a route sentence determination unit configured to determine a route sentence of each user according to the behavior trajectory information of that user, the route sentence of a user consisting of the identifiers of the departure grid and the destination grid of at least one riding behavior of the user; and
a feature vector acquisition unit configured to input the route sentences of the users into a sentence vector determination model for training, to obtain feature vectors of the plurality of grids in the predetermined area.
In a third aspect, an embodiment of the present invention provides an electronic device, including a memory and a processor, where the memory is used to store one or more computer program instructions, where the one or more computer program instructions are executed by the processor to implement the method according to the first aspect of the embodiment of the present invention.
In a fourth aspect, embodiments of the present invention provide a computer-readable storage medium on which computer program instructions are stored, which when executed by a processor, implement a method according to the first aspect of embodiments of the present invention.
According to the embodiments of the invention, the following steps are performed:
- Behavior trajectory information of a plurality of users in a predetermined area is acquired.
- A route sentence of each user is determined from that user's behavior trajectory information.
- The route sentences are input into a sentence vector determination model for training, to obtain feature vectors of the plurality of grids in the predetermined area.

Here, the predetermined area comprises a plurality of pre-divided grids, and a user's route sentence consists of the departure grids and destination grids of at least one riding behavior of the user. As a result, the grid feature vectors better represent the potential relationships among the grids. Target grid prediction and task execution object selection based on them can therefore be made more efficient and accurate.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent from the following description of the embodiments of the present invention with reference to the accompanying drawings, in which:
FIG. 1 is a schematic diagram of users' riding tracks in the related art;
FIG. 2 is a flowchart of a data processing method according to an embodiment of the present invention;
FIGS. 3 and 4 are schematic diagrams of area grids according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a user's behavior trajectory according to an embodiment of the present invention;
FIG. 6 is a flowchart of a grid pushing method according to an embodiment of the present invention;
FIG. 7 is a flowchart of a method for determining a task execution object according to an embodiment of the present invention;
FIG. 8 is a process diagram of a data processing method according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of a data processing apparatus according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The present invention will be described below based on examples, but the present invention is not limited to only these examples. In the following detailed description of the present invention, certain specific details are set forth. It will be apparent to one skilled in the art that the present invention may be practiced without these specific details. Well-known methods, procedures, components and circuits have not been described in detail so as not to obscure the present invention.
Further, those of ordinary skill in the art will appreciate that the drawings provided herein are for illustrative purposes and are not necessarily drawn to scale.
Unless the context clearly requires otherwise, throughout the description, the words "comprise", "comprising", and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is, what is meant is "including, but not limited to".
In the description of the present invention, it is to be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. In addition, in the description of the present invention, "a plurality" means two or more unless otherwise specified.
Conventional methods for processing users' riding track information work directly on the raw map coordinates in the riding records, which is inflexible and inefficient. Raw coordinates are also difficult to use directly for training other models. It is therefore necessary to convert geographic location information into feature vectors.
In the related art, graph embedding methods, such as the DeepWalk algorithm, are used to obtain vectors for geographic coordinates. These methods work in three stages:
- They count the riding frequency between pairs of geographic coordinates.
- They perform random walks over the resulting graph to generate a number of walk paths.
- They input the paths into a model for training to obtain the feature vectors of the geographic coordinates.
Fig. 1 is a schematic diagram of users' riding tracks in the related art. It shows the riding frequencies among four places A, B, C, and D within a predetermined time (for example, one day): 40 rides from A to B, 20 from B to A, 6 from B to C, 5 from C to D, 1 from A to C, 50 from D to A, and 30 from A to D. Random walks over these places can generate riding paths such as A-B-A-C-D-A-D- …, A-C-D-A-B-C- …, and D-A-B-C-D-A-D- …. Many such randomly generated paths are then input into a model to train feature vectors for the geographic coordinates.

In these methods, the randomly generated walk paths do not reflect the characteristics of the places well and contain considerable noise. When the resulting feature vectors are applied to specific scenarios, such as address recommendation, processing efficiency and accuracy are therefore low.
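The related-art walk generation described above can be sketched as follows. This is a minimal illustration only. It assumes that each step of a walk picks an outgoing edge with probability proportional to its ride count from Fig. 1. The original DeepWalk algorithm walks uniformly over an unweighted graph, so frequency weighting is a common variant and an assumption here. The names `RIDE_COUNTS` and `weighted_walk` are hypothetical.

```python
import random
from typing import Dict, List

# Directed ride counts between places A-D within one day, as listed for Fig. 1.
RIDE_COUNTS: Dict[str, Dict[str, int]] = {
    "A": {"B": 40, "C": 1, "D": 30},
    "B": {"A": 20, "C": 6},
    "C": {"D": 5},
    "D": {"A": 50},
}


def weighted_walk(graph: Dict[str, Dict[str, int]], start: str,
                  length: int, rng: random.Random) -> List[str]:
    """Generate one walk; each step follows an outgoing edge chosen
    with probability proportional to its ride count (assumed weighting)."""
    walk = [start]
    while len(walk) < length:
        neighbors = graph.get(walk[-1])
        if not neighbors:
            break  # dead end: no outgoing rides recorded for this place
        nodes = list(neighbors)
        weights = [neighbors[n] for n in nodes]
        walk.append(rng.choices(nodes, weights=weights, k=1)[0])
    return walk


rng = random.Random(0)  # fixed seed so the walks are reproducible
walks = [weighted_walk(RIDE_COUNTS, start, 7, rng) for start in "ABCD" for _ in range(2)]
for w in walks:
    print("-".join(w))
```

Because every place in Fig. 1 has at least one outgoing ride, each walk here reaches the requested length. The randomness is exactly what the embodiment criticizes: these paths are samples from aggregate counts, not any real user's trajectory.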
To address the low efficiency and accuracy of such riding track processing, the embodiments of the invention apply NLP (Natural Language Processing) techniques to users' real riding track information and express geographic coordinate information as feature vectors. The resulting feature vectors can accurately express the potential relationships between physical addresses. Analyzing these relationships through the feature vectors can then improve task processing efficiency and accuracy in specific application scenarios. Examples include making better address recommendations to users and better assigning task execution objects to target tasks.
Fig. 2 is a flowchart of a data processing method according to an embodiment of the present invention. As shown in fig. 2, an embodiment of the present invention includes the following steps:
Step S110, acquiring behavior trajectory information of a plurality of users in a predetermined area. The predetermined area comprises a plurality of pre-divided grids, and a user's behavior trajectory information comprises the departure grid and the destination grid of at least one riding behavior. The predetermined area may be a region, a city, or the like; this embodiment is not limited in this respect.
Figs. 3 and 4 are schematic diagrams of area grids according to an embodiment of the present invention. In an optional implementation, the plurality of grids of the predetermined area are obtained by dividing the area at equal longitude and latitude intervals. As shown in Fig. 3, the predetermined area 3 is divided into equal-sized grids, with an interval x in the longitude (lon) direction and an interval y in the latitude (lat) direction. Each grid has a corresponding identifier (for example, a grid name or a grid ID).
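A minimal sketch of the equal-interval division of Fig. 3 follows; it maps a (longitude, latitude) point to a grid identifier. The `LonLatGrid` class, the bounding box, the 0.01-degree intervals, and the `g{row}_{col}` identifier format are all hypothetical choices for illustration. The embodiment only requires that each grid have some identifier.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LonLatGrid:
    """Equal-interval grid over a rectangular area (hypothetical parameters)."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float
    dx: float  # grid width in degrees of longitude ("x" in Fig. 3)
    dy: float  # grid height in degrees of latitude ("y" in Fig. 3)

    def cell(self, lon: float, lat: float) -> str:
        """Return the identifier of the grid containing (lon, lat)."""
        if not (self.min_lon <= lon < self.max_lon and self.min_lat <= lat < self.max_lat):
            raise ValueError("point lies outside the predetermined area")
        col = math.floor((lon - self.min_lon) / self.dx)
        row = math.floor((lat - self.min_lat) / self.dy)
        return f"g{row}_{col}"


area = LonLatGrid(116.0, 39.0, 116.5, 39.5, dx=0.01, dy=0.01)
print(area.cell(116.0253, 39.0071))  # g0_2
```

Degree-based intervals give cells whose east-west width in kilometres shrinks with latitude. That is acceptable within a single city, but it is a consequence of dividing by longitude and latitude rather than by distance.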
In another optional implementation, the plurality of grids of the predetermined area are obtained by dividing the area according to place categories or geographic information. Place categories may include residential communities, schools, office buildings, subway stations, and the like. Geographic information may include streets and/or road segments. As shown in Fig. 4, the predetermined area 4 is divided into a plurality of grids according to place category.

Optionally, when the predetermined area is divided according to place category, the identifier of a grid may be the name of the corresponding place; for example, the identifier of grid 41 is "xx school". When the predetermined area is divided according to geographic information, the identifier of a grid may be the name of the corresponding street or road segment.
Fig. 5 is a schematic diagram of a user behavior trajectory according to an embodiment of the present invention. This embodiment takes the behavior trajectory of user M over one day as an example. As shown in Fig. 5, the trajectory is grid a-grid b-grid c-grid d-grid a. During that day, the user's riding behaviors are:
- the first ride, from departure grid a to destination grid b;
- the second ride, from departure grid c to destination grid d;
- the third ride, from departure grid d to destination grid a.

Optionally, in this embodiment, the behavior trajectory information of the plurality of users in the predetermined area includes information of this kind.
Step S120, determining a route sentence of each user according to the behavior trajectory information of that user. A user's route sentence consists of the identifiers of the departure grid and the destination grid of at least one riding behavior of the user.
In this embodiment, an area is divided into a plurality of grids, and a user's behavior trajectory information is converted into a sentence in the sense of natural language processing, with each grid identifier serving as a word. These traffic-domain sentences can then be processed by NLP algorithm models. The models' context analysis reveals the potential relationships between the departure grids and destination grids in users' behavior trajectories.
In an optional implementation, step S120 may determine, as the user's route sentence, the sentence formed by the identifiers of every departure grid and every destination grid in the user's behavior trajectory information within a predetermined time. Taking the one-day behavior trajectory of user M in Fig. 5 as an example, a route sentence of user M is (a, b, c, d, a). That is, in this embodiment a route sentence can be regarded as a natural language sentence that has already been segmented into words.

In an optional implementation, the route sentence further includes a preset sentence start B and a preset sentence end E, so that a route sentence of user M is (B, a, b, c, d, a, E). B and E are preset, so the departure grid a of the first riding behavior gains a preceding context and the destination grid a of the last riding behavior gains a following context when the route sentence is processed. All grids in the route sentence are therefore processed in the same way, which further improves the efficiency and accuracy of data processing.
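The whole-day route sentence can be sketched as below. The sketch assumes a user's rides for the day are available as an ordered list of (departure grid, destination grid) pairs. When a ride starts where the previous one ended, the shared grid identifier is written once, which reproduces (a, b, c, d, a) for user M. This merging rule is an assumption inferred from the example. The start and end markers are written as the hypothetical tokens `<B>` and `<E>` so they cannot collide with lowercase grid identifiers such as `b`.

```python
from typing import List, Sequence, Tuple

Ride = Tuple[str, str]  # (departure grid ID, destination grid ID)
BOS, EOS = "<B>", "<E>"  # hypothetical spellings of the preset start/end markers


def daily_route_sentence(rides: Sequence[Ride], add_markers: bool = False) -> List[str]:
    """Concatenate the grid IDs of a day's rides into one route sentence."""
    sentence: List[str] = []
    for dep, dst in rides:
        # Write the departure grid unless the previous ride already ended there.
        if not sentence or sentence[-1] != dep:
            sentence.append(dep)
        sentence.append(dst)
    if add_markers:
        sentence = [BOS] + sentence + [EOS]
    return sentence


rides_m = [("a", "b"), ("c", "d"), ("d", "a")]  # user M's rides from Fig. 5
print(daily_route_sentence(rides_m))                    # ['a', 'b', 'c', 'd', 'a']
print(daily_route_sentence(rides_m, add_markers=True))  # ['<B>', 'a', 'b', 'c', 'd', 'a', '<E>']
```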
In another optional implementation, step S120 may determine a route sentence of the user according to the user's continuous riding behavior. In this case the route sentence consists of the identifiers of the departure grids and destination grids of that continuous riding behavior.

Take the one-day behavior trajectory of user M in Fig. 5 as an example. User M's movements from grid a to grid b, from grid c to grid d, and from grid d to grid a are riding behaviors, while the movement from grid b to grid c is not. The second riding behavior (grid c to grid d) and the third riding behavior (grid d to grid a) therefore form a continuous riding behavior of user M. Accordingly, in this embodiment, each continuous riding behavior of the user within a predetermined time may be determined as one route sentence of the user. User M's route sentences for the day are then (a, b) and (c, d, a). In an optional implementation, the route sentences further include the preset sentence start B and sentence end E, giving (B, a, b, E) and (B, c, d, a, E).
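The continuous-ride variant can be sketched in a similar way. Here a ride is assumed to continue the previous one when its departure grid equals the previous destination grid, which matches the Fig. 5 example. A production system might also use time gaps between rides; this hypothetical `continuous_route_sentences` helper does not model that, and the `<B>`/`<E>` spellings are again assumptions.

```python
from typing import List, Sequence, Tuple

Ride = Tuple[str, str]  # (departure grid ID, destination grid ID)
BOS, EOS = "<B>", "<E>"  # hypothetical spellings of the preset start/end markers


def continuous_route_sentences(rides: Sequence[Ride], add_markers: bool = False) -> List[List[str]]:
    """Split a day's rides into route sentences, one per continuous riding behavior."""
    sentences: List[List[str]] = []
    current: List[str] = []
    for dep, dst in rides:
        if current and current[-1] == dep:
            current.append(dst)  # this ride starts where the last one ended
        else:
            if current:
                sentences.append(current)  # a non-riding gap closes the sentence
            current = [dep, dst]
    if current:
        sentences.append(current)
    if add_markers:
        sentences = [[BOS, *s, EOS] for s in sentences]
    return sentences


rides_m = [("a", "b"), ("c", "d"), ("d", "a")]  # user M's rides from Fig. 5
print(continuous_route_sentences(rides_m))
# [['a', 'b'], ['c', 'd', 'a']]
print(continuous_route_sentences(rides_m, add_markers=True))
# [['<B>', 'a', 'b', '<E>'], ['<B>', 'c', 'd', 'a', '<E>']]
```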
Step S130, inputting the route sentences of the users into the sentence vector determination model for training, to obtain the feature vectors of the plurality of grids in the predetermined area. Optionally, if route sentences are determined per day, the daily route sentences of every user over one month (or one week, etc.) may be collected. All of these route sentences are then input into the sentence vector determination model for training to obtain the feature vectors of the grids in the predetermined area.
In an optional implementation, the sentence vector determination model of this embodiment is a Word2Vec model.
The Word2Vec model is a classic algorithm model in the NLP field. It trains word vectors (word embeddings) from text by modeling the relationship between words and their contexts, and it serves as a tool for converting words into vector form. The context of a word refers to the words surrounding it in a sentence. In this embodiment, the context corresponds to the potential relationship between the departure grids and destination grids of users' riding behaviors.
Through training, Word2Vec reduces the processing of text content to vector operations in a K-dimensional vector space, in which similarity between vectors can represent semantic similarity between words. Word2Vec mainly comprises two models: the Skip-gram model and the continuous bag-of-words (CBOW) model. The CBOW model predicts a word from the words around it, while the Skip-gram model predicts the words around a given word. Specifically:
- The input to CBOW is the sum (or average) of the word vectors of the context words surrounding a word W within a window, and the output is a prediction of W itself.
- The input to Skip-gram is the word W, and the output is a prediction of the context words surrounding W.

In this embodiment, the route sentences of users are input into Word2Vec for training to obtain the feature vector of each grid. The potential relationships between grids can then be analyzed from their feature vectors, and a user's destination can be predicted from these relationships and the user's current location. Naturally, the more route sentences a grid appears in, the more information is available when analyzing its potential relationships with other grids.
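The following sketch makes the context relationship concrete. It shows how training examples are drawn from a route sentence for the two Word2Vec variants within a symmetric window:
- (center, context) pairs for Skip-gram;
- (context, center) examples for CBOW.

It only generates examples and does not train vectors. The helper names and the `<B>`/`<E>` marker spellings are hypothetical.

```python
from typing import Iterator, List, Tuple


def skipgram_pairs(sentence: List[str], window: int) -> Iterator[Tuple[str, str]]:
    """Yield (center, context) pairs: Skip-gram predicts each context word from the center."""
    for i, center in enumerate(sentence):
        lo = max(0, i - window)
        hi = min(len(sentence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, sentence[j]


def cbow_examples(sentence: List[str], window: int) -> Iterator[Tuple[List[str], str]]:
    """Yield (context words, center): CBOW predicts the center from its combined context."""
    for i, center in enumerate(sentence):
        context = sentence[max(0, i - window):i] + sentence[i + 1:i + window + 1]
        yield context, center


route = ["<B>", "c", "d", "a", "<E>"]  # user M's second route sentence with markers
print(list(skipgram_pairs(route, window=1)))
print(list(cbow_examples(route, window=1)))
```

In practice, the route sentences would be passed to an existing Word2Vec implementation. With gensim 4.x, for example, `Word2Vec(sentences, vector_size=..., window=..., min_count=1, sg=1)` trains Skip-gram vectors, and `model.wv[grid_id]` returns a grid's vector. Setting `min_count=1` keeps rarely visited grids, which gensim's default of 5 would drop. Suitable parameter values would depend on the data.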
It should be understood that other models that train word vectors from context information, such as Doc2Vec, may also be applied in the embodiments of the present invention; the embodiments do not limit this.
In an optional implementation manner, the data processing method according to the embodiment of the present invention further includes:
pushing target grids to a target user according to the feature vectors of the grids in the predetermined area.
Fig. 6 is a flowchart of a grid pushing method according to an embodiment of the present invention. As shown in Fig. 6, in this embodiment, the method of pushing target grids according to the feature vectors of the grids in the predetermined area includes the following steps:
Step S210, determining a grid candidate set according to the current input information of the target user. The current input information of the target user comprises distance range information and/or partial information of a destination address. Taking a shared transportation platform as an example:
- If a user entering a destination first types the word "square", places in the area whose names contain "square" are screened to determine the grid candidate set.
- If the user has also preselected a distance range, for example 5 km, only places containing "square" within 5 km are screened.
- Optionally, if the user has only preselected a distance range, the grid candidate set is determined from frequently visited grids (for example, shopping malls) within that range.
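Step S210 can be sketched as a filter over known grids, assuming each grid has a display name and a representative coordinate. The `Grid` record, the sample grids, and the use of great-circle (haversine) distance for the distance range are all assumptions. The fallback to frequently visited grids when only a distance range is given is not modeled.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Grid:
    grid_id: str
    name: str   # place name shown to the user (hypothetical field)
    lon: float  # representative coordinate of the grid (hypothetical field)
    lat: float


def haversine_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance between two points in kilometres."""
    r = 6371.0088  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def candidate_set(grids: Sequence[Grid], user_lon: float, user_lat: float,
                  keyword: Optional[str] = None, max_km: Optional[float] = None) -> List[Grid]:
    """Keep grids whose name contains the typed text and/or that lie within the range."""
    out = []
    for g in grids:
        if keyword is not None and keyword not in g.name:
            continue
        if max_km is not None and haversine_km(user_lon, user_lat, g.lon, g.lat) > max_km:
            continue
        out.append(g)
    return out


GRIDS = [  # toy data, not real places
    Grid("g1", "People's Square", 116.40, 39.90),
    Grid("g2", "Central Square", 116.60, 39.90),
    Grid("g3", "City Library", 116.41, 39.91),
]
print([g.grid_id for g in candidate_set(GRIDS, 116.39, 39.90, keyword="Square", max_km=5)])
```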
Step S220, acquiring the feature vector of the grid where the target user is located and the feature vector of each candidate grid in the grid candidate set. The feature vectors of the grids are obtained in advance by the embodiment of Fig. 2 and are not described again here.
Step S230, calculating the correlation between the feature vector of the grid where the target user is located and the feature vector of each candidate grid. In this embodiment, the grid feature vectors are obtained from users' real riding tracks with the Word2Vec model. The potential relationships between grids can therefore be analyzed from these vectors and the correlation between grids calculated on that basis. The higher the correlation, the more likely the candidate grid is the user's intended destination.

In an optional implementation, the correlation between grids may be characterized by the similarity between their feature vectors (for example, cosine similarity). In another optional implementation, the correlation may be determined through a predetermined correlation model. Optionally, this model is trained on users' historical behavior trajectory information and the grid feature vectors.
Step S240, determining at least one target grid according to the correlation results. Optionally, the N candidate grids with the highest correlation are determined as target grids.
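Steps S230 and S240 can be sketched with cosine similarity as the correlation measure, one of the options named above, followed by selecting the N highest-scoring candidates. The vectors in the example are toy values, not trained grid vectors, and `top_n_targets` is a hypothetical helper.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def top_n_targets(user_vec: Sequence[float],
                  candidate_vecs: Dict[str, Sequence[float]],
                  n: int) -> List[Tuple[str, float]]:
    """Score every candidate grid against the user's grid and keep the N best."""
    scored = [(gid, cosine(user_vec, vec)) for gid, vec in candidate_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:n]


candidates = {"g1": [1.0, 0.0], "g2": [0.0, 1.0], "g3": [1.0, 1.0]}  # toy vectors
print(top_n_targets([1.0, 0.0], candidates, n=2))  # g1 (1.0), then g3 (about 0.707)
```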
Step S250, pushing the at least one target grid. That is, the target grids are displayed in the interface of the user's destination input box. After entering partial information, the user can select the destination from them, which improves the user experience.
This embodiment proceeds as follows:
- A grid candidate set is determined according to the current input information of the target user.
- The feature vector of the grid where the target user is located and the feature vectors of the candidate grids are acquired.
- The correlations between these feature vectors are calculated.
- At least one target grid is determined and pushed according to the results.

Each grid's feature vector is trained with the Word2Vec model on users' route sentences. The potential relationships between grids can therefore be analyzed more accurately, which makes the correlation calculation more accurate. The resulting target grids better match users' riding habits; that is, destinations recommended from partial input are more accurate, which improves the user experience.
In another optional implementation, the data processing method according to the embodiment of the present invention further includes:
determining a task execution object corresponding to a target task according to the feature vectors of the grids in the predetermined area.
Fig. 7 is a flowchart of a method for determining a task execution object according to an embodiment of the present invention. As shown in Fig. 7, in this embodiment, determining the task execution object corresponding to the target task according to the feature vectors of the grids in the predetermined area includes the following steps:
Step S310: acquire target task information. The target task information includes the identifiers of the departure grid and the destination grid.
Step S320: acquire the feature vector of the departure grid and the feature vector of the destination grid of the target task. The feature vectors of the grids are obtained in advance through the embodiment shown in Fig. 2, and the details are not repeated here.
Step S330: determine the type of the target task according to the feature vector of the departure grid, the feature vector of the destination grid, and the additional information of the target task. The additional information includes at least one of the issuing time of the target task and evaluation information of historical tasks corresponding to the departure grid and/or the destination grid of the target task. The evaluation information may include evaluations given by task execution objects after executing the historical tasks; for example, a driver evaluates a task after executing one whose departure grid is grid a and whose destination grid is grid b. Optionally, the task types may include tasks with intoxicated passengers, long-distance tasks, and the like.
In an optional implementation, the feature vector of the departure grid, the feature vector of the destination grid, and the additional information of the target task are input into a pre-trained task type determination model to obtain the type of the target task. The task type determination model is trained from the feature vectors of the grids and the additional information of historical tasks.
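The task type determination model itself is not specified here. As a hedged sketch only, its input could be assembled by concatenating the two grid feature vectors with encoded additional information. The `TargetTask` fields, the cyclic hour-of-day encoding, and the use of a single average rating are all hypothetical choices made for this example.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class TargetTask:
    departure_vec: Sequence[float]    # feature vector of the departure grid
    destination_vec: Sequence[float]  # feature vector of the destination grid
    issue_hour: int                   # hypothetical: hour of day (0-23) the task was issued
    avg_history_rating: float         # hypothetical: mean evaluation of historical tasks on these grids


def task_features(task: TargetTask) -> List[float]:
    """Concatenate grid vectors and encoded additional information into one model input."""
    if not 0 <= task.issue_hour <= 23:
        raise ValueError("issue_hour must be in 0..23")
    # cyclic encoding so that 23:00 and 00:00 end up close together
    angle = 2 * math.pi * task.issue_hour / 24
    return [*task.departure_vec, *task.destination_vec,
            math.sin(angle), math.cos(angle), task.avg_history_rating]
```

The resulting list could then be passed to any classifier trained on the same encoding of historical tasks.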
Step S340: allocate the target task to a corresponding task execution object according to the type of the target task. Taking a ride-hailing task as an example, if the target task involves an intoxicated passenger, whose behavior is unpredictable, the task may be allocated to a driver with more experience, better physical condition, or a better temperament; if the target task is a long-distance task, it may be allocated to a driver with more stamina.
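The allocation rule of Step S340 might look like the following sketch. The `Driver` attributes, the task type labels, and the selection criteria (most experience for intoxicated-passenger tasks, lowest fatigue for long-distance tasks) are illustrative assumptions standing in for whatever driver profile data a real system would use.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Driver:
    driver_id: str
    years_experience: float  # hypothetical profile attribute
    fatigue_score: float     # hypothetical: 0.0 = fully rested, 1.0 = exhausted
    available: bool = True


def assign_task(task_type: str, drivers: List[Driver]) -> Optional[Driver]:
    """Pick a driver for a task according to its predicted type (illustrative rules only)."""
    pool = [d for d in drivers if d.available]
    if not pool:
        return None
    if task_type == "intoxicated_passenger":
        # unpredictable passenger behavior: prefer the most experienced driver
        return max(pool, key=lambda d: d.years_experience)
    if task_type == "long_distance":
        # long trip: prefer the most rested driver
        return min(pool, key=lambda d: d.fatigue_score)
    return pool[0]  # other types: first available driver (placeholder policy)
```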
In this embodiment, the target task is acquired; the feature vectors of its departure grid and destination grid are obtained; the type of the target task is determined from these feature vectors and the additional information of the target task; and the target task is allocated to a corresponding task execution object according to its type. Because the feature vectors of the grids are trained from users' route sentences with the Word2Vec model, the latent relationships between grids can be analyzed more accurately, so the predicted type of the target task is more accurate, target tasks are matched with task execution objects more reasonably, and the experience of both users and task execution objects is improved.
It should be understood that the applications of the grid feature vectors shown in Fig. 6 and Fig. 7 are exemplary; this embodiment does not limit the fields in which the feature vectors of the grids in the area are applied.
Fig. 8 is a process diagram of a data processing method according to an embodiment of the present invention. As shown in Fig. 8, in this embodiment, behavior trajectory information 81 of a plurality of users in a predetermined area is acquired first. Here, the grids are obtained by dividing longitude and latitude at equal intervals, and the behavior trajectory information of users x1, x2, x3, and x4 within one day is taken as an example. The behavior trajectory of user x1 is grid 1-grid 2-grid 8; that of user x2 is grid 1-grid 3-grid 6-grid 5; that of user x3 is grid 4-grid 2-grid 6; and that of user x4 is grid 4-grid 5-grid 9-grid 7. All movements of users x1, x2, and x3 are riding behaviors; for user x4, the movement from grid 9 to grid 7 is a non-riding behavior and the other movements are riding behaviors.
Then, the route sentence of each user is determined according to that user's behavior trajectory information. A user's route sentence is composed of the identifiers of the departure grid and the destination grid of at least one riding behavior of the user. In this embodiment, the sentence composed of the identifiers of the departure grids and destination grids in the user's behavior trajectory information within a predetermined time is determined as the user's route sentence. Thus, the route sentences of users x1, x2, x3, and x4 for that day are (B,1,2,8,E), (B,1,3,6,5,E), (B,4,2,6,E), and (B,4,5,9,7,E), respectively. Each route sentence includes a preset sentence-beginning marker B and a preset sentence-end marker E. As a result, when the route sentence is processed, the departure grid of the user's first riding behavior has preceding context and the destination grid of the user's last riding behavior has following context, so every grid in the route sentence is processed in the same way, which further improves the efficiency and accuracy of data processing.
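The construction of route sentences within a predetermined time can be sketched as below, assuming each user's movements for the period are already available as time-ordered (departure grid, destination grid) pairs. Writing a destination that equals the next departure only once is an assumption chosen so that the sketch reproduces the sentences listed above; grid identifiers are emitted as strings because word-embedding tools typically treat tokens as strings.

```python
from typing import List, Sequence, Tuple

BEGIN, END = "B", "E"  # preset sentence-beginning and sentence-end markers


def route_sentence(rides: Sequence[Tuple[int, int]]) -> List[str]:
    """Build [B, g1, g2, ..., E] from one user's time-ordered (departure, destination) pairs."""
    tokens: List[str] = []
    for departure, destination in rides:
        for grid in (departure, destination):
            token = str(grid)
            # a destination that is also the next departure is written only once
            if not tokens or tokens[-1] != token:
                tokens.append(token)
    return [BEGIN, *tokens, END]
```

For user x1, the pairs (1, 2) and (2, 8) give (B,1,2,8,E). For user x4, the example sentence also contains grid 7, reached by the non-riding movement, so that movement would have to be included in the input pairs for the sketch to produce (B,4,5,9,7,E).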
Finally, the route sentences of the users are input into the Word2Vec model for training to obtain the feature vectors of the grids in the predetermined area. Because the feature vectors are trained from users' route sentences, the latent relationships between grids can be analyzed through them: the user's destination can be predicted from these relationships and the user's current location, or the type of a current task can be analyzed from them so that the task is allocated to the corresponding task execution object. The more route sentences a grid appears in, the more information is available when analyzing its latent relationships with other grids.
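Word2Vec is a published algorithm rather than something defined in this document. To illustrate what it sees, the sketch below enumerates the (center, context) pairs that a skip-gram variant would train on for one route sentence, using a fixed window. Actual Word2Vec implementations typically also shrink the window at random per position and apply subsampling and negative sampling, which are omitted here. With the sentence (B,1,2,8,E) and a window of 1, grid 1 is paired with B and grid 8 with E, which is exactly the preceding and following context that the sentence markers provide.

```python
from typing import List, Tuple


def skipgram_pairs(sentence: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """(center, context) pairs seen by a skip-gram model with a fixed window."""
    if window < 1:
        raise ValueError("window must be at least 1")
    pairs: List[Tuple[str, str]] = []
    for i, center in enumerate(sentence):
        lo = max(0, i - window)
        hi = min(len(sentence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, sentence[j]))
    return pairs
```

In practice an existing library would do the training. With gensim 4.x, for example, `Word2Vec(sentences=route_sentences, vector_size=32, window=2, min_count=1, sg=1)` trains on a list of token lists, and `model.wv["8"]` then returns the vector for grid 8; the parameter values here are placeholders, not values from this document.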
According to the embodiment of the present invention, behavior trajectory information of a plurality of users in a predetermined area is acquired, the route sentence of each user is determined according to that user's behavior trajectory information, and the route sentences are input into a sentence vector determination model for training to obtain the feature vectors of a plurality of grids in the predetermined area, where the predetermined area includes a plurality of pre-divided grids and each user's route sentence is composed of the identifiers of the departure grid and destination grid of at least one riding behavior of the user. The resulting feature vectors capture the latent relationships between grids.
Fig. 9 is a schematic diagram of a data processing apparatus according to an embodiment of the present invention. As shown in fig. 9, the data processing apparatus 9 of the embodiment of the present invention includes a trajectory information acquisition unit 91, a route sentence determination unit 92, and a feature vector acquisition unit 93.
The trajectory information acquisition unit 91 is configured to acquire behavior trajectory information of a plurality of users in a predetermined area, the behavior trajectory information of each user including the departure grid and destination grid of at least one riding behavior, and the predetermined area including a plurality of pre-divided grids. In an alternative implementation, the grids of the predetermined area are obtained by dividing longitude and latitude at equal intervals. In another alternative implementation, the grids of the predetermined area are obtained by division according to place categories or geographic information. The place categories may include residential communities, schools, office buildings, subway stations, and the like. The geographic information may include streets and/or road segments.
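For the equal-interval longitude/latitude division, a point can be mapped to a grid identifier as sketched below. The south-west origin, the cell size in degrees, the grid dimensions, and the row-major numbering are hypothetical parameters. Note that equal-degree cells are not equal in area, because the ground distance spanned by one degree of longitude shrinks away from the equator.

```python
import math
from typing import Tuple


def grid_id(lat: float, lon: float, origin: Tuple[float, float],
            cell_deg: float, n_rows: int, n_cols: int) -> int:
    """Row-major index of the equal-interval lat/lon cell that contains (lat, lon).

    origin is the (lat, lon) of the south-west corner of the predetermined area.
    """
    if cell_deg <= 0:
        raise ValueError("cell_deg must be positive")
    row = math.floor((lat - origin[0]) / cell_deg)
    col = math.floor((lon - origin[1]) / cell_deg)
    if not (0 <= row < n_rows and 0 <= col < n_cols):
        raise ValueError("point lies outside the predetermined area")
    return row * n_cols + col
```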
The route sentence determination unit 92 is configured to determine the route sentence of each user according to that user's behavior trajectory information, the route sentence being composed of the identifiers of the departure grid and the destination grid of at least one riding behavior of the user. In an alternative implementation, the route sentence determination unit 92 is further configured to determine, as the user's route sentence, the sentence composed of the identifiers of the departure grids and destination grids in the user's behavior trajectory information within a predetermined time. In another alternative implementation, the route sentence determination unit 92 is further configured to determine the user's route sentence according to the user's continuous riding behavior, in which case the route sentence is composed of the identifiers of the departure grids and destination grids of the continuous riding behavior.
The feature vector acquisition unit 93 is configured to input the route sentences of the users into a sentence vector determination model for training to acquire the feature vectors of a plurality of grids in the predetermined area. In an alternative implementation, the sentence vector determination model of this embodiment is a Word2Vec model.
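The continuous-riding variant can be sketched as below. This document does not define "continuous" precisely; the sketch assumes a chain of rides continues while each ride departs from the grid where the previous ride ended, and that any non-riding movement or break in that chain starts a new sentence. The event tuple layout is hypothetical.

```python
from typing import List, Sequence, Tuple

# hypothetical event layout: (departure grid, destination grid, is_riding)
Event = Tuple[int, int, bool]


def continuous_route_sentences(events: Sequence[Event]) -> List[List[str]]:
    """Split one user's time-ordered movements into route sentences of continuous rides."""
    sentences: List[List[str]] = []
    current: List[str] = []
    for departure, destination, is_riding in events:
        continues_chain = bool(current) and current[-1] == str(departure)
        if not (is_riding and continues_chain):
            # a non-riding movement or a gap closes the current chain
            if current:
                sentences.append(["B", *current, "E"])
            current = []
        if is_riding:
            if not current:
                current.append(str(departure))
            current.append(str(destination))
    if current:
        sentences.append(["B", *current, "E"])
    return sentences
```

Applied to user x4 from the Fig. 8 example, the events (4, 5, riding), (5, 9, riding), (9, 7, non-riding) yield the single sentence (B,4,5,9,E).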
Fig. 10 is a schematic diagram of an electronic device according to an embodiment of the present invention. As shown in Fig. 10, the electronic device is a general-purpose data processing apparatus with a general-purpose computer hardware structure that includes at least a processor 101 and a memory 102, connected by a bus 103. The memory 102 is adapted to store instructions or programs executable by the processor 101. The processor 101 may be a stand-alone microprocessor or a collection of one or more microprocessors. The processor 101 executes the instructions stored in the memory 102 to perform the method flows of the embodiments of the present invention described above, thereby processing data and controlling other devices. The bus 103 connects these components together and also connects them to a display controller 104, a display device, and input/output (I/O) devices 105. The I/O devices 105 may be a mouse, keyboard, modem, network interface, touch input device, motion-sensing input device, printer, or other devices known in the art. Typically, the I/O devices 105 are coupled to the system through I/O controllers 106.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, apparatus (device) or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may employ a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations of methods, apparatus (devices) and computer program products according to embodiments of the application. It will be understood that each flow in the flow diagrams can be implemented by computer program instructions.
These computer program instructions may be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows.
These computer program instructions may also be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows.
Another embodiment of the invention is directed to a non-transitory storage medium storing a computer-readable program for causing a computer to perform some or all of the above-described method embodiments.
That is, those skilled in the art will understand that all or part of the steps of the methods in the above embodiments may be implemented by a program instructing related hardware. The program is stored in a storage medium and includes several instructions that cause a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (14)
1. A method of data processing, the method comprising:
acquiring behavior trajectory information of a plurality of users in a predetermined area, wherein the behavior trajectory information of each user comprises a departure grid and a destination grid of at least one riding behavior, and the predetermined area comprises a plurality of pre-divided grids;
determining a route sentence of each user according to the behavior trajectory information of that user, wherein the route sentence of the user is composed of an identifier of a departure grid and an identifier of a destination grid of at least one riding behavior of the user; and
inputting the route sentences of the users into a sentence vector determination model for training to obtain feature vectors of a plurality of grids in the predetermined area.
2. The method of claim 1, wherein determining the route sentence of each user according to the behavior trajectory information of each user comprises:
determining, as the route sentence of the user, a sentence formed by the identifiers of each departure grid and each destination grid in the behavior trajectory information of the user within a predetermined time.
3. The method of claim 1, wherein determining the route sentence of each user according to the behavior trajectory information of each user comprises:
determining the route sentence of the user according to continuous riding behavior of the user;
wherein the route sentence of the user consists of the identifiers of the departure grids and destination grids of the continuous riding behavior.
4. The method of any one of claims 1-3, wherein the route sentence further comprises a preset sentence-beginning marker and a preset sentence-end marker.
5. The method of claim 1, wherein the sentence vector determination model is a Word2Vec model.
6. The method of claim 1, wherein the plurality of grids of the predetermined area are obtained by dividing longitude and latitude at equal intervals.
7. The method of claim 1, wherein the plurality of grids of the predetermined area are obtained by division according to place categories or geographic information, the geographic information comprising streets and/or road segments.
8. The method of claim 1, further comprising:
pushing a target grid to a target user according to the feature vectors of the grids in the predetermined area.
9. The method of claim 8, wherein pushing the target grid to the target user according to the feature vectors of the grids in the predetermined area comprises:
determining a grid candidate set according to current input information of the target user, wherein the current input information of the target user comprises distance range information and/or partial information of a destination address;
acquiring a feature vector of a grid where the target user is located and a feature vector of each candidate grid in the grid candidate set;
calculating the correlation degree between the feature vector of the grid where the target user is located and the feature vector of each candidate grid;
determining at least one target grid according to the correlation calculation result;
pushing the at least one target grid.
10. The method of claim 1, further comprising:
determining a task execution object corresponding to a target task according to the feature vectors of the grids in the predetermined area.
11. The method of claim 10, wherein determining the task execution object corresponding to the target task according to the feature vectors of the grids in the predetermined area comprises:
acquiring target task information, wherein the target task information comprises identifiers of a departure grid and a destination grid;
acquiring a feature vector of the departure grid and a feature vector of the destination grid of the target task;
determining a type of the target task according to the feature vector of the departure grid of the target task, the feature vector of the destination grid, and target task additional information, wherein the target task additional information comprises at least one of an issuing time of the target task and evaluation information of historical tasks corresponding to the departure grid and/or the destination grid of the target task; and
allocating the target task to a corresponding task execution object according to the type of the target task.
12. A data processing apparatus, characterized in that the apparatus comprises:
a trajectory information acquisition unit configured to acquire behavior trajectory information of a plurality of users in a predetermined area, wherein the behavior trajectory information of each user comprises a departure grid and a destination grid of at least one riding behavior, and the predetermined area comprises a plurality of pre-divided grids;
a route sentence determination unit configured to determine a route sentence of each user according to the behavior trajectory information of that user, the route sentence of the user being composed of identifiers of a departure grid and a destination grid of at least one riding behavior of the user; and
a feature vector acquisition unit configured to input the route sentences of the users into a sentence vector determination model for training to acquire feature vectors of a plurality of grids in the predetermined area.
13. An electronic device comprising a memory and a processor, wherein the memory is configured to store one or more computer program instructions, and the one or more computer program instructions are executed by the processor to implement the method of any one of claims 1-11.
14. A computer-readable storage medium storing computer program instructions which, when executed by a processor, implement the method according to any one of claims 1-11.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010162248.3A CN111831765B (en) | 2020-03-10 | 2020-03-10 | Data processing method, device, electronic equipment and readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111831765A | 2020-10-27 |
CN111831765B | 2024-05-31 |
Family ID: 72913486
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010162248.3A Active CN111831765B (en) | 2020-03-10 | 2020-03-10 | Data processing method, device, electronic equipment and readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111831765B (en) |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |