CN113706220A

CN113706220A - User portrait determination, user demand prediction method, and data processing system

Info

Publication number: CN113706220A
Application number: CN202111082435.1A
Authority: CN
Inventors: 丁磊; 郑巧巧
Original assignee: Human Horizons Shanghai Autopilot Technology Co Ltd
Current assignee: Human Horizons Shanghai Autopilot Technology Co Ltd
Priority date: 2021-09-15
Filing date: 2021-09-15
Publication date: 2021-11-26

Abstract

The disclosure provides a user portrait determination method, a user demand prediction method, a method for determining a setting position of a charging pile, a storage medium, and a computer program product. The user portrait determination method includes: obtaining first user data related to a target vehicle; and inputting the first user data into a trained first random forest model to determine a user portrait of a user corresponding to the target vehicle. The trained first random forest model is obtained by training in which the longicorn beetle antennae search (BAS) algorithm optimizes the random forest parameters of an initial random forest model using second user data related to the target vehicle, and the trained first random forest model has globally optimal random forest parameters. The technical solution of this disclosure can reduce the complexity and labor cost of determining user demand, of setting charging piles for the relevant electric vehicles at suitable positions, or of constructing a user portrait, and can improve the accuracy thereof.

Description

User portrait determination, user demand prediction method, and data processing system

Technical Field

The present disclosure relates to the field of computer technology, and in particular to a user portrait determination method, a user demand prediction method, a method for determining a setting position of a charging pile, a data processing system, and a computer program product.

Background

For vehicle production and sales enterprises, a user portrait of the purchasers of a target vehicle is often constructed from historical user data in order to increase vehicle sales, so that the target vehicle can be promoted in a more targeted manner.

In the prior art, a user portrait is usually constructed as follows: the relevant user data is obtained through a paper or electronic questionnaire, and the user portrait is constructed from the obtained data.

Disclosure of Invention

The present disclosure provides a user portrait determination method, a user demand prediction method, a method for determining a setting position of a charging pile, a data processing system, and a computer program product, so as to reduce the labor cost of determining user demand, of setting a charging pile for the relevant electric vehicle at a suitable position, or of constructing a user portrait, and to improve the accuracy thereof.

According to a first aspect of the present disclosure, there is provided a user portrait determination method, which may comprise the steps of:

obtaining first user data related to a target vehicle;

inputting the first user data into a trained first random forest model, and determining a user portrait of a user corresponding to the target vehicle; the trained first random forest model is obtained by training in which the longicorn beetle antennae search (BAS) algorithm optimizes the random forest parameters of an initial random forest model using second user data related to the target vehicle; the trained first random forest model has globally optimal random forest parameters.

According to a second aspect of the present disclosure, there is provided a user demand prediction method, which may include:

obtaining third user data related to the target vehicle;

inputting the third user data into a trained second random forest model, and determining the demand information of the target user for the target vehicle; the trained second random forest model is obtained by training in which the longicorn beetle antennae search (BAS) algorithm optimizes the random forest parameters of an initial random forest model using fourth user data related to the target vehicle; the trained second random forest model has globally optimal random forest parameters.

According to a third aspect of the present disclosure, there is provided a method of determining a setting position of a charging pile, the method may include:

acquiring fifth user data related to the target vehicle;

inputting the fifth user data into a trained third random forest model, and determining the setting position information of the charging pile; the trained third random forest model is obtained by training in which the longicorn beetle antennae search (BAS) algorithm optimizes the random forest parameters of an initial random forest model using sixth user data related to the target vehicle; the trained third random forest model has globally optimal random forest parameters.

According to another aspect of the present disclosure, there is provided a data processing system comprising a target vehicle, a roadside sensing device, and a server;

the target vehicle is used for acquiring, through vehicle-mounted sensors, the vehicle-end data reported by the target vehicle and the application operation data generated by operations that the user corresponding to the target vehicle performs on the target vehicle, and for uploading the vehicle-end data and the application operation data to the server;

the roadside sensing device is used for collecting roadside perception data related to the target vehicle and uploading the roadside perception data to the server;

the server is used for acquiring the vehicle-end data and the application operation data uploaded by the target vehicle; acquiring the roadside perception data related to the target vehicle uploaded by the roadside sensing device; acquiring user attribute data of the user corresponding to the target vehicle, vehicle configuration data related to the target vehicle, and geographic information system data related to the target vehicle; and implementing the method in any embodiment of the present disclosure based on the acquired data.

According to another aspect of the present disclosure, there is provided a computer program product comprising computer programs/instructions, characterized in that the computer programs/instructions, when executed by a processor alone or in combination with a plurality of processors, implement the method in any of the embodiments of the present disclosure.

According to the technology of the present disclosure, the user portrait of the user corresponding to the target vehicle can be determined by inputting the first user data related to the target vehicle into the trained first random forest model. Constructing the user portrait is therefore simpler and does not require a large amount of labor. Moreover, compared with obtaining the relevant user data through a paper or electronic questionnaire, determining the user portrait with a trained first random forest model that has globally optimal random forest parameters can improve the accuracy of the constructed user portrait.

According to the technology of the present disclosure, the demand information of the target user for the target vehicle can be determined by inputting the third user data related to the target vehicle into the trained second random forest model. Determining the demand information is therefore simpler and does not require a large amount of labor. Moreover, compared with obtaining the relevant user data through a paper or electronic questionnaire, determining the user demand with a trained second random forest model that has globally optimal random forest parameters can improve the accuracy of the determined user demand.

According to the technology of the present disclosure, the setting position information of the charging pile can be determined by inputting the fifth user data related to the target vehicle into the trained third random forest model. Determining the setting position of the charging pile is therefore simpler and does not require a large amount of labor. Moreover, compared with obtaining the relevant user data through a paper or electronic questionnaire, determining the setting position with a trained third random forest model that has globally optimal random forest parameters can improve the accuracy of the determined setting position of the charging pile.

In addition, among classification models the random forest model has advantages such as fast training and fast prediction, so it is adopted for user portrait determination, demand information determination, and charging pile setting position determination. To improve the classification accuracy and efficiency of the random forest models, each model is obtained by training in which the longicorn beetle antennae search (BAS) algorithm optimizes the random forest parameters of the initial random forest model.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:

FIG. 1 is a flow chart of a user representation determination method according to a first embodiment of the present disclosure;

FIG. 2 is a flow chart of a model training method provided in a first embodiment of the present disclosure;

FIG. 3 is a flow chart of a model training process provided in a first embodiment of the present disclosure;

FIG. 4 is a flow chart of a method for predicting user demand according to a second embodiment of the present disclosure;

FIG. 5 is a flow chart of another model training method provided in a second embodiment of the present disclosure;

FIG. 6 is a flowchart of a method for determining a setting position of a charging pile according to a second embodiment of the present disclosure;

FIG. 7 is a flow chart of another model training method provided in a third embodiment of the present disclosure;

FIG. 8 is a schematic diagram of a data processing system provided in a fourth embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

First embodiment

A first embodiment of the present disclosure provides a user portrait determination method. Referring to FIG. 1, which is a flowchart of the user portrait determination method provided in the first embodiment, the method may comprise the steps of:

Step S101: obtaining first user data related to a target vehicle.

Step S102: inputting the first user data into a trained first random forest model, and determining a user portrait of a user corresponding to the target vehicle; the trained first random forest model is obtained by training in which the longicorn beetle antennae search (BAS) algorithm optimizes the random forest parameters of the initial random forest model using second user data related to the target vehicle; the trained first random forest model has globally optimal random forest parameters.

According to the user portrait determination method provided in the first embodiment of the disclosure, the user portrait of the user corresponding to the target vehicle can be determined by inputting the first user data related to the target vehicle into the trained first random forest model. Constructing the user portrait is therefore simpler and does not require a large amount of labor. Moreover, compared with obtaining the relevant user data through a paper or electronic questionnaire, determining the user portrait with a trained first random forest model that has globally optimal random forest parameters can improve the accuracy of the constructed user portrait.

In addition, among classification models the random forest model has advantages such as fast training and fast prediction, so it is adopted when the user portrait is determined. However, the classification accuracy and efficiency of a random forest model depend on its random forest parameters being appropriate. Training the model with the longicorn beetle antennae search (BAS) algorithm optimizing the random forest parameters of the initial random forest model makes it possible both to select the features of the initial random forest model and to optimize its random forest parameters. This addresses the problems of feature redundancy and heavy computation in the model and improves the classification accuracy and efficiency of the random forest model.

It should be noted that authorization from the relevant user must be obtained before any user data is acquired; the user data of a user can be acquired only after that user has granted authorization.

In the first embodiment of the disclosure, the first user data and the second user data may each include at least one of the following, each relating to the user corresponding to the target vehicle unless stated otherwise: driving type data, offline preference data, habitual route data, high-frequency location data, social radius data, configuration preference data, usage preference data, user group data, user attribute data, and vehicle configuration data related to the target vehicle.

The first user data and the second user data are both user data related to the target vehicle. The user is the user corresponding to the target vehicle, generally the vehicle owner. The data types of the first user data and the second user data generally need to be consistent. The target vehicle may include different vehicles of a pre-selected vehicle type, or different vehicles of different vehicle types.

The driving type data may indicate whether the user drives in an aggressive or a mild manner. An aggressive driver frequently changes lanes or overtakes, and the average traveling speed of the vehicle is high. A mild driver rarely changes lanes or overtakes, and the average traveling speed of the vehicle is low.

The offline preference data of the user generally refers to a driving behavior preference when driving a vehicle.

The configuration preference data of the user generally includes, but is not limited to, the angle and height configuration preference of the user to the vehicle seat, the configuration preference of the user to the vehicle door opening mode, the man-machine interaction mode preference configured by the user, and the configuration preference of the user to the automatic start and stop function of the vehicle engine. The man-machine interaction mode is as follows: a voice instruction interaction mode, a touch instruction interaction mode and the like.

The usage preference data of the user generally includes, but is not limited to, a user's usage preference for air conditioning of the vehicle, a user's usage preference for volume of a playback device on board the vehicle, and a user's usage preference for lights in the vehicle.

The user group data of the user may be a group divided by income level of the user, such as: high-income groups, medium-income groups, and low-income groups, etc.

The vehicle configuration includes, but is not limited to, attribute configuration such as vehicle type, color, price, etc., power configuration, displacement configuration, safety configuration, etc.

The user attribute data includes, but is not limited to, gender, age group, occupation classification, etc. of the user.

The user portrait generally refers to labels that identify the main audience and target user group of a target vehicle. It is a virtual representation of real users, constructed by linking the attributes and behaviors of users with the expected data conversion. Specifically, the user portrait may be a high-income middle-aged woman who likes to drive a white car, a white-collar worker who likes to listen to music while driving, an office worker who likes fitness, an income group that likes to raise the seat, and the like. The user portrait may also be a distribution, for example that 80% of the users of a certain vehicle model are male and 20% are female.

The user portrait may show only labels such as the gender, age, and occupation of the user, may show only labels such as the income level, usage preference, and configuration preference of the user, or may simultaneously show labels such as the age, income level, and usage preference of the user.

In order to reduce the complexity and the amount of calculation of data processing, the dimensions of the user portrait and the labels corresponding to the dimensions that need to be determined by the trained random forest model may be specified in advance. The dimension of the user representation generally refers to the label of the main audience of the target vehicle and the target user group which can be shown by the user representation. If the dimension is one, the dimension may be one of the labels representing the gender, age group or occupation of the user, or one of the income level and the use preference of the user. If the dimension is two, the two tags may represent the gender, the age, and the like of the user at the same time, represent the income level, the use preference, and the like of the user at the same time, or represent the gender, the use preference, and the like of the user at the same time.

FIG. 2 is a flowchart of a model training method provided in the first embodiment of the present disclosure and shows the training steps of the trained first random forest model.

Step S201: determining and collecting the initial user data required for the trained first random forest model; the initial user data is user data related to the target vehicle.

Step S202: carrying out data preprocessing on the initial user data to obtain second user data;

Step S203: using the bootstrap method, repeatedly and randomly drawing samples with replacement from the second user data to determine the training set of each decision tree, and taking the remaining samples in the second user data as the test set of that decision tree, wherein each decision tree is a different decision tree in the initial random forest model;

Step S204: based on the training sets and the test sets, optimizing the random forest parameters of the initial random forest model with the longicorn beetle antennae search (BAS) algorithm, so as to obtain a trained first random forest model with optimal random forest parameters.
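The sampling in step S203 can be sketched as follows. This is a minimal illustration, not the implementation of the disclosure: the function name `bootstrap_split`, the placeholder records, and the choice of drawing as many samples as there are records are all assumptions made for the example.

```python
import random

def bootstrap_split(samples, rng):
    """Draw len(samples) items with replacement for training;
    the items never drawn form the test (out-of-bag) set."""
    n = len(samples)
    drawn = [rng.randrange(n) for _ in range(n)]      # sampling with replacement
    drawn_set = set(drawn)
    train = [samples[i] for i in drawn]
    test = [samples[i] for i in range(n) if i not in drawn_set]
    return train, test

rng = random.Random(0)
# Placeholder records standing in for the preprocessed second user data (assumption).
second_user_data = [f"sample_{i}" for i in range(10)]
# One (training set, test set) pair per decision tree; three trees here for brevity.
splits = [bootstrap_split(second_user_data, rng) for _ in range(3)]
```

Because each tree receives its own draw, the trees see different training sets, which is what makes them "different decision trees" in the sense of step S203.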

The collected initial user data is often incomplete and inconsistent dirty data and generally cannot be used directly for model training; even if it is used, training is slow and the accuracy of the trained model is low. Therefore, to ensure the completeness and accuracy of the data used for the model, and to speed up training and improve the accuracy of the trained model, the collected initial user data needs to be preprocessed.

It should be noted that, in the first embodiment of the present disclosure, the first user data and the second user data are obtained in the same way: the initial user data is first determined, and data preprocessing is then performed on it. The process is described in detail below, taking the second user data as an example.

In the first embodiment of the present disclosure, the initial user data includes at least one of: roadside perception data related to the target vehicle; vehicle-end data reported by the target vehicle; application operation data generated by operations that the user corresponding to the target vehicle performs on the target vehicle; user attribute data of the user corresponding to the target vehicle; vehicle configuration data related to the target vehicle; and geographic information system (GIS) data related to the target vehicle.

Roadside perception data is data on road traffic participants and road conditions acquired in real time by various sensors, such as visual sensors, millimeter-wave radar, and lidar, in combination with edge computing equipment.

The vehicle-end data reported by the target vehicle includes, but is not limited to, driving data such as vehicle position, alarms, rapid acceleration, and braking, as well as fuel consumption data, vehicle fault data, and the like.

The geographic information system data is data related to a driving route and a track of a user corresponding to a target vehicle, which are extracted based on the GIS.

User attribute data includes, but is not limited to, gender, age group, and occupation classification.

Vehicle configuration data includes, but is not limited to, attribute configuration such as vehicle type, color, price, etc., power configuration, displacement configuration, safety configuration, etc.

In the first embodiment of the present disclosure, the second user data is obtained by preprocessing the initial user data. The preprocessing includes, but is not limited to, at least data cleaning, data classification, feature extraction, and data normalization.

Data cleaning includes, but is not limited to, deduplicating the data and removing erroneous and incomplete data.
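As a hedged illustration of the data cleaning and data normalization steps, the sketch below deduplicates records, drops incomplete ones, and applies min-max scaling. The field names `vehicle_id` and `avg_speed`, the helper names, and the choice of min-max scaling are assumptions for the example; the disclosure does not specify a normalization formula.

```python
def clean(records, required):
    """Drop records with missing required fields, then drop exact duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        if any(rec.get(k) is None for k in required):
            continue                      # incomplete record
        key = tuple(rec[k] for k in required)
        if key in seen:
            continue                      # duplicate record
        seen.add(key)
        cleaned.append(rec)
    return cleaned

def min_max_normalize(values):
    """Scale values linearly into [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical initial user data (field names are illustrative only).
raw = [
    {"vehicle_id": "A", "avg_speed": 60.0},
    {"vehicle_id": "A", "avg_speed": 60.0},   # duplicate
    {"vehicle_id": "B", "avg_speed": None},   # incomplete
    {"vehicle_id": "C", "avg_speed": 100.0},
    {"vehicle_id": "D", "avg_speed": 80.0},
]
cleaned = clean(raw, required=("vehicle_id", "avg_speed"))
speeds = min_max_normalize([r["avg_speed"] for r in cleaned])
```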

Data classification divides the initial user data into preset data types according to the usage, attributes, and so on of the data. For example, the initial user data is divided into driving data, trajectory data, habit data, user attribute data, vehicle configuration data, and the like. The driving data can be used to determine the driving type data, the offline preference data, and the habitual route data of the user corresponding to the target vehicle; the trajectory data can be used to determine the high-frequency location data and the social radius data of the user corresponding to the target vehicle; the habit data can be used to determine the configuration preference data and the usage preference data of the user corresponding to the target vehicle; the user attribute data can be used to determine the user attribute data; and the vehicle configuration data is used to determine the vehicle configuration data related to the target vehicle.

The classified initial user data then needs feature extraction to obtain data with identifying power. That is, the classified initial user data is converted into data that can identify, for example, the driving type, the offline preference, or the habitual route of the user corresponding to the target vehicle.

Specifically, feature extraction on the driving data yields driving type data identifying the driving type of the user corresponding to the target vehicle, offline preference data identifying the offline preference of that user, habitual route data of that user, and the like. Feature extraction on the trajectory data yields high-frequency location data identifying the high-frequency locations of the user and social radius data identifying the social radius of the user, and the like. Feature extraction on the habit data yields configuration preference data identifying the configuration preference of the user and usage preference data identifying the usage preference of the user, and the like. Feature extraction on the user attribute data yields user attribute data identifying the user attributes of the user, and the like.

Taking as an example the feature extraction on the driving data and the trajectory data that yields the habitual route data, the high-frequency location data, the social radius data, and the like of the user corresponding to the target vehicle, the feature extraction may include, but is not limited to: buffer analysis combined with overlay analysis, hotspot analysis, geographically weighted regression, and geographic cluster analysis.

In the first embodiment of the present disclosure, the implementation manner of determining and collecting the required initial user data for the trained first random forest model is generally as follows:

First, the dimensions of the user portrait to be determined by the trained first random forest model are determined, together with the labels corresponding to those dimensions.

Then, based on the labels corresponding to the dimensions, the initial user data to be collected is looked up in a preset initial user data acquisition table, which records the initial user data to be collected for each label. For example, if the dimension is one and the label is the gender of the user, the initial user data to be collected may be only the user attribute data of the user corresponding to the target vehicle. As another example, if the dimension is one and the label is the usage habit, the initial user data to be collected may be only the application operation data generated by operations that the user corresponding to the target vehicle performs on the target vehicle. As a further example, if the dimension is two and the labels are the social radius and the usage habit, the initial user data to be collected is at least the application operation data generated by operations that the user corresponding to the target vehicle performs on the target vehicle, the roadside perception data related to the target vehicle, and the geographic information system data related to the target vehicle.

In the first embodiment of the present disclosure, the detailed implementation of steps S203 and S204 is shown in FIG. 3, which is a flowchart of a model training process provided in the first embodiment of the present disclosure.
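The lookup in the initial user data acquisition table can be pictured as a mapping from labels to data sources. The sketch below is only illustrative: the table contents follow the three examples in the text, and the assignment of roadside perception data and GIS data to the social radius label alone is an assumption inferred from the two-label example. All identifiers are hypothetical.

```python
# Hypothetical acquisition table: label -> initial user data to collect.
ACQUISITION_TABLE = {
    "gender": {"user_attribute_data"},
    "usage_habit": {"application_operation_data"},
    # Assumption: inferred from the two-label example in the text.
    "social_radius": {"roadside_perception_data", "gis_data"},
}

def data_to_collect(labels):
    """Return the union of data sources required by the given portrait labels."""
    needed = set()
    for label in labels:
        needed |= ACQUISITION_TABLE[label]
    return needed

one_dimension = data_to_collect(["gender"])
two_dimensions = data_to_collect(["social_radius", "usage_habit"])
```

Specifying the labels first keeps collection limited to the data sources that the portrait actually needs, which is the stated purpose of fixing the dimensions in advance.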

Step S301: determining the training sets and test sets, and constructing the initial random forest model. The different decision trees in the initial random forest model are constructed from the training sets, and together they form the initial random forest model. Specifically, the training sets and test sets are determined by using the bootstrap method (bootstrap sampling) to repeatedly and randomly draw M subsets with replacement from the second user data, and taking the remaining samples in the second user data as the test set of each decision tree. The different decision trees in the initial random forest model are constructed as follows: each time a training set and a test set are determined, they are used to construct a single decision tree. The initial random forest model is formed by repeating the step of constructing a single decision tree x times.

It should be noted that, if each subset has $N$ attributes, then whenever a node of the decision tree needs to be split, $n$ attributes are randomly selected from the $N$ attributes, where $n \ll N$. One of the $n$ attributes is then selected as the split attribute of the node using an information gain or Gini index strategy.

The N attributes are attributes corresponding to tags pre-arranged for the user data, and the tags pre-arranged for the user data include at least tags corresponding to dimensions of the user representation. For example, if the dimension of the user representation is two, and the tags are social radius and usage habit, the tags pre-configured for the user data at least need to include the social radius and usage habit.
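The node-splitting rule described above (pick n of the N attributes at random, then choose one by the Gini index) can be sketched as follows. This is a minimal illustration under assumptions: the attribute names, the toy rows, and the use of the weighted Gini impurity of a multiway split are choices made for the example, and the disclosure equally allows an information gain strategy.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

def weighted_gini(rows, labels, attr):
    """Weighted Gini impurity after splitting the rows on one attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    total = len(labels)
    return sum(len(g) / total * gini(g) for g in groups.values())

def choose_split_attribute(rows, labels, n, rng):
    """Randomly pick n of the N attributes, then return the one with lowest impurity."""
    candidates = rng.sample(sorted(rows[0]), n)
    return min(candidates, key=lambda a: weighted_gini(rows, labels, a))

# Toy samples with N = 3 attributes (names and values are illustrative only).
rows = [
    {"gender": "F", "age_group": "middle", "income": "high"},
    {"gender": "M", "age_group": "middle", "income": "low"},
    {"gender": "F", "age_group": "young", "income": "high"},
    {"gender": "M", "age_group": "young", "income": "low"},
]
labels = ["portrait_A", "portrait_A", "portrait_B", "portrait_B"]
split_attr = choose_split_attribute(rows, labels, n=2, rng=random.Random(0))
```

With only three attributes the condition n ≪ N is not really met; the toy size is kept small so the impurity values can be checked by hand.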

Step S302: initializing the random forest parameters that need to be optimized. The antenna length $s$ of the longicorn beetle, the motion step length $u$ of the beetle, the number of iterations $t_{max}$, and the three-dimensional position coordinate vector $P_0 = \{P_L, P_R\}$ of the beetle's two antennae are initialized, where $P_L$ represents the position coordinates of the left antenna and $P_R$ represents the position coordinates of the right antenna. The random forest parameters are initialized with the coordinate values of the two antennae in $P_0$ in the x, y, and z directions: the coordinate value in the x direction represents the number of decision trees, the coordinate value in the y direction represents the maximum number of features of a single decision tree, and the coordinate value in the z direction represents the minimum number of leaf nodes.

Step S303: constructing the random forest model of the $t$-th iteration. Specifically, the current iteration number is defined as $t$, and $t$ is initialized to 1; the three-dimensional coordinate vector $P_0$ of the beetle antennae is taken as the three-dimensional coordinate vector $P_t$ of the $t$-th iteration; the initial random forest model is taken as the random forest model of the $t$-th iteration; and the random forest model of the $t$-th iteration is constructed with the three-dimensional coordinate vector $P_t$.

Step S304: performing out-of-bag (oob) estimation with the test set to obtain the error score. The error score is the misclassification score of the user portrait obtained from the random forest model of the $t$-th iteration, and is an unbiased estimate of the generalization error of the random forest. The specific steps are as follows: first, the oob samples, that is, the samples in the test set that were not drawn during random sampling with replacement, are obtained; then, the classification of each oob sample is determined by voting; finally, the out-of-bag error, that is, the ratio of the number of misclassified samples to the total number of samples, is calculated.
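The three steps of the oob estimate can be sketched as below. The sketch assumes that, for every sample, the predictions of the trees that did not train on it have already been collected; the function name `oob_error`, the input layout, and the decision to skip samples with no oob votes are assumptions made for illustration.

```python
from collections import Counter

def oob_error(oob_votes, true_labels):
    """oob_votes[i] holds the predictions for sample i from the trees whose
    training set did not contain sample i. Returns misclassified / counted."""
    wrong = 0
    counted = 0
    for votes, truth in zip(oob_votes, true_labels):
        if not votes:
            continue                      # sample was never out-of-bag
        predicted = Counter(votes).most_common(1)[0][0]   # majority vote
        counted += 1
        wrong += predicted != truth
    return wrong / counted if counted else 0.0

# Toy votes for four samples (labels are illustrative only).
votes = [["A", "A", "B"], ["B"], ["A", "B", "B"], []]
truth = ["A", "A", "B", "A"]
error_score = oob_error(votes, truth)
```

In this toy input the first and third samples are classified correctly and the second is not, so one of the three counted samples is misclassified.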

Step S305: judging whether the iteration condition is met, that is, determining whether t has reached t_max. If not, the condition is not met and steps S306-S307 are executed; otherwise, the condition is met and steps S308-S309 are executed.

Step S306: if the iteration condition is not met, moving the longicorn to the next position. That is, the longicorn moves, by its motion step length u, toward the tentacle side corresponding to the local optimal value of the t-th iteration, so as to obtain the three-dimensional coordinate vector P_{t+1} of the (t+1)-th iteration.

Step S307: obtaining the local optimal random forest parameters. Specifically, first, the error fraction is used as the fitness value of the t-th iteration in the longicorn whisker algorithm; second, the smaller of the fitness values of the t-th iteration corresponding to the left and right longicorn tentacles is selected as the local optimal value of the t-th iteration, and the tentacle coordinates of the longicorn corresponding to the local optimal value are acquired; third, the tentacle coordinates corresponding to the local optimal value are taken as the local optimal random forest parameters, t + 1 is assigned to t, and execution returns to step S303.

Step S308: obtaining the global optimal random forest parameters. Specifically, the minimum of the local optimal values of the t_max iterations is selected as the global optimal value, and the three-dimensional coordinate vector corresponding to the global optimal value is taken as the global optimal random forest parameters.

Step S309: constructing the trained first random forest model. That is, the trained first random forest model is constructed according to the global optimal random forest parameters.
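As a purely illustrative sketch of the loop formed by steps S302 to S309 (not the disclosed implementation), the search can be written in Python as follows. The function names, the random tentacle direction drawn in each iteration, the coordinate bounds, and the stand-in fitness function are all assumptions made for illustration; in the disclosure the fitness value would be the out-of-bag error fraction of the random forest built from the current coordinates (number of decision trees, maximum number of features, minimum number of leaf nodes), which would also be rounded to integers before a forest is built.

```python
import random


def longicorn_whisker_search(fitness, start, s, u, t_max, bounds, seed=0):
    """Minimise `fitness` over a 3-D parameter space with a two-tentacle search.

    start  -- initial longicorn position (n_trees, max_features, min_leaf)
    s      -- tentacle length; u -- motion step length; t_max -- iterations
    bounds -- one (low, high) pair per coordinate (an assumption of this sketch)
    Returns (global optimal value, coordinates that produced it).
    """
    rng = random.Random(seed)

    def clip(point):
        return [min(max(c, lo), hi) for c, (lo, hi) in zip(point, bounds)]

    centre = clip(start)
    local_optima = []
    for _ in range(t_max):
        # Random unit direction along which the two tentacles point (assumed).
        direction = [rng.gauss(0.0, 1.0) for _ in centre]
        norm = sum(d * d for d in direction) ** 0.5 or 1.0
        direction = [d / norm for d in direction]

        p_left = clip([c - s * d for c, d in zip(centre, direction)])
        p_right = clip([c + s * d for c, d in zip(centre, direction)])
        f_left, f_right = fitness(p_left), fitness(p_right)

        # Local optimal value: the tentacle with the smaller fitness value.
        if f_left <= f_right:
            local_optima.append((f_left, p_left))
            sign = -1.0
        else:
            local_optima.append((f_right, p_right))
            sign = 1.0

        # Move by step length u toward the better tentacle side.
        centre = clip([c + sign * u * d for c, d in zip(centre, direction)])

    # Global optimal value: the minimum of the per-iteration local optima.
    return min(local_optima, key=lambda pair: pair[0])


def stand_in_error(params):
    """Hypothetical fitness; a real run would return the forest's OOB error."""
    target = (100.0, 5.0, 2.0)
    return sum((p - t) ** 2 for p, t in zip(params, target))


best_error, best_params = longicorn_whisker_search(
    stand_in_error, start=[50.0, 10.0, 10.0], s=1.0, u=1.0, t_max=200,
    bounds=[(10.0, 500.0), (1.0, 20.0), (1.0, 50.0)],
)
```

Keeping the fitness function as a parameter means the same loop could be reused for the second and third random forest models by passing a different error function.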

To promote the target vehicle in a more targeted manner and thereby increase its sales volume, the first embodiment of the present disclosure further provides that, after the user portrait is determined, the potential users corresponding to the target vehicle may be determined according to the user portrait of the user corresponding to the target vehicle.

Second embodiment

The second embodiment of the present disclosure further provides a user demand prediction method. Fig. 4 is a flowchart of the user demand prediction method provided in the second embodiment of the present disclosure. The method may comprise the following steps:

Step S401: obtaining third user data related to the target vehicle.

Step S402: inputting the third user data into a trained second random forest model, and determining demand information of a target user for the target vehicle; the trained second random forest model is a model obtained by training, using fourth user data related to the target vehicle, in a manner in which the longicorn whisker algorithm optimizes the random forest parameters of an initial random forest model; the trained second random forest model has globally optimal random forest parameters.

According to the user demand prediction method provided in the second embodiment of the present disclosure, the demand information of the target user for the target vehicle can be determined by inputting the third user data related to the target vehicle into the trained second random forest model. The determination of the demand information is therefore simpler and does not require a large amount of labor cost. Moreover, compared with acquiring the relevant user data through paper or electronic questionnaires and then determining the user demand, determining the user demand with the second random forest model trained to have globally optimal random forest parameters can improve the accuracy of the determined user demand.

In addition, among classification models the random forest model has advantages such as high training speed and high prediction speed, so a random forest model is adopted for user demand prediction. However, to improve the classification accuracy and efficiency of the random forest model, its random forest parameters must be appropriate. By training the model in a manner in which the longicorn whisker algorithm optimizes the random forest parameters of the initial random forest model, the features of the initial random forest model can be selected and the random forest parameters can be optimized. This mitigates the problems of feature redundancy and heavy computation, and improves the classification accuracy and efficiency of the random forest model.

In a second embodiment of the present disclosure, the third user data and the fourth user data may include at least one of high frequency location data of the user corresponding to the target vehicle, social radius data of the user corresponding to the target vehicle, configuration preference data of the user corresponding to the target vehicle, usage preference data of the user corresponding to the target vehicle, and vehicle configuration data related to the target vehicle.

The demand information of the target user for the target vehicle includes at least one of the following: color demand information of the target user for the vehicle; price demand information of the target user for the vehicle; operation demand information of the target user for the vehicle; vehicle type demand information of the target user for the vehicle; power configuration demand information of the target user for the vehicle; and displacement configuration demand information of the target user for the vehicle.

The third user data and the fourth user data are both user data related to the target vehicle, where the user is the user corresponding to the target vehicle, generally the vehicle owner. The data types of the third user data and the fourth user data generally need to be consistent. The target vehicles may be different vehicles of one pre-selected vehicle type, or different vehicles of different vehicle types.

According to the social radius data of the user corresponding to the target vehicle, the fuel consumption required of the vehicle can be obtained, from which the power configuration demand information and the displacement configuration demand information of the target user for the vehicle are determined. According to the usage preference data of the user corresponding to the target vehicle and the vehicle configuration data related to the target vehicle, the color demand information, price demand information, operation demand information, vehicle type demand information, power configuration demand information, displacement configuration demand information and the like of the target user for the vehicle can be determined.

The target vehicle-related vehicle configuration data includes, but is not limited to, attribute configurations such as vehicle type, color, price, etc., power configuration, displacement configuration, safety configuration, etc.

In a second embodiment of the present disclosure, a training step of a trained second random forest model is shown in fig. 5, and fig. 5 is a flowchart of another model training method provided in the second embodiment of the present disclosure.

Step S501: determining and collecting the initial user data required for the trained second random forest model; the initial user data is user data related to the target vehicle.

Step S502: performing data preprocessing on the initial user data to obtain the fourth user data.

Step S503: using the bootstrap method, repeatedly drawing random samples with replacement from the fourth user data to determine a training set for each decision tree, and taking the remaining samples in the fourth user data as the test set of that decision tree, where each decision tree is a different decision tree in the initial random forest model.

Step S504: optimizing the random forest parameters of the initial random forest model by the longicorn whisker algorithm, based on the training set and the test set, to obtain the trained second random forest model with optimal random forest parameters.
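The sampling described in step S503 can be sketched as below, as a minimal illustration only. The function name `bootstrap_split`, the fixed seed, and the placeholder sample names are assumptions; each tree's training set is drawn with replacement and has the same size as the fourth user data, and the samples never drawn for that tree form its test set.

```python
import random


def bootstrap_split(samples, n_trees, seed=0):
    """Return one (training set, test set) pair per decision tree."""
    rng = random.Random(seed)
    n = len(samples)
    splits = []
    for _ in range(n_trees):
        # Draw n indices with replacement: some samples repeat, some are missed.
        drawn = [rng.randrange(n) for _ in range(n)]
        drawn_set = set(drawn)
        train = [samples[i] for i in drawn]
        # Samples never drawn for this tree become its test set.
        test = [samples[i] for i in range(n) if i not in drawn_set]
        splits.append((train, test))
    return splits


# Placeholder records standing in for preprocessed fourth user data.
fourth_user_data = [f"sample_{i}" for i in range(20)]
splits = bootstrap_split(fourth_user_data, n_trees=3)
```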

It should be noted that, in the second embodiment of the present disclosure, the third user data and the fourth user data are obtained in the same manner: in both cases the initial user data is determined first, and data preprocessing is then performed on the initial user data. The following description takes the manner of obtaining the fourth user data as an example.

In the second embodiment of the present disclosure, the initial user data includes at least one of the following data: vehicle-end data reported by the target vehicle; application operation data of operations performed on the target vehicle by the user corresponding to the target vehicle; vehicle configuration data related to the target vehicle; and geographic information system data related to the target vehicle.

The vehicle-end data reported by the vehicle includes, but is not limited to, driving data such as vehicle position, alarms, rapid acceleration and braking, as well as fuel consumption data, vehicle fault data and the like.

The vehicle configuration data includes, but is not limited to, attribute configuration such as vehicle type, color, price, etc., power configuration, displacement configuration, safety configuration, etc.

In the second embodiment of the present disclosure, in order to obtain the fourth user data based on the initial user data, data preprocessing is required on the initial user data, and the preprocessing includes, but is not limited to, performing at least data cleaning, data classification, feature extraction, and data normalization processing on the initial user data.

The data classification includes dividing initial user data into preset data types according to the usage, attribute, etc. of the data, for example: the initial user data is divided into trajectory data, habit data, vehicle configuration data, and the like. The track data is used for determining high-frequency place data of a user corresponding to the target vehicle and social radius data of the user corresponding to the target vehicle; the habit data is used for determining configuration preference data of a user corresponding to the target vehicle and use preference data of the user corresponding to the target vehicle; the vehicle configuration data is used to determine vehicle configuration data associated with the target vehicle.

For the initial user data after data classification, feature extraction further needs to be performed to extract data that serves an identifying purpose. Specifically, feature extraction on the trajectory data can yield high-frequency location data identifying the high-frequency locations of the user corresponding to the target vehicle, social radius data identifying the social radius of that user, and the like; feature extraction on the habit data can yield configuration preference data identifying the configuration preferences of the user corresponding to the target vehicle, usage preference data identifying the usage preferences of that user, and the like; and feature extraction on the vehicle configuration data can yield vehicle configuration data identifying the vehicle configuration of the user corresponding to the target vehicle, and the like.

Taking as an example the feature extraction of the trajectory data to obtain the high-frequency location data and the social radius data of the user corresponding to the target vehicle, the feature extraction performed on the driving data and the trajectory data may include, but is not limited to: buffer analysis combined with overlay analysis, hotspot analysis, geographically weighted regression, and geographic cluster analysis.
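To make the idea of trajectory feature extraction concrete, the following sketch derives a high-frequency location and a social radius from a list of latitude/longitude points. It is a deliberately simplified stand-in and does not implement the buffer, overlay, hotspot, regression or clustering analyses named above. The grid-rounding rule, the definition of social radius as the greatest distance from the high-frequency location, the function names, and the sample coordinates are all assumptions.

```python
import math
from collections import Counter


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def trajectory_features(points, cell_decimals=2):
    """Extract two illustrative features from trajectory points."""
    # Assumed rule: bucket points into grid cells by rounding coordinates,
    # and treat the most visited cell as the high-frequency location.
    cells = Counter(
        (round(lat, cell_decimals), round(lon, cell_decimals))
        for lat, lon in points
    )
    high_freq_place, visits = cells.most_common(1)[0]
    # Assumed rule: social radius = farthest point from that location.
    social_radius_km = max(haversine_km(high_freq_place, p) for p in points)
    return {
        "high_frequency_place": high_freq_place,
        "visits": visits,
        "social_radius_km": social_radius_km,
    }


# Made-up sample track for illustration only.
track = [(31.2304, 121.4737), (31.2310, 121.4741),
         (31.2298, 121.4729), (31.3000, 121.5000)]
features = trajectory_features(track)
```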

In the second embodiment of the present disclosure, the implementation manner of determining and collecting the required initial user data for the trained second random forest model is generally as follows:

First, the dimension of the user demand to be determined by the trained second random forest model is determined, and the categories of user demand under that dimension are determined.

Then, based on the user demand category, the initial user data to be collected is looked up in a preset initial user data collection table, which records the initial user data to be collected for each category. For example, if the dimension is one and the user demand category is vehicle color, the initial user data to be collected may be only the vehicle configuration data related to the target vehicle. As another example, if the dimension is one and the category is an operation demand, the initial user data to be collected may be only the application operation data of operations performed on the target vehicle by the user corresponding to the target vehicle. As yet another example, if the dimension is two and the categories are a vehicle type demand and a configuration demand, the initial user data to be collected is at least the application operation data of operations performed on the target vehicle by the user corresponding to the target vehicle, and the vehicle configuration data related to the target vehicle.
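The table lookup can be sketched as follows, by way of illustration only. The table name, the key strings and the helper function are hypothetical. The entries for vehicle color and operation demand follow the examples in the text, while the separate entries for vehicle type and configuration are assumptions chosen so that their union matches the two-dimension example.

```python
# Hypothetical collection table: demand category -> initial user data to collect.
INITIAL_USER_DATA_COLLECTION_TABLE = {
    "vehicle_color": {"vehicle_configuration_data"},
    "operation": {"application_operation_data"},
    # Assumed per-category entries; the text only gives their combination.
    "vehicle_type": {"application_operation_data", "vehicle_configuration_data"},
    "configuration": {"application_operation_data", "vehicle_configuration_data"},
}


def data_to_collect(demand_categories):
    """Union of the data types required by every requested demand category."""
    required = set()
    for category in demand_categories:
        required |= INITIAL_USER_DATA_COLLECTION_TABLE[category]
    return sorted(required)


one_dimension = data_to_collect(["vehicle_color"])
two_dimensions = data_to_collect(["vehicle_type", "configuration"])
```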

It should be noted that, in the second embodiment of the present disclosure, the detailed implementation of steps S503 and S504 may refer to the detailed implementation of steps S203 and S204 in the first embodiment of the present disclosure. The first user data, the second user data and the user portrait need only be replaced with the third user data, the fourth user data and the user demand information, respectively, and the details are not repeated here.

In addition, after determining the demand information of the target user for the target vehicle, the user demand prediction method provided in the second embodiment of the present disclosure may further use the demand information of the target user for the target vehicle as reference information for planning and designing the vehicle. Therefore, the planned vehicle can be guaranteed to better meet the requirements of users, and the sales volume of the vehicle can be increased.

Third embodiment

The third embodiment of the present disclosure further provides a method for determining a setting position of a charging pile. Fig. 6 is a flowchart of the method for determining a setting position of a charging pile provided in the third embodiment of the present disclosure. The method may comprise the following steps:

Step S601: obtaining fifth user data related to the target vehicle.

Step S602: inputting the fifth user data into a trained third random forest model, and determining setting position information of the charging pile; the trained third random forest model is a model obtained by training, using sixth user data related to the target vehicle, in a manner in which the longicorn whisker algorithm optimizes the random forest parameters of an initial random forest model; the trained third random forest model has globally optimal random forest parameters.

According to the method for determining the setting position of the charging pile provided in the third embodiment of the present disclosure, the setting position information of the charging pile can be determined by inputting the fifth user data related to the target vehicle into the trained third random forest model. The determination of the setting position of the charging pile is therefore simpler and does not require a large amount of labor cost. Moreover, compared with acquiring the relevant user data through paper or electronic questionnaires and then determining the setting position of the charging pile, determining the setting position with the third random forest model having globally optimal random forest parameters can improve the accuracy of the determined setting position.

In addition, among classification models the random forest model has advantages such as high training speed and high prediction speed, so a random forest model is adopted when determining the setting position of the charging pile. However, to improve the classification accuracy and efficiency of the random forest model, its random forest parameters must be appropriate. By training the model in a manner in which the longicorn whisker algorithm optimizes the random forest parameters of the initial random forest model, the features of the initial random forest model can be selected and the random forest parameters can be optimized. This mitigates the problems of feature redundancy and heavy computation, and improves the classification accuracy and efficiency of the random forest model.

In a third embodiment of the present disclosure, the fifth user data and the sixth user data may each include at least one of high-frequency location data of a user corresponding to the target vehicle, social radius data of a user corresponding to the target vehicle, energy consumption data corresponding to the target vehicle, and vehicle configuration data related to the target vehicle.

The fifth user data and the sixth user data are both user data related to the target vehicle, where the user is the user corresponding to the target vehicle, generally the vehicle owner. The data types of the fifth user data and the sixth user data generally need to be consistent. The target vehicles may be different vehicles of one pre-selected vehicle type, or different vehicles of different vehicle types. The target vehicle is an electrically driven vehicle or a hybrid vehicle.

The setting position of a charging pile is influenced by the activity range of the user and the energy consumption of the vehicle. The activity range of the user can be determined from the habitual route data, the high-frequency location data and the social radius data of the user corresponding to the target vehicle, and the energy consumption of the vehicle can be determined from the energy consumption data, the power configuration, the displacement configuration and the like of the vehicle.

The demand information of the target user for the target vehicle includes at least one of the following: color demand information of the target user for the vehicle; price demand information of the target user for the vehicle; operation demand information of the target user for the vehicle; vehicle type demand information of the target user for the vehicle; and power configuration demand information of the target user for the vehicle.

In a third embodiment of the present disclosure, a training step of a trained third random forest model is shown in fig. 7, and fig. 7 is a flowchart of another model training method provided in the third embodiment of the present disclosure.

Step S701: determining and collecting the initial user data required for the trained third random forest model; the initial user data is user data related to the target vehicle.

Step S702: performing data preprocessing on the initial user data to obtain the sixth user data.

Step S703: using the bootstrap method, repeatedly drawing random samples with replacement from the sixth user data to determine a training set for each decision tree, and taking the remaining samples in the sixth user data as the test set of that decision tree, where each decision tree is a different decision tree in the initial random forest model.

Step S704: optimizing the random forest parameters of the initial random forest model by the longicorn whisker algorithm, based on the training set and the test set, to obtain the trained third random forest model with optimal random forest parameters.
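The fitness value that drives this optimization is the out-of-bag error described in step S304 of the first embodiment. A minimal sketch of that calculation is given below. The function name, the input layout (one prediction list and one out-of-bag mask per tree) and the toy labels are assumptions for illustration; each sample is classified by majority vote among only those trees for which it was out-of-bag, and the error is the share of such samples that are misclassified.

```python
from collections import Counter


def oob_error(tree_predictions, oob_masks, labels):
    """Out-of-bag error by majority vote.

    tree_predictions[k][i] -- class predicted by tree k for sample i
    oob_masks[k][i]        -- True if sample i was out-of-bag for tree k
    labels[i]              -- true class of sample i
    """
    wrong = counted = 0
    for i, truth in enumerate(labels):
        votes = Counter(
            pred[i] for pred, mask in zip(tree_predictions, oob_masks) if mask[i]
        )
        if not votes:
            continue  # sample was in every tree's training set; skip it
        counted += 1
        if votes.most_common(1)[0][0] != truth:
            wrong += 1
    return wrong / counted if counted else 0.0


# Toy example: three trees, four samples, two classes.
labels = ["A", "B", "A", "B"]
tree_predictions = [
    ["A", "B", "B", "B"],
    ["A", "A", "B", "B"],
    ["B", "B", "A", "A"],
]
oob_masks = [
    [True, True, False, True],
    [True, False, True, True],
    [False, True, False, False],
]
error = oob_error(tree_predictions, oob_masks, labels)
```

In this toy example only the third sample is misclassified by its out-of-bag vote, so one of four samples counts as an error.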

It should be noted that, in the third embodiment of the present disclosure, the fifth user data and the sixth user data are obtained in the same manner: in both cases the initial user data is determined first, and data preprocessing is then performed on the initial user data. The following description takes the manner of obtaining the sixth user data as an example.

In the third embodiment of the present disclosure, the initial user data includes at least one of the following data: application operation data of operations performed on the target vehicle by the user corresponding to the target vehicle; vehicle-end data reported by the target vehicle; vehicle configuration data related to the target vehicle; and geographic information system data related to the target vehicle.

The vehicle configuration data includes, but is not limited to, vehicle model configuration, power configuration, displacement configuration, safety configuration, and the like.

In the third embodiment of the present disclosure, in order to obtain the sixth user data based on the initial user data, data preprocessing is required to be performed on the initial user data, and the preprocessing includes, but is not limited to, performing at least data cleaning, data classification, feature extraction, and data normalization processing on the initial user data.

The data classification includes dividing initial user data into preset data types according to the usage, attribute, etc. of the data, for example: the initial user data is divided into trajectory data, energy consumption data and the like. The track data is used for determining high-frequency place data of a user corresponding to the target vehicle and social radius data of the user corresponding to the target vehicle; the energy consumption data is used to determine energy consumption data corresponding to the target vehicle and vehicle configuration data associated with the target vehicle.

For the initial user data after data classification, feature extraction further needs to be performed to extract data that serves an identifying purpose. Specifically, feature extraction on the trajectory data can yield high-frequency location data identifying the high-frequency locations of the user corresponding to the target vehicle, social radius data identifying the social radius of that user, and the like; feature extraction on the habit data can yield configuration preference data identifying the configuration preferences of the user corresponding to the target vehicle, usage preference data identifying the usage preferences of that user, and the like; and feature extraction on the vehicle configuration data and the energy consumption data can yield vehicle configuration data identifying the vehicle configuration of the user corresponding to the target vehicle, and the like.

In a third embodiment of the present disclosure, an implementation manner of determining and acquiring required initial user data for a trained third random forest model is generally as follows: and determining initial user data needing to be acquired by the trained third random forest model.

It should be noted that, in the third embodiment of the present disclosure, the detailed implementation of steps S703 and S704 may refer to the detailed implementation of steps S203 and S204 in the first embodiment of the present disclosure. The implementation principle is the same; the first user data, the second user data and the user portrait need only be replaced with the fifth user data, the sixth user data and the charging pile setting position information, respectively, and the details are not repeated here.

Specifically, the charging pile setting position information includes, but is not limited to, information identifying how many charging piles need to be set in a given area, information on the spacing between the positions at which the charging piles are set, and information identifying at which positions in a given area charging piles need to be set.

Fourth embodiment

As shown in fig. 8, a fourth embodiment of the present disclosure provides a data processing system including:

a target vehicle 801, a roadside sensing device 802, and a server 803;

the target vehicle 801 is configured to acquire, through vehicle-mounted sensors, the vehicle-end data reported by the target vehicle 801 and the application operation data of operations performed on the target vehicle 801 by the user corresponding to the target vehicle 801, and to upload the vehicle-end data and the application operation data to the server 803;

the roadside sensing device 802 is configured to collect roadside sensing data related to the target vehicle 801, and to upload the roadside sensing data related to the target vehicle 801 to the server 803;

the server 803 is configured to obtain the vehicle-end data uploaded by the target vehicle 801 and the application operation data of operations performed on the target vehicle 801 by the user corresponding to the target vehicle 801; to obtain the roadside sensing data related to the target vehicle 801 uploaded by the roadside sensing device 802; to obtain user attribute data of the user corresponding to the target vehicle 801, vehicle configuration data related to the target vehicle 801 and geographic information system data related to the target vehicle 801; and to implement, according to the obtained data, the methods provided in the embodiments of the present disclosure.

In the technical solution of the present disclosure, the acquisition, storage, application and the like of the personal information of the users involved all comply with the provisions of relevant laws and regulations, and do not violate public order and good morals.

Fifth embodiment

A fifth embodiment of the present disclosure also provides a computer program product comprising a computer program/instructions which, when executed by one processor alone or by multiple processors in cooperation, implement any one of the methods provided by the embodiments of the present disclosure.

It should be understood that the processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc. A general-purpose processor may be a microprocessor or any conventional processor or the like. It is noted that the processor may be a processor supporting the Advanced RISC Machine (ARM) architecture.

The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, training device, or data center to another website, computer, training device, or data center by wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave, etc.) means.

Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage media, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.

The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims

1. A user representation determination method, comprising:

obtaining first user data related to a target vehicle;

2. The method of claim 1, wherein the training step of the trained first random forest model comprises:

determining and collecting required initial user data aiming at the trained first random forest model; the initial user data is user data related to the target vehicle;

performing data preprocessing on the initial user data to obtain second user data;

using a bootstrap method, repeatedly drawing random samples with replacement from the second user data to determine a training set for each decision tree, and taking the remaining samples in the second user data as a test set of each decision tree, wherein each decision tree is a different decision tree in the initial random forest model;

and optimizing the random forest parameters of the initial random forest model by a longicorn whisker algorithm, based on the training set and the test set, to obtain the trained first random forest model with optimal random forest parameters.

3. The method of claim 2, wherein the data pre-processing the initial user data comprises: at least data cleansing, data classification, feature extraction, and data normalization processing are performed on the initial user data.

4. The method of claim 2, wherein the initial user data comprises at least one of:

roadside sensing data related to the target vehicle;

vehicle end data reported by the target vehicle;

application operation data generated by operations performed on the target vehicle by the user corresponding to the target vehicle;

user attribute data of a user corresponding to the target vehicle;

vehicle configuration data relating to the target vehicle;

geographic information system data associated with the target vehicle.

5. The method of any of claims 1-3, wherein the second user data comprises at least one of driving type data of a user corresponding to the target vehicle, offline preference data of the user corresponding to the target vehicle, route data of a user corresponding to the target vehicle, high-frequency location data of the user corresponding to the target vehicle, social radius data of the user corresponding to the target vehicle, configuration preference data of the user corresponding to the target vehicle, usage preference data of the user corresponding to the target vehicle, user group data of the user corresponding to the target vehicle, vehicle configuration data related to the target vehicle, and user attribute data of the user corresponding to the target vehicle.

6. The method of claim 1, further comprising: determining a potential user corresponding to the target vehicle based on the user portrait of the user corresponding to the target vehicle.

7. A method for predicting user demand, comprising:

obtaining third user data related to the target vehicle;

8. The method of claim 7, wherein the step of training the trained second random forest model comprises:

determining and collecting the initial user data required for the trained second random forest model, wherein the initial user data is user data related to the target vehicle;

performing data preprocessing on the initial user data to obtain fourth user data;

repeatedly drawing samples with replacement from the fourth user data using the bootstrap method to determine a training set for each decision tree, and taking the remaining samples in the fourth user data as the test set for that decision tree, wherein each decision tree is a different decision tree in the initial random forest model;

and optimizing random forest parameters of the initial random forest model using the beetle antennae search algorithm, based on the training sets and the test sets, so as to obtain the trained second random forest model with the optimal random forest parameters.
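The parameter optimization step of claim 8 can be illustrated with a basic beetle antennae search (BAS) loop, the metaheuristic named in that step. This is a minimal sketch under stated assumptions, not the claimed implementation: `surrogate_error` is a hypothetical stand-in for the test-set error of a forest trained with the given parameters, and the step size, antenna length, decay factor, and iteration count are arbitrary example values. A simple search of this kind does not by itself guarantee a global optimum.

```python
import math
import random


def beetle_antennae_search(objective, start, n_iter=200, step=1.0,
                           antenna=0.5, decay=0.95, seed=0):
    """Minimize `objective` with a basic beetle antennae search."""
    rng = random.Random(seed)
    x = list(start)
    best_x, best_f = list(x), objective(x)
    for _ in range(n_iter):
        # Random unit vector giving the orientation of the two antennae.
        direction = [rng.gauss(0.0, 1.0) for _ in x]
        norm = math.sqrt(sum(c * c for c in direction)) or 1.0
        direction = [c / norm for c in direction]
        # Probe the objective at the right and left antenna tips.
        right = [xi + antenna * di for xi, di in zip(x, direction)]
        left = [xi - antenna * di for xi, di in zip(x, direction)]
        diff = objective(right) - objective(left)
        sign = (diff > 0) - (diff < 0)
        # Move toward the antenna that reported the lower error.
        x = [xi - step * di * sign for xi, di in zip(x, direction)]
        fx = objective(x)
        if fx < best_f:
            best_x, best_f = list(x), fx
        # Shrink the antenna length and the step size over time.
        antenna = decay * antenna + 0.01
        step *= decay
    return best_x, best_f


def surrogate_error(params):
    # Hypothetical stand-in for the test-set error of a forest trained
    # with params = [number_of_trees, max_depth]; not a real model.
    n_trees, max_depth = params
    return ((n_trees - 120.0) / 100.0) ** 2 + ((max_depth - 8.0) / 10.0) ** 2


if __name__ == "__main__":
    print(beetle_antennae_search(surrogate_error, [50.0, 3.0]))
```

In practice the objective would train the forest on the per-tree training sets and return its error on the corresponding test sets, and continuous candidates would be rounded to valid integer parameter values before training.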

9. The method of claim 8, wherein performing data preprocessing on the initial user data comprises: performing at least data cleansing, data classification, feature extraction, and data normalization on the initial user data.

10. The method of claim 8, wherein the initial user data comprises at least one of:

vehicle end data reported by the target vehicle;

application operation data generated by operations performed on the target vehicle by the user corresponding to the target vehicle;

vehicle configuration data relating to the target vehicle;

geographic information system data associated with the target vehicle.

11. The method of claim 8, wherein the fourth user data comprises at least one of social radius data of the user corresponding to the target vehicle, configuration preference data of the user corresponding to the target vehicle, usage preference data of the user corresponding to the target vehicle, and vehicle configuration data related to the target vehicle.

12. The method according to any one of claims 7 to 10, wherein the demand information of the target user for the target vehicle includes at least one of:

color demand information of the target user for the vehicle;

price demand information of the target user for the vehicle;

operation demand information of the target user for the vehicle;

vehicle type demand information of the target user for the vehicle;

power configuration demand information of the target user for the vehicle;

displacement configuration demand information of the target user for the vehicle.

13. The method of claim 7, further comprising: taking the demand information of the target user for the target vehicle as reference information for planning and designing the vehicle.

14. A method for determining a setting position of a charging pile, comprising:

acquiring fifth user data related to the target vehicle;

15. The method of claim 14, wherein the step of training the trained third random forest model comprises:

determining and collecting the initial user data required for the trained third random forest model, wherein the initial user data is user data related to the target vehicle;

performing data preprocessing on the initial user data to obtain sixth user data;

repeatedly drawing samples with replacement from the sixth user data using the bootstrap method to determine a training set for each decision tree, and taking the remaining samples in the sixth user data as the test set for that decision tree, wherein each decision tree is a different decision tree in the initial random forest model;

and optimizing random forest parameters of the initial random forest model using the beetle antennae search algorithm, based on the training sets and the test sets, so as to obtain the trained third random forest model with the optimal random forest parameters.

16. The method of claim 14, wherein performing data preprocessing on the initial user data comprises: performing at least data cleansing, data classification, feature extraction, and data normalization on the initial user data.

17. The method of claim 16, wherein the initial user data comprises at least one of:

application operation data generated by operations performed on the target vehicle by the user corresponding to the target vehicle;

vehicle end data reported by the target vehicle;

vehicle configuration data relating to the target vehicle;

geographic information system data associated with the target vehicle.

18. The method of any of claims 14 to 16, wherein the sixth user data comprises at least one of high frequency location data of a user corresponding to the target vehicle, social radius data of a user corresponding to the target vehicle, energy consumption data corresponding to the target vehicle, and vehicle configuration data related to the target vehicle.

19. A data processing system, comprising: a target vehicle, a roadside sensing device, and a server;

wherein the server is configured to: acquire vehicle end data reported by the target vehicle and application operation data generated by operations performed on the target vehicle by the user corresponding to the target vehicle; acquire roadside sensing data that is uploaded by the roadside sensing device and related to the target vehicle; acquire user attribute data of the user corresponding to the target vehicle, vehicle configuration data related to the target vehicle, and geographic information system data related to the target vehicle; and implement the method of any one of claims 1 to 18 on the basis of the acquired data.

20. A computer program product comprising computer programs/instructions for implementing the method of any one of claims 1 to 18 when executed by a processor alone or in combination with a plurality of processors.