WO2016206196A1 - Method and device for obtaining user attribute information, and server - Google Patents

Method and device for obtaining user attribute information, and server

Info

Publication number
WO2016206196A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
log information
data
location
information
Prior art date
Application number
PCT/CN2015/089823
Other languages
French (fr)
Chinese (zh)
Inventor
吴海山
汪天一
武政伟
李正学
张潼
Original Assignee
百度在线网络技术(北京)有限公司
Priority date
Filing date
Publication date
Application filed by 百度在线网络技术(北京)有限公司
Publication of WO2016206196A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Definitions

  • the present invention relates to the field of computer technologies, in particular to the field of terminal technologies, and more particularly to a method, an apparatus, and a server for acquiring user attribute information.
  • the user portrait can be a collection of user attribute information, and a model can be used to describe the characteristics of the user.
  • the main method of user portrait construction is to analyze user attribute information based on the user's online search behavior.
  • because the user's online search behavior may contain noise caused by virtual user search behavior, such as search information forged by a malicious user, the resulting user portrait can be inaccurate.
  • portrait construction based on the user's online search behavior may also suffer from text semantic divergence.
  • the same search term may point to different user characteristics. For example, a user who searches for "Lushan" may be interested in tourism information, or may like movies related to Lushan.
  • the application provides a method, device and server for obtaining user attribute information.
  • the present application provides a method for acquiring user attribute information, including: acquiring map log information, location log information, and search engine log information; preprocessing the map log information, the location log information, and the search engine log information to obtain relevant data of the user; acquiring behavior characteristics of the user based on the relevant data of the user; and determining user attribute information based on the behavior characteristics of the user.
  • preprocessing the map log information, the location log information, and the search engine log information includes: analyzing the data included in the map log information, the location log information, and the search engine log information; and extracting the data related to geographic location and user behavior in the map log information, the location log information, and the search engine log information as related data of the user.
  • preprocessing the map log information, the location log information, and the search engine log information further includes: finding, through the network, information related to the data included in the map log information, the location log information, and the search engine log information, and using that information as related data of the user.
  • the user's related data includes at least location retrieval data and/or positioning data.
  • the location search data includes at least one of the following: target location search data, route search data, and corresponding route information; and peripheral data of the target location.
  • the target location search data includes at least one of the following: a destination of the search, a time of the search, and a current geographic location of the user;
  • the route search data includes at least one of the following: a time when the user retrieves the route, and a starting geographic location.
  • the surrounding data of the target location includes at least one of the following: building data around the target location, traffic site data, and parking lot data.
  • acquiring the user's behavior characteristics based on the user's related data includes at least one of: performing statistics and analysis on the distribution of geographic locations where the user stays, based on the positioning data, to determine the locations of the user's fixed activity; obtaining the user's point of interest information based on the location retrieval data; performing statistics and analysis on the user's travel modes, based on the location retrieval data, to determine the user's preferred travel mode; and calculating the relevance between users based on the positioning data to determine the intimacy of multiple users.
  • determining user attribute information based on a user's behavior characteristics includes: determining user attribute information using the trained model based on the user's behavior characteristics.
  • the user attribute information includes at least one of the following: a user's age group, gender, occupation, interest, income level, spending habits, health status, social relationships, and fixed asset status.
  • the present application provides an apparatus for acquiring user attribute information, including: a first acquiring unit, configured to acquire map log information, location log information, and log information of a search engine; a preprocessing unit, configured to preprocess the map log information, the location log information, and the search engine log information to obtain related data of the user; a second acquiring unit, configured to acquire the user's behavior characteristics based on the related data of the user; and a determining unit, configured to determine user attribute information based on the user's behavior characteristics.
  • the preprocessing unit is configured to preprocess the map log information, the location log information, and the search engine log information as follows: analyzing the data included in the map log information, the location log information, and the search engine log information; and extracting the data related to geographic location and user behavior in the map log information, the location log information, and the log information of the search engine as related data of the user.
  • the preprocessing unit is further configured to preprocess the map log information, the location log information, and the log information of the search engine by finding, through the network, information related to the data included in the map log information, the location log information, and the log information of the search engine, and using that information as related data of the user.
  • the user's related data includes at least location retrieval data and/or positioning data.
  • the location search data includes at least one of the following: target location search data, route search data, and corresponding route information; and peripheral data of the target location.
  • the target location search data includes at least one of the following: a destination of the search, a time of the search, and a current geographic location of the user;
  • the route search data includes at least one of the following: a time when the user retrieves the route, and a starting geographic location.
  • the surrounding data of the target location includes at least one of the following: building data around the target location, traffic site data, and parking lot data.
  • the second acquiring unit is configured to acquire the user's behavior characteristics in at least one of the following ways: performing statistics and analysis on the distribution of geographic locations where the user stays, based on the positioning data, to determine the locations of the user's fixed activity; obtaining the user's point of interest information based on the location retrieval data; performing statistics and analysis on the user's travel modes, based on the location retrieval data, to determine the user's preferred travel mode; and calculating the relevance between users based on the positioning data to determine the intimacy of multiple users.
  • the determining unit determines the user attribute information using the trained model based on the behavior characteristics of the user.
  • the user attribute information includes at least one of the following: a user's age group, gender, occupation, interest, income level, spending habits, health status, social relationships, and fixed asset status.
  • the present application provides a server, including the apparatus for acquiring user attribute information provided by the second aspect of the present application.
  • the method, device, and server for obtaining user attribute information provided by the application acquire the map log information, the location log information, and the log information of the search engine; preprocess them to obtain related data of the user; acquire the user's behavior characteristics based on that related data; and finally determine the user attribute information based on those behavior characteristics. This makes full use of information such as the user's map and location data to analyze the user's attributes, and improves the comprehensiveness and accuracy of the obtained user attribute information.
  • FIG. 1 is a flowchart of an embodiment of a method for acquiring user attribute information provided by an embodiment of the present application
  • FIG. 2 is a flowchart of an embodiment of a method for pre-processing map log information, location log information, and search engine log information provided by an embodiment of the present application;
  • FIG. 3 is a schematic structural diagram of an embodiment of an apparatus for acquiring user attribute information provided by an embodiment of the present application
  • FIG. 4 is a schematic diagram of an exemplary system architecture to which an embodiment of the present application may be applied;
  • FIG. 5 is a schematic structural diagram of a computer system suitable for implementing the server of the embodiment of the present application.
  • the terminals involved in the present application may include, but are not limited to, a smartphone, a tablet, a personal digital assistant, a smart wearable device, a laptop portable computer, and the like.
  • an exemplary embodiment of the present application is described in connection with an electronic map, a browser, and a terminal having a positioning function.
  • FIG. 1 illustrates a flow of one embodiment of a method for obtaining user attribute information according to the present application. This method can be performed by the server.
  • step 101 map log information, location log information, and log information of a search engine are acquired.
  • the terminal can establish a communication connection with the map server to obtain the map data that the user queries; when the user accesses a webpage or searches through the browser of the terminal, the terminal can establish a communication connection with the web server to obtain the webpage information that the user wants to access; and when the user turns on the positioning function of the terminal, the terminal can establish a communication connection with the positioning server to obtain the current positioning data.
  • the terminal may save the information related to the query, the access, and the positioning to its own memory, or upload that information to the corresponding server, so that log information is generated and saved.
  • for example, information related to map queries (such as query time and query content) may be saved as map log information; information related to positioning (such as the three-dimensional geographic location data of the terminal and the positioning time) may be saved as location log information; and information about the user's searches through the search engine in the browser (such as the retrieved content and the time of the retrieval) may be saved as the log information of the search engine.
  • the server may acquire map log information, location log information, and log information of the search engine from the terminal through the network. If the log information is saved in the corresponding server, the server may obtain the map log information from the map server, obtain the location log information from the location server, and obtain the log information of the search engine from the server of the search engine. In some implementations, the server may obtain log information from the terminal and from the corresponding map server, location server, and search engine server, and identify and extract the log information of the same user based on the user's identification information (e.g., IP address, user ID, etc.).
  • the map log information, the location log information, and the search engine log information may include: time information, operation information, status information, network protocol information, query, location, or search result, and storage space occupied by content included in the result.
  • the above log information is saved as fields.
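As an illustration of identifying the logs of the same user across the three sources, here is a minimal sketch. The record layout (dictionaries with a `user_id` field) and the function name `group_logs_by_user` are assumptions made for this example and are not specified in the application.

```python
from collections import defaultdict

# Hypothetical record layout: each log entry is a dict carrying a "user_id" field.
def group_logs_by_user(map_logs, location_logs, search_logs):
    """Merge the three log sources into one per-user view keyed by user_id."""
    per_user = defaultdict(lambda: {"map": [], "location": [], "search": []})
    for source, entries in (("map", map_logs),
                            ("location", location_logs),
                            ("search", search_logs)):
        for entry in entries:
            per_user[entry["user_id"]][source].append(entry)
    return dict(per_user)

map_logs = [{"user_id": "u1", "time": "2015-06-01T09:00", "query": "airport"}]
location_logs = [
    {"user_id": "u1", "time": "2015-06-01T09:05", "lat": 39.9, "lon": 116.4},
    {"user_id": "u2", "time": "2015-06-01T10:00", "lat": 31.2, "lon": 121.5},
]
search_logs = [{"user_id": "u2", "time": "2015-06-01T11:00", "keyword": "hotel"}]

grouped = group_logs_by_user(map_logs, location_logs, search_logs)
```

Keying on a single identifier keeps the sketch short; a real system would likely need to reconcile several identifiers (for example IP address and user ID) before grouping.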
  • step 102 map log information, location log information, and search engine log information are preprocessed.
  • the server may preprocess the acquired map log information, the location log information, and the log information of the search engine to obtain related data of the user.
  • the user's related data may be data related to the user's behavior, data related to the user's attribute characteristics (such as gender, personality), and data related to the user's geographic location.
  • the log information includes time information, operation information, and query, positioning, or search results, and also includes status information and network protocol information that are irrelevant to the user's behavior, attribute characteristics, and geographic location.
  • the server may filter the log information to keep the information related to user behavior. For example, when the log information is in the form of fields, the fields indicating status information and network protocol information may be filtered out by analyzing the content of each field.
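The field filtering described above could look roughly like the following sketch. The field names `status_code` and `protocol` are hypothetical placeholders; the application does not name the fields of any log format.

```python
# Hypothetical field names; a real log schema may differ.
IRRELEVANT_FIELDS = {"status_code", "protocol"}

def drop_irrelevant_fields(record):
    """Return a copy of the record without status and network protocol fields."""
    return {key: value for key, value in record.items()
            if key not in IRRELEVANT_FIELDS}

raw = {
    "user_id": "u1",
    "time": "2015-06-01 20:15:00",
    "keyword": "Lushan",
    "status_code": 200,
    "protocol": "HTTP/1.1",
}
cleaned = drop_irrelevant_fields(raw)
```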
  • FIG. 2 is a flowchart of an embodiment of a method for pre-processing map log information, location log information, and search engine log information provided by an embodiment of the present application.
  • step 201 data included in the map log information, the location log information, and the log information of the search engine are analyzed.
  • the server may analyze the acquired map log information, the location log information, and the log information of the search engine.
  • the log information in the form of a field may be segmented, and the content represented by each partial field may be analyzed.
  • a log information of a search engine may include a user's IP address, search time, search content, status code, visited URL, and the like. This log information can be analyzed to determine the user's search time and search content.
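A minimal sketch of segmenting one search engine log entry into fields follows. The tab-separated layout, the field order, and the timestamp format are assumptions chosen for illustration; the application does not define a concrete log format.

```python
from datetime import datetime

# Assumed field order for a tab-separated search engine log line.
FIELD_NAMES = ("ip", "search_time", "search_content", "status_code", "url")

def parse_search_log_line(line):
    """Split one log line into named fields and parse the search time."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} fields, got {len(parts)}")
    record = dict(zip(FIELD_NAMES, parts))
    record["search_time"] = datetime.strptime(record["search_time"],
                                              "%Y-%m-%d %H:%M:%S")
    return record

line = "203.0.113.7\t2015-06-01 20:15:00\tbasketball court\t200\thttp://example.com/result"
record = parse_search_log_line(line)
```

After parsing, the search time and search content are available as `record["search_time"]` and `record["search_content"]`.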
  • step 202 data related to the geographical location and user behavior in the map log information, the location log information, and the log information of the search engine are extracted as related data of the user.
  • the server may separately determine whether the data obtained in the analysis in step 201 is related to the user behavior and related to the geographic location.
  • the related data can be extracted as relevant data of the user.
  • the user's relevant data can be used to analyze the user's attribute characteristics and construct a user portrait.
  • the map query time and the query content may be extracted from the map log information as related data of the user; the positioning time and the located geographic location data may be extracted from the location log information as related data of the user; and the keywords retrieved by the user, the search time, and the URLs of the clicked search results may be extracted from the log information of the search engine as related data of the user.
  • the user's related data includes at least location retrieval data and/or location data.
  • the location retrieval data can be extracted from the map log information and the log information of the search engine, and the location data can be obtained from the location log information.
  • the location retrieval data may include at least one of the following: target location search data, route search data and corresponding route information, and peripheral data of the target location.
  • the target location search data may include at least one of the following: a destination of the search, a time of the search, and a current geographic location of the user.
  • the map server can find the matching destination geographic location according to the user's search request, obtain the current geographic location of the user, record the search time, and save this information to the map log information.
  • the server can obtain this data from the map log information as user-related data.
  • the route search data may include at least one of the following: a time when the user retrieves the route, a starting geographic location, a target geographic location, trajectory data, and the corresponding travel mode.
  • the trajectory data may be a route trajectory queried by the map server, and the corresponding travel mode may be driving, public transportation or walking.
  • the route search data can also include the distance between the starting geographic location and the target geographic location.
  • the corresponding line information of the route search data may be navigation data.
  • the navigation data may include detailed positioning information and a moving distance at a corresponding time.
  • the surrounding data of the target location may include at least one of the following: building data around the target location, traffic site data, and parking lot data.
  • the server can obtain data on hotels, restaurants, and subway stations near a tourist attraction, the number and distance of bus stations, or the number of parking lots and parking spaces around the tourist attraction.
  • step 203 information related to the map log information, the location log information, and the data included in the log information of the search engine is found through the network as related data of the user.
  • the server may also query, through the network, information related to the data contained in the map log information, the location log information, and the log information of the search engine, and use it as related data of the user.
  • the server can find the type, attributes, and the like of the destination. For example, when the destination is a restaurant, the restaurant's average consumption, cuisine, parking spaces, reviews, other branch addresses, etc. can be found through the network as related data of the user; when the destination is a shopping place, the brands, types, prices, and other information of the products sold can be found as related data of the user.
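The enrichment step can be sketched as a lookup that attaches destination details to a record. In this example a local dictionary stands in for the network query, and the place names, fields, and the function `enrich_destination` are all hypothetical.

```python
# Stand-in for a network lookup; a local table keeps the sketch offline.
PLACE_INFO = {
    "Example Bistro": {"type": "restaurant", "average_spend": 120, "cuisine": "Sichuan"},
    "Example Mall": {"type": "shopping", "brands": ["BrandA", "BrandB"]},
}

def enrich_destination(record, lookup=PLACE_INFO):
    """Return a copy of the record with any known destination details attached."""
    enriched = dict(record)
    enriched["destination_info"] = lookup.get(record["destination"])
    return enriched

known = enrich_destination({"user_id": "u1", "destination": "Example Bistro"})
unknown = enrich_destination({"user_id": "u1", "destination": "Nowhere"})
```

Destinations missing from the lookup get `None`, so later steps can tell "no information found" apart from an empty result.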
  • the server may analyze the log information, extract data related to the user behavior or the geographical location, and query the information related to the data included in the log information acquired in step 101, thereby acquiring related data of the user.
  • step 103 the user's behavioral characteristics are obtained based on the user's relevant data.
  • the behavior characteristics of the user can be obtained by analyzing the relevant data of the user obtained in step 102.
  • the server can infer the behavior characteristics of the user by data statistics.
  • the user's behavioral characteristics may include, but are not limited to, the location of the user's fixed activity, the point of interest information, the preferred mode of travel, and the social relationship.
  • the user's related data includes bits
  • the above behavior characteristics of the user can be separately obtained in various ways.
  • the server can perform statistics and analysis on the distribution of geographic locations where the user stays, based on the positioning data, to determine the locations of the user's fixed activity. Specifically, the server can count the duration and frequency of stays, obtain the geographic locations with the longest stay time or the highest stay frequency, and, in combination with the specific time of each stay, infer the user's home, workplace, and other places of regular activity. For example, the geographic location with the longest total stay time during non-working time can be regarded as the user's home; the geographic location with the longest stay time during working time can be regarded as the user's workplace; and a geographic location where the user's stay frequency is high and stable can be regarded as a place of regular activity (for example, a gym or a restaurant that the user likes).
  • the location of the user's fixed activity can be used to analyze the user's occupation, interest, housing price and other information.
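The home and workplace inference described above can be sketched as follows. The working-hours window (weekdays, 09:00 to 18:00), the stay record layout, and the rule of attributing a whole stay to the period in which it starts are simplifying assumptions for this example only.

```python
from collections import Counter
from datetime import datetime

WORK_START, WORK_END = 9, 18  # assumed working hours, weekdays only

def is_working_time(ts):
    """True when the timestamp falls on a weekday within the assumed working hours."""
    return ts.weekday() < 5 and WORK_START <= ts.hour < WORK_END

def infer_home_and_work(stays):
    """stays: iterable of (place, start datetime, minutes stayed).

    Each stay is attributed to working or non-working time by its start time.
    Returns the places with the longest total non-working and working stay time.
    """
    work_minutes, home_minutes = Counter(), Counter()
    for place, start, minutes in stays:
        target = work_minutes if is_working_time(start) else home_minutes
        target[place] += minutes
    home = home_minutes.most_common(1)[0][0] if home_minutes else None
    work = work_minutes.most_common(1)[0][0] if work_minutes else None
    return home, work

stays = [
    ("apartment", datetime(2015, 6, 1, 21, 0), 600),  # Monday evening
    ("office", datetime(2015, 6, 2, 9, 30), 480),     # Tuesday, working hours
    ("gym", datetime(2015, 6, 2, 19, 0), 60),
    ("office", datetime(2015, 6, 3, 10, 0), 450),
    ("apartment", datetime(2015, 6, 3, 22, 0), 540),
]
home, work = infer_home_and_work(stays)
```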
  • the server can retrieve the user's point of interest information based on the location retrieval data.
  • the user's point of interest information may be information that the user may be interested in.
  • the server may acquire attribute information such as a destination type retrieved by the user, and acquire related information as the interest point information of the user according to the attribute information such as the destination type.
  • the server may also acquire the frequency with which the user retrieves destinations having the same attribute information as the user's point of interest information. For example, when the location retrieval data shows that the user searched for the location of an airport, the server may obtain tourist attraction information, hotel information, and the like for the area where the airport is located as the user's point of interest information, in order to analyze attribute information such as the user's interests and personality.
  • the server may obtain information such as the star rating and consumption level of a hotel as the user's point of interest information to analyze the user's asset status.
  • the location retrieval data may also contain information related to gas stations, parking lots, and car washes, and the server may use such information to analyze whether the user owns assets such as a car.
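One simple way to turn retrieved destinations into point of interest information is to count destination types, as in the sketch below. The `type` labels and the `min_count` cutoff are assumptions for illustration.

```python
from collections import Counter

def interest_profile(searched_destinations, min_count=2):
    """Count searched destination types and keep those seen at least min_count times."""
    counts = Counter(dest["type"] for dest in searched_destinations)
    return {dest_type: n for dest_type, n in counts.items() if n >= min_count}

searches = [
    {"name": "Station A", "type": "gas_station"},
    {"name": "Lot B", "type": "parking_lot"},
    {"name": "Station C", "type": "gas_station"},
    {"name": "Lot D", "type": "parking_lot"},
    {"name": "Wash E", "type": "car_wash"},
    {"name": "Hotel F", "type": "hotel"},
]
profile = interest_profile(searches)
```

In this sample, repeated searches for gas stations and parking lots would be the kind of signal the text associates with owning a car.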
  • the server may perform statistics and analysis on the way the user travels based on the location retrieval data to determine the manner in which the user prefers travel.
  • the location retrieval data may include the route selected by the user when retrieving routes and the corresponding travel mode. The server may perform statistics on the travel modes selected by the user and analyze the travel modes the user commonly uses (driving, bus, subway, train, airplane, etc.) to infer the user's sensitivity to travel costs, and then analyze the user's asset status.
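The travel mode statistics can be sketched as a frequency count over route searches. The `mode` field and its values are hypothetical names for this example.

```python
from collections import Counter

def preferred_travel_mode(route_searches):
    """Return (most common travel mode, its share of all route searches)."""
    modes = Counter(search["mode"] for search in route_searches)
    if not modes:
        return None, 0.0
    mode, count = modes.most_common(1)[0]
    return mode, count / sum(modes.values())

route_searches = [
    {"mode": "driving"},
    {"mode": "driving"},
    {"mode": "walking"},
    {"mode": "driving"},
]
mode, share = preferred_travel_mode(route_searches)
```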
  • the server can calculate the degree of correlation between users based on the positioning data to determine the degree of intimacy of the plurality of users. For example, the server may analyze the strength of the social attributes between the users according to the cross-characteristics of the positioning data of the user staying in the adjacent geographical location, that is, the intimacy of the multiple users. The server may further analyze the type of social relationship of the user based on time information that the user stays in the adjacent geographic location, such as family, friends, work partners, and the like. For example, when a plurality of users stay in the same geographical location for a non-working time for more than one threshold, it may be considered that the plurality of users have a higher degree of intimacy, which may be a family or friend relationship.
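The co-location idea can be sketched by counting the stays two users share outside working time and comparing the count with a threshold. The stay layout (place, coarse time slot, working-time flag) and the threshold value are assumptions; the application does not fix either.

```python
INTIMACY_THRESHOLD = 1  # assumed value; shared stays must exceed it

def shared_off_hours_stays(stays_a, stays_b):
    """Count (place, time_slot) pairs both users share outside working time.

    Each stay is a tuple (place, time_slot, is_working_time).
    """
    off_a = {(place, slot) for place, slot, working in stays_a if not working}
    off_b = {(place, slot) for place, slot, working in stays_b if not working}
    return len(off_a & off_b)

def likely_family_or_friends(stays_a, stays_b, threshold=INTIMACY_THRESHOLD):
    """True when the number of shared non-working stays exceeds the threshold."""
    return shared_off_hours_stays(stays_a, stays_b) > threshold

alice = [("home", "mon-21", False), ("home", "tue-21", False), ("office", "mon-10", True)]
bob = [("home", "mon-21", False), ("home", "tue-21", False), ("office", "mon-10", True)]
carol = [("office", "mon-10", True), ("cafe", "sat-15", False)]

close = likely_family_or_friends(alice, bob)
not_close = likely_family_or_friends(alice, carol)
```

Shared stays during working time are ignored here, which matches the text's example of treating non-working co-location as a sign of a family or friend relationship.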
  • the server may also obtain type information of the mobile terminal used by the user, such as an Android system or an iOS system, to infer the user's asset status.
  • the related data of the user obtained in step 102 can be used to analyze and acquire various other types of user behavior characteristics. For example, the types of places where the user frequently stays can be analyzed based on the user's positioning data and location retrieval data, and the user's travel patterns can be counted based on the location retrieval data.
  • step 104 user attribute information is determined based on the user's behavior characteristics.
  • the server may perform data mining on the behavior characteristics of the user, and analyze the attribute information of the user based on the statistical result of step 103.
  • the user attribute information may include at least one of the user's age group, gender, occupation, interest, income level, spending habits, health status, social relationship, and fixed asset status.
  • multiple candidate values of a user attribute and their probabilities may be inferred, for example, several different candidate age groups for the user and the probability corresponding to each age group.
  • the candidate user attribute information with the highest probability is then taken as the user attribute information.
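Selecting the candidate with the highest probability can be sketched in a few lines. The candidate age groups and probabilities below are made-up illustrative values.

```python
def most_likely(candidates):
    """Return the (value, probability) pair with the highest probability."""
    return max(candidates.items(), key=lambda item: item[1])

age_candidates = {"18-24": 0.6, "24-50": 0.3, ">50": 0.1}
age_group, probability = most_likely(age_candidates)
```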
  • the age range of the user may be determined according to the location of the user's fixed activity. For example, when the location of the user's fixed activity is a university, it may be inferred that the user is 18-24 years old; when it is an office building, it may be inferred that the user is 24-50 years old; and when it is a center for the elderly, it may be inferred that the user is older than 50.
  • the user's age range can also be inferred based on the user's point of interest information. For example, when a user retrieves a playground through a map, it can be inferred that the probability of the user being 15-30 years old is large.
  • the gender of the user can be judged according to the information of the interest point. For example, the user who searches for an address such as a beauty salon has a higher probability of being a female, and the user who searches for an address such as a basketball court has a higher probability of being a male.
  • the user's occupation can be judged from the user's positioning data. For example, when the locations of the user's fixed activity include the building of an IT enterprise, it may be determined that the user is likely to work in the IT industry; when the locations of the user's fixed activity include a government agency, it may be determined that the user is likely to be a civil servant.
  • the user's interest can be determined by the point of interest information in the user's behavioral characteristics. For example, when the number of sports places included in the user's point of interest information exceeds a certain value, the user may be considered to like sports.
  • the user's social relationship can be judged from the user's location retrieval data. For example, when the user's location retrieval data includes target places such as a children's playground, a primary school, or a children's palace, it may be determined that the user has a child. Whether the user has a child can also be determined from the point of interest information obtained from the location retrieval data. For example, when the user's point of interest information contains a large number of keywords like "child", "young", and "infant", it can be determined that the user has a child.
  • the social relationship of the user can also be determined by the degree of intimacy between users obtained by the positioning data of a plurality of users. For example, when the degree of intimacy of a plurality of users is high, the server can determine that the probability that a plurality of users are family or friend relationships is high.
  • the user's income level, consumption level, and fixed asset status can be inferred from multiple behavioral characteristics of the user. For example, the user's income level and fixed asset status may be inferred from the user's sensitivity to transportation costs; the residential price may be inferred from the user's home address to determine the user's fixed asset status; the user's income level and consumption level may be determined from the price level of the consumption venues (e.g., hotels, restaurants) that the user retrieves; and the user's income level and consumption level can also be determined from the frequency with which the user searches for high-consumption places (such as golf courses).
  • the health status of the user can be inferred from how often the user's positioning data places the user at medical facilities such as hospitals and pharmacies. For example, when the user stays at a hospital 2 to 3 times per week, it can be inferred that the user is likely to be in poor health.
  • The methods of determining several kinds of user attribute information are described above by way of example. The user attribute information is not limited to the kinds described above, and the method provided by the present application may also be used to determine other kinds of user attribute information. For example, the user's preferred clothing style may be analyzed from statistics on the clothing stores where the user frequently stays, and the user's personality (extroverted or introverted) may be analyzed from statistics on the user's travel frequency.
  • The server may calculate the probability of each item of user attribute information according to preset rules.
  • For example, a frequency threshold may be preset. When the frequency with which the user stays at a certain geographic location exceeds the frequency threshold, the probability that this geographic location is the user's home address is set to 80%.
  • A list of high-consumption places may also be preset. When the user's positioning data includes a place on the preset list, the probability that the user has a high income level may be set to a value greater than 50%.
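The two preset rules above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the threshold value, the contents of the place list, and the 0.60 figure (standing in for "greater than 50%") are assumptions, as are all function and constant names.

```python
from collections import Counter

# Assumed values for illustration; the source only states "80%" and "greater than 50%".
STAY_FREQUENCY_THRESHOLD = 5           # assumed: stays per observation window
HOME_PROBABILITY = 0.80                # stated in the text
HIGH_INCOME_PROBABILITY = 0.60         # assumed: any value above 0.5
HIGH_CONSUMPTION_PLACES = {"golf course", "luxury hotel"}  # assumed list contents


def home_address_probabilities(stays):
    """Assign 0.80 to every location whose stay count exceeds the threshold."""
    counts = Counter(stays)
    return {
        location: HOME_PROBABILITY
        for location, count in counts.items()
        if count > STAY_FREQUENCY_THRESHOLD
    }


def high_income_probability(visited_places):
    """Return the preset probability if any visited place is on the list, else None."""
    if HIGH_CONSUMPTION_PLACES & set(visited_places):
        return HIGH_INCOME_PROBABILITY
    return None
```

A rule of this kind scores each location independently, so a real system would likely need a tie-breaking step when more than one location exceeds the threshold.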
  • The server may determine the user attribute information by applying a trained model to the user's behavior characteristics.
  • The server may train the classification model using, as a training set, the user behavior characteristics obtained from the location retrieval data and positioning data of users whose attributes are known.
  • The classification model may include a plurality of sub-models, each of which classifies one item of the user's attribute information.
  • The classification model may also be a single comprehensive model that classifies several items of the user's attribute information.
  • The classification model may also be optimized using user behavior characteristics obtained from the location retrieval data and positioning data of users other than those in the training set. In the present application, the trained and optimized model can be used to analyze the user's behavior characteristics acquired in step 103 to obtain the user's attribute information.
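The "one sub-model per attribute" arrangement can be sketched as below. The source does not say which classifier is used, so a tiny nearest-centroid classifier stands in for the trained model here; the class, the function names, and the feature layout are all assumptions made for illustration.

```python
from collections import defaultdict


class CentroidClassifier:
    """Tiny nearest-centroid classifier; a stand-in for any trained model."""

    def fit(self, features, labels):
        sums = {}
        counts = defaultdict(int)
        for vector, label in zip(features, labels):
            if label not in sums:
                sums[label] = [0.0] * len(vector)
            sums[label] = [s + v for s, v in zip(sums[label], vector)]
            counts[label] += 1
        # One mean feature vector (centroid) per label.
        self.centroids = {
            label: [s / counts[label] for s in total] for label, total in sums.items()
        }
        return self

    def predict(self, vector):
        def squared_distance(centroid):
            return sum((a - b) ** 2 for a, b in zip(vector, centroid))

        return min(self.centroids, key=lambda label: squared_distance(self.centroids[label]))


def train_sub_models(features, labelled_attributes):
    """Train one sub-model per attribute from users whose attributes are known."""
    return {
        name: CentroidClassifier().fit(features, labels)
        for name, labels in labelled_attributes.items()
    }


def determine_attributes(models, behavior_features):
    """Apply every sub-model to one user's behavior feature vector."""
    return {name: model.predict(behavior_features) for name, model in models.items()}
```

Keeping the sub-models in a dictionary keyed by attribute name makes it straightforward to retrain or replace a single attribute's model without touching the others.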
  • The method for obtaining user attribute information provided by the foregoing embodiment pre-processes the acquired map log information, positioning log information, and search engine log information to obtain the user's related data, then acquires the user's behavior characteristics based on that data, and finally determines the user attribute information based on those behavior characteristics. It makes full use of information such as the user's positioning and map searches to analyze user attribute information, which improves the comprehensiveness and accuracy of the acquired user attribute information.
  • The method for obtaining user attribute information provided by the above embodiment of the present application can be used to construct a user portrait. Content may then be recommended to the user based on the user portrait.
  • For example, the map server may analyze the user's eating preferences from the user portrait and, when the user searches for food, recommend personalized options that better match the user's taste. The user portrait can also provide a basis for location selection. For example, the server performing the location selection can determine the distribution and preferences of the target group based on user portraits and optimize the selection result.
  • The apparatus 300 for acquiring user attribute information may include a first obtaining unit 301, a pre-processing unit 302, a second obtaining unit 303, and a determining unit 304.
  • The first obtaining unit 301 can be configured to acquire map log information, positioning log information, and search engine log information.
  • The pre-processing unit 302 can be configured to pre-process the map log information, the positioning log information, and the search engine log information to obtain the user's related data.
  • The second obtaining unit 303 can be configured to acquire the user's behavior characteristics based on the user's related data.
  • The determining unit 304 can be configured to determine user attribute information based on the user's behavior characteristics.
  • The first obtaining unit 301 may acquire the map log information, positioning log information, and search engine log information of the same user from the terminal, or from the map server, the positioning server, and the search engine server.
  • The pre-processing unit 302 may analyze the data included in the log information acquired by the first obtaining unit 301, and extract the data related to user behavior or geographic location as the user's related data.
  • The pre-processing unit 302 can also look up, through the network, information related to the data included in the map log information, positioning log information, and search engine log information, and use it as the user's related data.
  • The user's related data can include at least location retrieval data and/or positioning data.
  • The location retrieval data may include at least one of the following: target location search data, route search data, and corresponding route information; and peripheral data of the target location.
  • The target location search data may include at least one of the following: the destination of the search, the time of the search, and the user's current geographic location.
  • The route search data may include at least one of the following: the time when the user retrieves the route, the starting geographic location, the target geographic location, trajectory data, and the corresponding travel mode.
  • The peripheral data of the target location may include at least one of the following: building data, transit station data, and parking lot data around the target location.
  • The second obtaining unit 303 may be configured to acquire the user's behavior characteristics in at least one of the following ways: performing statistics and analysis on the distribution of the geographic locations where the user stays, based on the positioning data obtained by the pre-processing unit 302, to determine the locations of the user's regular activities; acquiring the user's point-of-interest information based on the location retrieval data obtained by the pre-processing unit 302; performing statistics and analysis on the user's travel modes, based on the location retrieval data obtained by the pre-processing unit 302, to determine the user's preferred travel mode; and calculating the correlation between users, based on the positioning data obtained by the pre-processing unit 302, to determine the degree of intimacy among multiple users.
  • The determining unit 304 may determine the user attribute information by applying a trained model to the user's behavior characteristics acquired by the second obtaining unit 303.
  • The user attribute information may include at least one of the following: the user's age group, gender, occupation, interests, income level, consumption habits, health status, social relationships, and fixed asset status.
  • The units of the apparatus 300 for acquiring user attribute information correspond to the steps of the method described with reference to FIGS. 1-2.
  • The operations and features described above for the method therefore apply equally to the apparatus 300 for acquiring user attribute information and the units it contains, and are not repeated here.
  • The system 400 can include terminals 401, 402, a map server 403, a positioning server 404, a search engine server 405, and a server 407 provided by the present application for acquiring user attribute information.
  • The server 407 may include the apparatus 300 for acquiring user attribute information of the above embodiment.
  • A network 406 may also be included in the system architecture.
  • The network 406 provides the medium for communication links between the terminals 401, 402 and the servers 403, 404, 405, and 407.
  • The network 406 can include various types of connections, such as wired or wireless communication links, fiber optic cables, and the like.
  • The terminals 401, 402 can interact with the servers 403, 404, 405, 407 over the network 406 to receive or transmit messages and the like.
  • The terminals 401, 402 have a positioning function and may have a map application and a browser installed.
  • The map log information can be sent to the map server 403 through the network 406, the positioning log information can be sent to the positioning server 404 through the network 406, and the search engine log information can be sent to the search engine server 405 through the network 406.
  • The server 407 can obtain the log information from the terminals 401, 402 and the servers 403, 404, 405 through the network 406, pre-process the log information, extract the user behavior characteristics, and determine the user attribute information.
  • The server 407 can also send the determined user attribute information to the map server 403 and the search engine server 405 via the network 406.
  • The map server 403 may recommend relevant information to the user based on the user attribute information when the user searches for a target location on the map; the search engine server 405 may reorder web pages based on the user attribute information so that the user can quickly find information that meets their needs.
  • The terminals 401, 402 can be various electronic devices including, but not limited to, personal computers, smart phones, smart watches, tablets, personal digital assistants, and the like.
  • The server 407 can store, analyze, and otherwise process the received data, and feed the processing results back to the terminals and the servers 403, 404, and 405.
  • The computer system 500 includes a central processing unit (CPU) 501, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage portion 508 into a random access memory (RAM) 503.
  • The RAM 503 also stores various programs and data required for the operation of the system 500.
  • The CPU 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504.
  • An input/output (I/O) interface 505 is also connected to the bus 504.
  • The following components are connected to the I/O interface 505: an input portion 506; an output portion 507; a storage portion 508 including a hard disk or the like; and a communication portion 509 including a network interface card such as a LAN card, a modem, and the like.
  • The communication portion 509 performs communication processing via a network such as the Internet.
  • A drive 510 is also connected to the I/O interface 505 as needed.
  • A removable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 510 as needed, so that a computer program read from it can be installed into the storage portion 508 as needed.
  • Each block in the flowchart or block diagram can represent a module, a program segment, or a portion of code.
  • The module, program segment, or portion of code includes one or more executable instructions for implementing the specified logical functions.
  • The functions noted in the blocks may also occur in an order different from that shown in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functionality involved.
  • Each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • The units involved in the embodiments of the present application may be implemented in software or in hardware.
  • The described units may also be provided in a processor; for example, a processor may be described as comprising a first obtaining unit, a pre-processing unit, a second obtaining unit, and a determining unit.
  • In some cases, the names of these units do not limit the units themselves.
  • For example, the pre-processing unit may also be described as "a unit for pre-processing".
  • The present application further provides a computer-readable storage medium, which may be the computer-readable storage medium included in the apparatus described in the foregoing embodiment, or may be a computer-readable storage medium that exists separately and is not assembled into a terminal.
  • The computer-readable storage medium stores one or more programs that are executed by one or more processors to perform the method for obtaining user attribute information described in the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method and device for obtaining user attribute information, and server. The method comprises: obtaining map log information, positioning log information and search engine log information (101); pre-processing the map log information, the positioning log information and the search engine log information to obtain user-related data (102); obtaining, on the basis of the user-related data, user behavioral characteristics (103); and determining, on the basis of the user behavioral characteristics, user attribute information (104). Information such as user positioning and map search is fully utilized to analyze the user attribute information, thus improving comprehensiveness and accuracy of the obtained user attribute information.

Description

Method, device and server for obtaining user attribute information
Cross-reference to related applications
This application claims priority to Chinese Patent Application No. 201510363062.3, filed on June 26, 2015 by Baidu Online Network Technology (Beijing) Co., Ltd. and entitled "Method, device and server for obtaining user attribute information", the entire contents of which are incorporated into this application by reference.
Technical field
The present application relates to the field of computer technology, specifically to the field of terminal technology, and in particular to a method, device, and server for acquiring user attribute information.
Background
A user portrait can be a collection of user attribute information, and a model can be used to describe the user's characteristics. In the prior art, the main method of constructing a user portrait is to analyze the user's attribute information based on the user's online search behavior. In this method, the online search behavior may contain noise caused by fake search behavior, such as search information forged by malicious users, which makes the constructed user portrait inaccurate. In addition, portrait construction based on online search behavior may suffer from semantic ambiguity: the same search term may point to different user characteristics. For example, a user who searches for "Lushan" may be interested in travel information, or may like a film related to Lushan.
The prior art also includes methods that construct user portraits from the user's actual transaction data, namely the user's online transaction data. Online transactions are a low-frequency part of a user's behavior, so comprehensive, complete, and accurate user attribute information cannot be derived from them.
Summary of the invention
In view of the above defects or deficiencies in the prior art, it is desirable to provide a comprehensive and accurate method for obtaining user attribute information. The present application provides a method, device, and server for obtaining user attribute information.
In a first aspect, the present application provides a method for acquiring user attribute information, including: acquiring map log information, positioning log information, and search engine log information; pre-processing the map log information, positioning log information, and search engine log information to obtain the user's related data; acquiring the user's behavior characteristics based on the user's related data; and determining user attribute information based on the user's behavior characteristics.
In some implementations, pre-processing the map log information, positioning log information, and search engine log information includes: analyzing the data contained in the map log information, positioning log information, and search engine log information; and extracting from them the data related to geographic location and user behavior as the user's related data.
In a further implementation, pre-processing the map log information, positioning log information, and search engine log information further includes: looking up, through the network, information related to the data contained in the map log information, positioning log information, and search engine log information, and using it as the user's related data.
In some implementations, the user's related data includes at least location retrieval data and/or positioning data. The location retrieval data includes at least one of the following: target location search data, route search data, and corresponding route information; and peripheral data of the target location.
In a further implementation, the target location search data includes at least one of the following: the destination of the search, the time of the search, and the user's current geographic location; the route search data includes at least one of the following: the time when the user retrieves the route, the starting geographic location, the target geographic location, trajectory data, and the corresponding travel mode; the peripheral data of the target location includes at least one of the following: building data, transit station data, and parking lot data around the target location.
In some implementations, acquiring the user's behavior characteristics based on the user's related data includes at least one of the following: performing statistics and analysis on the distribution of the geographic locations where the user stays, based on the positioning data, to determine the locations of the user's regular activities; acquiring the user's point-of-interest information based on the location retrieval data; performing statistics and analysis on the user's travel modes, based on the location retrieval data, to determine the user's preferred travel mode; and calculating the correlation between users, based on the positioning data, to determine the degree of intimacy among multiple users.
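Three of these behavior characteristics can be sketched with simple counting. This is a minimal sketch under stated assumptions: positioning data is represented as hypothetical `(hour, cell_id)` pairs, the `min_share` cutoff is invented for illustration, and the Jaccard overlap is only one possible stand-in for the "correlation between users", which the source does not define.

```python
from collections import Counter


def fixed_activity_locations(positions, min_share=0.3):
    """Return the cells that account for at least min_share of all recorded stays."""
    counts = Counter(cell for _, cell in positions)
    total = sum(counts.values())
    return sorted(cell for cell, count in counts.items() if count / total >= min_share)


def preferred_travel_mode(travel_modes):
    """Return the most frequent travel mode in the route search data, or None if empty."""
    if not travel_modes:
        return None
    return Counter(travel_modes).most_common(1)[0][0]


def affinity(positions_a, positions_b):
    """Jaccard overlap of (hour, cell) pairs: an assumed measure of co-location."""
    a, b = set(positions_a), set(positions_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

A higher `affinity` value means the two users were recorded in the same place at the same time more often, which is the kind of signal the text uses to estimate the degree of intimacy.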
In some implementations, determining user attribute information based on the user's behavior characteristics includes: applying a trained model to the user's behavior characteristics to determine the user attribute information.
In some implementations, the user attribute information includes at least one of the following: the user's age group, gender, occupation, interests, income level, consumption habits, health status, social relationships, and fixed asset status.
In a second aspect, the present application provides an apparatus for acquiring user attribute information, including: a first obtaining unit, configured to acquire map log information, positioning log information, and search engine log information; a pre-processing unit, configured to pre-process the map log information, positioning log information, and search engine log information to obtain the user's related data; a second obtaining unit, configured to acquire the user's behavior characteristics based on the user's related data; and a determining unit, configured to determine user attribute information based on the user's behavior characteristics.
In some implementations, the pre-processing unit is configured to pre-process the map log information, positioning log information, and search engine log information as follows: analyzing the data contained in the map log information, positioning log information, and search engine log information; and extracting from them the data related to geographic location and user behavior as the user's related data.
In a further implementation, the pre-processing unit is further configured to pre-process the map log information, positioning log information, and search engine log information as follows: looking up, through the network, information related to the data contained in the map log information, positioning log information, and search engine log information, and using it as the user's related data.
In some implementations, the user's related data includes at least location retrieval data and/or positioning data. The location retrieval data includes at least one of the following: target location search data, route search data, and corresponding route information; and peripheral data of the target location.
In a further implementation, the target location search data includes at least one of the following: the destination of the search, the time of the search, and the user's current geographic location; the route search data includes at least one of the following: the time when the user retrieves the route, the starting geographic location, the target geographic location, trajectory data, and the corresponding travel mode; the peripheral data of the target location includes at least one of the following: building data, transit station data, and parking lot data around the target location.
In a further implementation, the second obtaining unit is configured to acquire the user's behavior characteristics in at least one of the following ways: performing statistics and analysis on the distribution of the geographic locations where the user stays, based on the positioning data, to determine the locations of the user's regular activities; acquiring the user's point-of-interest information based on the location retrieval data; performing statistics and analysis on the user's travel modes, based on the location retrieval data, to determine the user's preferred travel mode; and calculating the correlation between users, based on the positioning data, to determine the degree of intimacy among multiple users.
In some implementations, the determining unit applies a trained model to the user's behavior characteristics to determine the user attribute information.
In some implementations, the user attribute information includes at least one of the following: the user's age group, gender, occupation, interests, income level, consumption habits, health status, social relationships, and fixed asset status.
In a third aspect, the present application provides a server, including the apparatus for acquiring user attribute information provided by the second aspect of the present application.
The method, device, and server for obtaining user attribute information provided by the present application acquire map log information, positioning log information, and search engine log information, pre-process them to obtain the user's related data, then acquire the user's behavior characteristics based on that data, and finally determine the user attribute information based on those behavior characteristics. This makes full use of information such as the user's positioning and map searches to analyze user attribute information, and improves the comprehensiveness and accuracy of the acquired user attribute information.
Brief description of the drawings
Other features, objects, and advantages of the present application will become more apparent from the following detailed description of non-limiting embodiments, made with reference to the accompanying drawings:
FIG. 1 is a flowchart of an embodiment of a method for acquiring user attribute information provided by an embodiment of the present application;
FIG. 2 is a flowchart of an embodiment of a method for pre-processing map log information, positioning log information, and search engine log information provided by an embodiment of the present application;
FIG. 3 is a schematic structural diagram of an embodiment of an apparatus for acquiring user attribute information provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of an exemplary system architecture to which an embodiment of the present application may be applied;
FIG. 5 is a schematic structural diagram of a computer system suitable for implementing the server of an embodiment of the present application.
Detailed description
The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the related invention and do not limit it. In addition, for ease of description, only the parts related to the invention are shown in the drawings.
It should be noted that, where there is no conflict, the embodiments in the present application and the features in the embodiments may be combined with each other. The present application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
The terminals involved in the present application may include, but are not limited to, smartphones, tablets, personal digital assistants, smart wearable devices, laptop computers, and the like. For the purposes of illustration and for brevity, the following discussion describes exemplary embodiments of the present application in connection with a terminal that has an electronic map and a browser installed and has a positioning function.
Please refer to FIG. 1, which shows the flow of one embodiment of the method for acquiring user attribute information according to the present application. The method can be performed by a server.
如图1所示,在步骤101中,获取地图日志信息、定位日志信息和搜索引擎的日志信息。As shown in FIG. 1, in step 101, map log information, location log information, and log information of a search engine are acquired.
一般来说,当用户通过地图查询目标地址、路线信息时,终端可以与地图服务器建立通信连接,以获取用户所查询的地图数据;用户在通过终端的浏览器访问网页或进行检索时,终端可以与网页服务器建立通信连接,以获取用户所要访问的网页信息;用户打开终端的定位功能时,终端可以与定位服务器建立通信连接,以获取当前的定位数据。终端在向用户反馈查询结果、访问结果和定位结果的同时,可以将查询、访问和定位相关的信息保存至终端的存储器中,也可以将查询、访问和定位相关的信息上传至对应的服务器,生成日志信息并保存。具体地,用户通过地图查询的相关信息(例如查询时间、查询内容)可以保存为地图日志信息;终端通过GPS等方式定位的相关信息(例如终端的三维地理位置数据以及定位时间)可以保存为定位日志信息;用户通过浏览器上的搜索引擎进行检索的相关信息(例如检 索内容、检索的时间)可以保存为搜索引擎的日志信息。Generally, when the user queries the target address and the route information through the map, the terminal can establish a communication connection with the map server to obtain the map data that the user queries; when the user accesses the webpage or searches through the browser of the terminal, the terminal can Establishing a communication connection with the web server to obtain the webpage information that the user wants to access; when the user opens the positioning function of the terminal, the terminal can establish a communication connection with the positioning server to obtain the current positioning data. While feeding back the query result, the access result, and the positioning result to the user, the terminal may save the information related to the query, the access, and the location to the memory of the terminal, or upload the information related to the query, the access, and the location to the corresponding server. Generate log information and save it. Specifically, related information (such as query time and query content) that the user queries through the map can be saved as map log information; related information (such as the three-dimensional geographic location data of the terminal and the positioning time) that the terminal locates by means of GPS or the like can be saved as the positioning. Log information; information about the user's search through the search engine on the browser (such as checking The content of the content, the time of the retrieval) can be saved as the log information of the search engine.
In this embodiment, if the above log information is saved in the memory of the terminal, the server may acquire the map log information, positioning log information, and search engine log information from the terminal through the network. If the above log information is saved on the corresponding servers, the server may acquire the map log information from the map server, the positioning log information from the positioning server, and the search engine log information from the server of the search engine. In some implementations, the server may acquire log information from the terminal as well as from the corresponding map server, positioning server, and search engine server, and may identify and extract the log information belonging to the same user based on the user's identification information (for example, an IP address or a user ID).
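The grouping of log information by user identifier described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the record layout, the `user_id` key, and the function name `group_logs_by_user` are assumptions introduced here.

```python
from collections import defaultdict


def group_logs_by_user(map_logs, location_logs, search_logs):
    """Collect each user's records from the three log sources under one key.

    Every record is assumed to be a dict carrying a "user_id" field
    (a hypothetical stand-in for the IP address or user ID in the text).
    """
    merged = defaultdict(lambda: {"map": [], "location": [], "search": []})
    sources = (("map", map_logs), ("location", location_logs), ("search", search_logs))
    for source, records in sources:
        for record in records:
            merged[record["user_id"]][source].append(record)
    return dict(merged)


# Hypothetical sample records from the three servers.
map_logs = [{"user_id": "u1", "time": "2015-06-01T08:00", "query": "airport"}]
location_logs = [
    {"user_id": "u1", "time": "2015-06-01T08:05", "lat": 39.9, "lon": 116.4},
    {"user_id": "u2", "time": "2015-06-01T09:00", "lat": 31.2, "lon": 121.5},
]
search_logs = [{"user_id": "u2", "time": "2015-06-01T09:10", "keywords": "hotel"}]

by_user = group_logs_by_user(map_logs, location_logs, search_logs)
```

In practice an IP address alone may be shared by several users, so a real system would likely prefer a stable account identifier where one is available.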
The above map log information, positioning log information, and search engine log information may include: time information, operation information, status information, network protocol information, the query, positioning, or search results, and the storage space occupied by the content contained in the results. In some implementations, the above log information is saved in the form of fields.
In step 102, the map log information, positioning log information, and search engine log information are preprocessed.
In this embodiment, the server may preprocess the acquired map log information, positioning log information, and search engine log information to obtain the user's related data. The user's related data may be data related to the user's behavior, data related to the user's attribute characteristics (for example, gender or personality), or data related to the user's geographic location. As described above, in addition to time information, operation information, and query, positioning, or search results, the log information also contains status information and network protocol information that are unrelated to the user's behavior, attribute characteristics, or geographic location. In this embodiment, the server may filter out the information in the log information that is unrelated to user behavior; for example, when the log information is in field form, the fields representing status information and network protocols may be filtered out by analyzing the field contents.
With further reference to FIG. 2, a flowchart is shown of one embodiment of the method, provided by the embodiments of the present application, for preprocessing the map log information, positioning log information, and search engine log information.
As shown in FIG. 2, in step 201, the data contained in the map log information, positioning log information, and search engine log information is analyzed.
In this embodiment, the server may analyze the acquired map log information, positioning log information, and search engine log information; for example, it may split log information that is in field form and analyze what each field represents. For instance, one search engine log entry may contain the user's IP address, the search time, the search content, a status code, the visited URL, and so on. The log entry may be analyzed to determine the user's search time and search content.
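The splitting and filtering of a field-form log entry can be illustrated with a short sketch. The tab-separated layout, the field order in `FIELD_NAMES`, and the sample entry are all assumptions made for the example; the source does not specify a log format.

```python
# Hypothetical field layout of one search engine log entry (assumption).
FIELD_NAMES = ["ip", "time", "query", "status", "protocol", "url"]
# Fields treated as unrelated to user behavior, attributes, or location.
IRRELEVANT_FIELDS = {"status", "protocol"}


def parse_search_log(line):
    """Split a tab-separated log entry and drop status/protocol fields."""
    values = line.rstrip("\n").split("\t")
    record = dict(zip(FIELD_NAMES, values))
    return {name: value for name, value in record.items()
            if name not in IRRELEVANT_FIELDS}


line = ("10.0.0.1\t2015-06-01T09:10:00\tsubway to airport\t200\t"
        "HTTP/1.1\thttp://example.com/result\n")
record = parse_search_log(line)
```

After parsing, `record` keeps only the IP address, search time, search content, and visited URL, which are the fields the later steps draw on.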
In step 202, the data related to geographic location and user behavior is extracted from the map log information, positioning log information, and search engine log information as the user's related data.
In this embodiment, the server may determine whether the data obtained from the analysis in step 201 is related to user behavior and whether it is related to geographic location. When the analyzed data is related to geographic location or user behavior, the related data may be extracted as the user's related data. The user's related data may be used to analyze the user's attribute characteristics and to construct a user portrait.
For example, in the above embodiment, the map query time and query content may be extracted from the map log information as the user's related data; the positioning time and the located geographic position data may be extracted from the positioning log information as the user's related data; and the keywords searched by the user, the search time, and the URLs of the web pages clicked in the search results may be extracted from the search engine log information as the user's related data.
In some implementations, the user's related data includes at least location retrieval data and/or positioning data. The location retrieval data may be extracted from the map log information and the search engine log information, and the positioning data may be obtained from the positioning log information. The location retrieval data may include at least one of the following: target location search data, route search data and the corresponding route information, and data about the surroundings of the target location.
Specifically, the target location search data may include at least one of the following: the searched destination, the time of the search, and the user's current geographic location. When the user searches for a target location through the electronic map on the terminal, the map server may find the matching destination location according to the user's search request, and may also obtain the user's current geographic location, record the time of the search, and save these in the map log information. The server may obtain this data from the map log information as the user's related data.
The route search data may include at least one of the following: the time at which the user searched for the route, the starting geographic location, the target geographic location, trajectory data, and the corresponding travel mode. The trajectory data may be the route trajectory found by the map server, and the corresponding travel mode may be driving, public transportation, or walking. In some implementations, the route search data may also include the distance between the starting geographic location and the target geographic location.
The route information corresponding to the route search data may be navigation data. The navigation data may include detailed positioning information and the distance moved at the corresponding times.
The data about the surroundings of the target location may include at least one of the following: data on buildings, transit stations, and parking lots around the target location. For example, when the user searches for a tourist attraction, the server may obtain data on hotels and dining places around the attraction, the number and distance of subway stations and bus stops near the attraction, or the number of parking lots and parking spaces around the attraction.
In step 203, information related to the data contained in the map log information, positioning log information, and search engine log information is looked up through the network as the user's related data.
In addition to extracting the above data related to user behavior or geographic location from the log information, the server may also query, through the network, information related to the data contained in the map log information, positioning log information, and search engine log information as the user's related data. For example, when the server extracts the destination searched by the user from the map log information, the server may look up information such as the type and attributes of the destination. For instance, when the destination is a restaurant, information such as the restaurant's average spend, cuisine, parking spaces, reviews, and the addresses of its other branches may be found through the network as the user's related data; when the destination is a shopping venue, information such as the brands, types, and price ranges of the goods sold there may be found as the user's related data.
In the above embodiment, the server may analyze the log information and extract the data in it that is related to user behavior or geographic location, and may also query information related to the data contained in the log information acquired in step 101, thereby obtaining the user's related data.
Returning to FIG. 1, in step 103, the user's behavior characteristics are obtained based on the user's related data.
The user's behavior characteristics may be derived by analyzing the user's related data obtained in step 102. In this embodiment, the server may infer the user's behavior characteristics through data statistics. The user's behavior characteristics may include, but are not limited to, the locations of the user's regular activities, point-of-interest information, the preferred travel mode, and social relationships. When the user's related data includes location retrieval data and positioning data, the above behavior characteristics may each be obtained in various ways.
In some implementations, the server may compile statistics on and analyze the distribution of the geographic locations where the user stays, based on the positioning data, to determine the locations of the user's regular activities. Specifically, the server may count the duration and frequency of stays, derive from the statistics the few geographic locations where the user stays longest or most often, and, in combination with the specific times of the stays, infer the user's home location, work location, and the locations of other regular activities. For example, among all geographic locations, the one with the longest stay duration during non-working hours may be taken as the home location, and the one with the longest stay duration during working hours may be taken as the user's work location. In practical applications, geographic locations where the user stays frequently and consistently may also be taken as locations of the user's regular activities (for example, a gym or a restaurant the user likes). The locations of the user's regular activities may be used to analyze information such as the user's occupation, interests, and housing price.
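The home and work inference described above can be sketched as a simple aggregation of stay durations. The working-hours window (weekdays, 09:00 to 18:00), the classification of a stay by its start time only, and the sample records are simplifying assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime

WORK_START, WORK_END = 9, 18  # assumed working hours on weekdays


def is_working_time(ts):
    """True if the timestamp falls on a weekday within the assumed window."""
    return ts.weekday() < 5 and WORK_START <= ts.hour < WORK_END


def infer_home_and_work(stays):
    """stays: iterable of (place, start: datetime, hours: float).

    Each stay is classified by its start time only (a simplification),
    then total hours per place are compared within each bucket.
    """
    totals = {"home": defaultdict(float), "work": defaultdict(float)}
    for place, start, hours in stays:
        bucket = "work" if is_working_time(start) else "home"
        totals[bucket][place] += hours
    home = max(totals["home"], key=totals["home"].get, default=None)
    work = max(totals["work"], key=totals["work"].get, default=None)
    return home, work


# Hypothetical stays derived from positioning data.
stays = [
    ("residential block A", datetime(2015, 6, 1, 20, 0), 11.0),
    ("office tower B", datetime(2015, 6, 2, 9, 30), 8.0),
    ("gym C", datetime(2015, 6, 2, 18, 30), 1.5),
    ("residential block A", datetime(2015, 6, 2, 21, 0), 10.0),
    ("office tower B", datetime(2015, 6, 3, 9, 15), 8.5),
]
home, work = infer_home_and_work(stays)
```

A fuller treatment would split stays that span the boundary between working and non-working hours and would also track visit frequency, as the text suggests for regular-activity locations such as a gym.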
In some implementations, the server may obtain the user's point-of-interest information based on the location retrieval data. The user's point-of-interest information may be information that the user is likely to be interested in. Specifically, the server may obtain attribute information, such as the type, of the destinations the user searches for, and obtain related information according to that attribute information as the user's point-of-interest information. Further, the server may also obtain the frequency with which the user searches for destinations having the same attribute information as part of the user's point-of-interest information. For example, when the location retrieval data shows that the user searched for the location of an airport, the server may obtain information on tourist attractions, hotels, and the like in the area where the airport is located as the user's point-of-interest information, in order to analyze attribute information such as the user's interests and personality. As another example, when the location retrieval data shows that the user searched for the location of a hotel, the server may obtain information such as the hotel's star rating and price level as the user's point-of-interest information, in order to analyze the user's asset status. The location retrieval data may also contain information related to gas stations, parking lots, and car washes, and the server may use this information to analyze asset information such as whether the user owns a car.
In some implementations, the server may compile statistics on and analyze the user's travel modes based on the location retrieval data to determine the user's preferred travel mode. The location retrieval data may contain the route information the user selected when searching for routes and the corresponding travel modes. The server may count the travel modes the user selected and analyze the travel modes the user commonly uses (driving, bus, subway, coach, train, airplane, and so on), thereby inferring how sensitive the user is to transportation costs and, in turn, analyzing the user's asset status.
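As a rough illustration of the travel-mode statistics above, the following sketch computes the share of each selected mode and picks the most frequent one. The function name and the sample list of modes are hypothetical.

```python
from collections import Counter


def travel_mode_shares(selected_modes):
    """Return the fraction of route searches made with each travel mode."""
    counts = Counter(selected_modes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {mode: n / total for mode, n in counts.items()}


# Hypothetical travel modes taken from a user's route searches.
selected_modes = ["driving", "subway", "driving", "driving", "walking"]
shares = travel_mode_shares(selected_modes)
preferred_mode = max(shares, key=shares.get)
```

How the shares map to cost sensitivity is left open by the source; any such mapping would be a further assumption.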
In some implementations, the server may calculate the degree of correlation between users based on the positioning data to determine how close multiple users are to one another. For example, the server may analyze the strength of the social ties between users, that is, their degree of closeness, according to overlapping features in their positioning data, such as staying together at nearby geographic locations. The server may further analyze the type of the users' social relationship, such as family, friends, or colleagues, based on the times at which the users stay together at nearby locations. For instance, when multiple users stay at the same geographic location during non-working hours for longer than a threshold, the users may be considered to be relatively close and may be family members or friends.
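The threshold test on shared non-working time can be sketched as below. The representation of a user's track as a mapping from an hourly slot `(day, hour)` to a grid-cell identifier, the working-hours window, and the threshold values are all assumptions chosen for the example.

```python
def is_working_hour(slot):
    """slot is (day, hour); 09:00-18:00 is assumed to be working time."""
    return 9 <= slot[1] < 18


def shared_off_hours(track_a, track_b):
    """Count non-working hourly slots in which both users share a grid cell.

    Each track maps an hourly slot (day, hour) to a grid-cell id; both the
    slot granularity and the grid cells are hypothetical simplifications.
    """
    return sum(
        1
        for slot, cell in track_a.items()
        if not is_working_hour(slot) and track_b.get(slot) == cell
    )


def likely_close(track_a, track_b, threshold_hours):
    """True when shared non-working time exceeds the given threshold."""
    return shared_off_hours(track_a, track_b) > threshold_hours


track_a = {(1, 20): "c1", (1, 21): "c1", (1, 22): "c1", (2, 10): "c2", (2, 20): "c1"}
track_b = {(1, 20): "c1", (1, 21): "c1", (1, 22): "c3", (2, 10): "c2", (2, 20): "c1"}

shared = shared_off_hours(track_a, track_b)
close = likely_close(track_a, track_b, threshold_hours=2)
```

Here the two users share three non-working slots; the slot at 10:00 on day 2 is ignored because it falls in the assumed working window, which is where colleagues rather than family or friends would overlap.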
Optionally, the server may also obtain the type of mobile terminal the user uses, for example Android or iOS, to infer the user's asset status.
Several methods of obtaining user behavior characteristics have been described above by way of example. It can be understood that the user's related data obtained in step 102 may be used to analyze and obtain many other types of user behavior characteristics; for example, the types of places where the user frequently stays may be analyzed based on the user's positioning data and location retrieval data, and statistics on the user's travel patterns may be compiled based on the user's location retrieval data.
In step 104, the user attribute information is determined based on the user's behavior characteristics.
In this embodiment, the server may perform data mining on the user's behavior characteristics and analyze the user's attribute information based on the statistical results of step 103. In some embodiments, the user attribute information may include at least one of the following: the user's age group, gender, occupation, interests, income level, consumption habits, health status, social relationships, and fixed asset status.
In some implementations, after performing data mining on the user's behavior characteristics, the server may infer multiple candidate values of the user attribute information together with their probabilities, for example several different age groups for the user and the probability corresponding to each age group. The candidate with the highest probability is then taken as the user attribute information.
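Selecting the candidate with the highest probability amounts to an argmax over the candidates. A minimal sketch, with made-up probabilities:

```python
def pick_attribute(candidates):
    """Return the candidate attribute value with the highest probability."""
    return max(candidates, key=candidates.get)


# Hypothetical candidate age groups and their inferred probabilities.
age_candidates = {"18-24": 0.55, "24-50": 0.35, "over 50": 0.10}
age_group = pick_attribute(age_candidates)
```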
Specifically, the user's age group may be judged from the locations of the user's regular activities. For example, when the location of the user's regular activities is a university, it may be inferred that the user is most likely 18 to 24 years old; when it is an office building, it may be inferred that the user is most likely 24 to 50 years old; and when it is a senior center, it may be inferred that the user is most likely over 50 years old. The user's age group may also be inferred from the user's point-of-interest information. For example, when the user searches for an amusement park through the map, it may be inferred that the user is relatively likely to be 15 to 30 years old.
The user's gender may be judged from the user's point-of-interest information. For example, a user who searches for addresses such as beauty salons is more likely to be female, while a user who searches for addresses such as basketball courts is more likely to be male.
The user's occupation may be judged from the user's positioning data. For example, when the locations of the user's regular activities include the building of an IT company, it may be determined that the user is relatively likely to work in the IT industry; when they include a government agency, it may be determined that the user is relatively likely to be a civil servant.
The user's interests may be determined from the point-of-interest information in the user's behavior characteristics. For example, when the number of sports venues contained in the user's point-of-interest information exceeds a certain value, the user may be considered to like sports.
The user's social relationships may be judged from the user's location retrieval data. For example, when the user's location retrieval data contains target places such as children's playgrounds, primary schools, or children's palaces, it may be determined that the user has a child. Further, whether the user has a child may also be judged from the user's point-of-interest information obtained from the location retrieval data. For example, when the user's point-of-interest information contains a large number of keywords such as "童" (child), "幼" (young), or "婴" (infant), it may be determined that the user has a child.
The user's social relationships may also be determined from the degree of closeness between users obtained from the positioning data of multiple users. For example, when multiple users are relatively close, the server may determine that they are relatively likely to be family members or friends.
The user's income level, consumption level, and fixed asset status may be inferred from several of the user's behavior characteristics. For example, the user's income level and fixed asset status may be inferred from the user's sensitivity to transportation costs; the price of the user's residence may be inferred from the user's home address in order to judge the user's fixed asset status; the user's income level and consumption level may be judged from the price level of the consumer venues (such as hotels and restaurants) the user searches for; and the user's income level and consumption level may also be determined from the frequency with which the user searches for high-end venues (such as golf courses). In addition, whether the user owns a car may be judged from whether the user often drives when traveling and whether the positioning data contains the addresses of parking lots or gas stations.
The user's health status may be inferred from how often, according to the user's positioning data, the user stays at medical facilities such as hospitals and pharmacies. For example, when the user stays at a hospital two to three times per week, it may be inferred that the user is relatively likely to be in poor health.
Several methods of determining user attribute information have been described above by way of example. It should be noted that the user attribute information is not limited to the kinds of information described above, and the method provided by the present application is not limited to determining those kinds of user attribute information; it may also be used to determine other types of user attribute information. For example, the user's preferred style of dress may be determined based on statistics of the clothing stores where the user frequently stays, and the user's personality (extroverted or introverted) may be analyzed based on statistics of the user's travel frequency.
In this embodiment, the server may calculate the probability of each item of user attribute information according to preset rules. For example, a frequency threshold may be preset, and when the frequency with which the user stays at a certain geographic location exceeds that threshold, the probability that the location is the user's home address is set to 80%. As another example, a list of high-end consumer venues may be preset, and when the user's positioning data contains a venue on the preset list, the probability that the user has a high income level may be set to greater than 50%.
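The two preset rules above can be written down directly. In this sketch the threshold of five stays per week, the venue names, and the value 0.6 are assumptions; the source only fixes the 80% figure and the "greater than 50%" bound. Returning `None` when a rule does not fire is also a choice made here, since the source does not say what probability applies in that case.

```python
HOME_FREQUENCY_THRESHOLD = 5  # stays per week; assumed value
HIGH_END_VENUES = {"golf course", "five-star hotel"}  # hypothetical preset list


def home_probability(stays_per_week):
    """Return 0.8 when the stay frequency exceeds the threshold, else None."""
    if stays_per_week > HOME_FREQUENCY_THRESHOLD:
        return 0.8
    return None


def high_income_probability(visited_venues):
    """Return an assumed probability above 0.5 if any preset venue was visited."""
    if HIGH_END_VENUES & set(visited_venues):
        return 0.6  # assumed; the source only says "greater than 50%"
    return None


p_home = home_probability(7)
p_income = high_income_probability(["golf course", "supermarket"])
```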
In some optional implementations, the server may determine the user attribute information from the user's behavior characteristics using a trained model. The server may train a classification model using, as the training set, user behavior characteristics obtained from the location retrieval data and positioning data of users whose attributes are known. The classification model may include multiple sub-models, each of which classifies one kind of the user's attribute information. The classification model may also be a single combined model that classifies several kinds of the user's attribute information. Optionally, a test set may be built from user behavior characteristics obtained from the location retrieval data and positioning data of users different from those in the training set, and used to optimize the classification model. In application, the trained and optimized model may be used to analyze the user's behavior characteristics obtained in step 103 to derive the user's attribute information.
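The arrangement of one sub-model per attribute, trained on users with known attributes and checked against a separate test set, can be sketched as follows. The source does not name a classifier; a one-nearest-neighbor classifier is used here purely as a stand-in, and the two features (share of driving trips, high-end venue searches per month) and all sample values are hypothetical.

```python
import math


class NearestNeighborModel:
    """Stand-in classifier: predicts the label of the closest training sample."""

    def fit(self, features, labels):
        self.samples = list(zip(features, labels))
        return self

    def predict(self, x):
        return min(self.samples, key=lambda s: math.dist(s[0], x))[1]


def train_sub_models(features, labels_by_attribute):
    """Train one sub-model per user attribute on the same behavior features."""
    return {
        attribute: NearestNeighborModel().fit(features, labels)
        for attribute, labels in labels_by_attribute.items()
    }


def accuracy(model, features, labels):
    """Fraction of test samples whose predicted label matches the known label."""
    hits = sum(model.predict(x) == y for x, y in zip(features, labels))
    return hits / len(labels)


# Hypothetical features: (share of driving trips, high-end venue searches per month).
train_features = [(0.9, 6), (0.8, 5), (0.1, 0), (0.2, 1)]
train_labels = {
    "has_car": [True, True, False, False],
    "income": ["high", "high", "low", "low"],
}
models = train_sub_models(train_features, train_labels)

# Test set built from users that are not in the training set.
test_features = [(0.85, 4), (0.15, 0)]
car_accuracy = accuracy(models["has_car"], test_features, [True, False])
predicted_income = models["income"].predict((0.7, 5))
```

The features here are on different scales, so a practical model would normalize them first; any standard classifier could replace the stand-in without changing the surrounding structure.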
The method for obtaining user attribute information provided by the above embodiment preprocesses the acquired map log information, positioning log information, and search engine log information to obtain the user's related data, then obtains the user's behavior characteristics based on the user's related data, and finally determines the user attribute information based on the user's behavior characteristics. It makes full use of information such as the user's positioning and map searches to analyze the user attribute information, and improves the comprehensiveness and accuracy of the user attribute information obtained.
The method for obtaining user attribute information provided by the above embodiment of the present application may be used to construct a user portrait. Further, content may be recommended to the user based on the user portrait; for example, a map server may analyze the user's dietary preferences according to the user portrait and, when the user searches for food, recommend personalized options that better match the user's taste. The user portrait may also provide a basis for site selection; for example, a site selection server may determine characteristics such as the distribution and preferences of the target group based on user portraits in order to optimize the site selection result.
With further reference to FIG. 3, a schematic structural diagram is shown of one embodiment of the apparatus for obtaining user attribute information provided by the embodiments of the present application. As shown in FIG. 3, the apparatus 300 for obtaining user attribute information may include a first acquisition unit 301, a preprocessing unit 302, a second acquisition unit 303, and a determination unit 304. The first acquisition unit 301 may be configured to acquire map log information, positioning log information, and search engine log information. The preprocessing unit 302 may be configured to preprocess the map log information, positioning log information, and search engine log information to obtain the user's related data. The second acquisition unit 303 may be configured to obtain the user's behavior characteristics based on the user's related data. The determination unit 304 may be configured to determine the user attribute information based on the user's behavior characteristics.
In this embodiment, the first acquisition unit 301 may acquire the map log information, positioning log information, and search engine log information of the same user from the terminal, or from the map server, the positioning server, and the server of the search engine. The preprocessing unit 302 may analyze the data contained in the log information acquired by the first acquisition unit 301 and extract the data related to user behavior or geographic location as the user's related data. The preprocessing unit 302 may also look up, through the network, information related to the data contained in the map log information, positioning log information, and search engine log information as the user's related data.
In some implementations, the user's related data may include at least location retrieval data and/or positioning data. The location retrieval data may include at least one of the following: target location search data, route search data and the corresponding route information, and data about the surroundings of the target location.
Further, the target location search data may include at least one of the following: the searched destination, the time of the search, and the user's current geographic location. The route search data may include at least one of the following: the time at which the user searched for the route, the starting geographic location, the target geographic location, trajectory data, and the corresponding travel mode. The data about the surroundings of the target location may include at least one of the following: data on buildings, transit stations, and parking lots around the target location.
Further, the second acquisition unit 303 may be configured to obtain the user's behavior characteristics in at least one of the following ways: compiling statistics on and analyzing the distribution of the geographic locations where the user stays, based on the positioning data derived by the preprocessing unit 302, to determine the locations of the user's regular activities; obtaining the user's point-of-interest information based on the location retrieval data derived by the preprocessing unit 302; compiling statistics on and analyzing the user's travel modes, based on the location retrieval data derived by the preprocessing unit 302, to determine the user's preferred travel mode; and calculating the degree of correlation between users, based on the positioning data derived by the preprocessing unit 302, to determine how close multiple users are to one another.
The determining unit 304 may determine the user attribute information using a trained model, based on the user's behavior characteristics acquired by the second acquiring unit 303. The user attribute information may include at least one of the following: the user's age group, gender, occupation, interests, income level, consumption habits, health status, social relationships, and fixed-asset status.
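The application does not say which kind of trained model is used. As a stand-in only, the following sketch trains a nearest-centroid classifier on labeled feature vectors and uses it to assign an attribute label to a new user. The class name, the two example features, and the age-group labels are all hypothetical.

```python
import math
from typing import Dict, Hashable, List, Sequence


class NearestCentroidModel:
    """Toy classifier: predict the label whose mean feature vector is closest."""

    def __init__(self) -> None:
        self.centroids: Dict[Hashable, List[float]] = {}

    def fit(self, features: Sequence[Sequence[float]],
            labels: Sequence[Hashable]) -> "NearestCentroidModel":
        sums: Dict[Hashable, List[float]] = {}
        counts: Dict[Hashable, int] = {}
        for x, y in zip(features, labels):
            acc = sums.setdefault(y, [0.0] * len(x))
            for i, value in enumerate(x):
                acc[i] += value
            counts[y] = counts.get(y, 0) + 1
        # The centroid of a label is the mean of its training vectors.
        self.centroids = {
            y: [value / counts[y] for value in acc] for y, acc in sums.items()
        }
        return self

    def predict(self, x: Sequence[float]) -> Hashable:
        # math.dist computes the Euclidean distance (Python 3.8+).
        return min(self.centroids, key=lambda y: math.dist(x, self.centroids[y]))


# Hypothetical features: [share of night-time fixes, share of driving routes].
train_x = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
train_y = ["18-24", "18-24", "35-44", "35-44"]
model = NearestCentroidModel().fit(train_x, train_y)
predicted_age_group = model.predict([0.85, 0.1])
```

Any supervised model with a fit/predict interface could take the place of this classifier; the point is only that behavior characteristics form the input vector and a user attribute is the predicted label.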
It should be understood that the units described in the apparatus 300 for acquiring user attribute information correspond to the respective steps of the method described with reference to FIGS. 1-2. Accordingly, the operations and features described above for the method also apply to the apparatus 300 and the units it contains, and are not repeated here.
With further reference to FIG. 4, a schematic diagram of an exemplary system architecture to which embodiments of the present application may be applied is shown. As shown in FIG. 4, the system 400 may include terminals 401 and 402, a map server 403, a positioning server 404, a search engine server 405, and a server 407 that implements the acquisition of user attribute information provided by the present application. The server 407 may include the apparatus 300 for acquiring user attribute information of the above embodiment. The system architecture may also include a network 406. The network 406 serves as the medium that provides communication links among the terminals 401 and 402 and the servers 403, 404, 405, and 407. The network 406 may include various connection types, such as wired links, wireless communication links, or fiber-optic cables.
The terminals 401 and 402 may interact with the servers 403, 404, 405, and 407 through the network 406 to receive or send messages and the like. The terminals 401 and 402 have a positioning function and may have a map application and a browser installed. Through the network 406, they may send map log information to the map server 403, positioning log information to the positioning server 404, and search engine log information to the search engine server 405. The server 407 may acquire log information from the terminals 401 and 402 and from the servers 403, 404, and 405 through the network 406, preprocess the log information, extract user behavior characteristics, and determine user attribute information. In practical applications, the server 407 may also send the determined user attribute information to the map server 403 and the search engine server 405 through the network 406. The map server 403 may recommend relevant information to the user, based on the user attribute information, when the user searches for a target place on the map; the search engine server 405 may reorder web pages based on the user attribute information so that the user finds information that meets their needs more quickly.
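How the search engine server 405 reorders web pages is not detailed in the application. One simple possibility, given purely as an assumption, is to boost each result's base relevance score by the number of topic tags it shares with the user's inferred interests. The function name, the result fields, and the boost weight are hypothetical.

```python
from typing import Dict, List, Set


def rerank(results: List[Dict], interests: Set[str],
           boost: float = 0.2) -> List[Dict]:
    """Sort results by base score plus a bonus per tag matching an interest."""
    def adjusted(result: Dict) -> float:
        overlap = len(set(result["tags"]) & interests)
        return result["score"] + boost * overlap

    # sorted() is stable, so ties keep their original relative order.
    return sorted(results, key=adjusted, reverse=True)


results = [
    {"id": "a", "score": 0.9, "tags": {"finance"}},
    {"id": "b", "score": 0.8, "tags": {"hiking"}},
    {"id": "c", "score": 0.5, "tags": {"cooking"}},
]
reordered = rerank(results, interests={"hiking"})
```

With an empty interest set the function returns the results in their original score order, so the attribute-based boost degrades gracefully when no user attribute information is available.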
The terminals 401 and 402 may be various electronic devices, including but not limited to personal computers, smartphones, smart watches, tablet computers, and personal digital assistants. The server 407 may store, analyze, and otherwise process the received data, and feed the processing results back to the terminals and to the servers 403, 404, and 405.
It should be understood that the numbers of terminals, networks, and servers in FIG. 4 are merely illustrative. There may be any number of terminals, networks, and servers, as the implementation requires.
With further reference to FIG. 5, a schematic structural diagram of a computer system suitable for implementing the server of the embodiments of the present application is shown. As shown in FIG. 5, the computer system 500 includes a central processing unit (CPU) 501, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage portion 508 into a random access memory (RAM) 503. The RAM 503 also stores various programs and data required for the operation of the system 500. The CPU 501, the ROM 502, and the RAM 503 are connected to one another through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
The following components are connected to the I/O interface 505: an input portion 506; an output portion 507; a storage portion 508 including a hard disk or the like; and a communication portion 509 including a network interface card such as a LAN card or a modem. The communication portion 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the I/O interface 505 as needed. A removable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 510 as needed, so that a computer program read from it can be installed into the storage portion 508 as needed.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the figures. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor; for example, this may be described as: a processor includes a first acquiring unit, a preprocessing unit, a second acquiring unit, and a determining unit. The names of these units do not, in some cases, limit the units themselves; for example, the preprocessing unit may also be described as "a unit for preprocessing".
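When the four units are implemented in software, they can be pictured as four methods of one object that are called in sequence. The following is a minimal sketch under stated assumptions: log records are plain dictionaries, preprocessing keeps only records that carry a location, the behavior features are the share of records per log source, and the model is any callable. None of these choices comes from the application, and the class name and attribute labels are hypothetical.

```python
from typing import Callable, Dict, Iterable, List

LogRecord = Dict[str, object]
LogSource = Callable[[], Iterable[LogRecord]]


class AttributeProcessor:
    """Software stand-in for the four units described above."""

    def __init__(self, log_sources: Iterable[LogSource],
                 model: Callable[[Dict[str, float]], str]) -> None:
        self.log_sources = list(log_sources)
        self.model = model

    def acquire_logs(self) -> List[LogRecord]:
        """First acquiring unit: collect records from every log source."""
        logs: List[LogRecord] = []
        for source in self.log_sources:
            logs.extend(source())
        return logs

    def preprocess(self, logs: List[LogRecord]) -> List[LogRecord]:
        """Preprocessing unit: keep only records related to a location."""
        return [r for r in logs if "lat" in r and "lon" in r]

    def extract_features(self, records: List[LogRecord]) -> Dict[str, float]:
        """Second acquiring unit: share of location records per log source."""
        counts: Dict[str, float] = {}
        for record in records:
            kind = str(record.get("source", "unknown"))
            counts[kind] = counts.get(kind, 0.0) + 1.0
        return {kind: n / len(records) for kind, n in counts.items()}

    def determine(self, features: Dict[str, float]) -> str:
        """Determining unit: apply the (already trained) model."""
        return self.model(features)

    def run(self) -> str:
        logs = self.acquire_logs()
        return self.determine(self.extract_features(self.preprocess(logs)))
```

Because each unit is an ordinary method, renaming one (for example, calling `preprocess` "a unit for preprocessing") changes nothing about what it does, which is the point the paragraph above makes about unit names.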
In another aspect, the present application further provides a computer-readable storage medium. The computer-readable storage medium may be the one included in the apparatus described in the above embodiments, or it may be a computer-readable storage medium that exists separately and is not assembled into a terminal. The computer-readable storage medium stores one or more programs, which are used by one or more processors to perform the method for acquiring user attribute information described in the present application.
The above description covers only the preferred embodiments of the present application and explains the technical principles applied. Those skilled in the art should understand that the scope of the invention referred to in the present application is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with technical features of similar function disclosed in (but not limited to) the present application.

Claims (19)

  1. A method for acquiring user attribute information, characterized in that the method comprises:
    acquiring map log information, positioning log information, and search engine log information;
    preprocessing the map log information, the positioning log information, and the search engine log information to obtain related data of a user;
    acquiring behavior characteristics of the user based on the related data of the user; and
    determining user attribute information based on the behavior characteristics of the user.
  2. The method according to claim 1, characterized in that preprocessing the map log information, the positioning log information, and the search engine log information comprises:
    analyzing the data contained in the map log information, the positioning log information, and the search engine log information; and
    extracting, from the map log information, the positioning log information, and the search engine log information, data related to geographic location and user behavior as the related data of the user.
  3. The method according to claim 2, characterized in that preprocessing the map log information, the positioning log information, and the search engine log information further comprises:
    searching over a network for information related to the data contained in the map log information, the positioning log information, and the search engine log information, as the related data of the user.
  4. The method according to any one of claims 1-3, characterized in that the related data of the user comprises at least location retrieval data and/or positioning data;
    wherein the location retrieval data comprises at least one of the following: target location search data; route search data and corresponding line information; and surrounding data of the target location.
  5. The method according to claim 4, characterized in that:
    the target location search data comprises at least one of the following: the searched destination, the time of the search, and the user's current geographic location;
    the route search data comprises at least one of the following: the time at which the user searched for the route, the starting geographic location, the target geographic location, trajectory data, and the corresponding travel mode; and
    the surrounding data of the target location comprises at least one of the following: building data, transit station data, and parking lot data around the target location.
  6. The method according to claim 4 or 5, characterized in that acquiring the behavior characteristics of the user based on the related data of the user comprises at least one of the following:
    performing statistics and analysis on the distribution of the geographic locations where the user stays, based on the positioning data, to determine the locations of the user's fixed activities;
    acquiring point-of-interest information of the user based on the location retrieval data;
    performing statistics and analysis on the user's travel modes, based on the location retrieval data, to determine the user's preferred travel mode; and
    calculating the correlation between users, based on the positioning data, to determine the degree of closeness among multiple users.
  7. The method according to any one of claims 1-6, characterized in that determining the user attribute information based on the behavior characteristics of the user comprises:
    determining the user attribute information using a trained model, based on the behavior characteristics of the user.
  8. The method according to any one of claims 1-7, characterized in that the user attribute information comprises at least one of the following: the user's age group, gender, occupation, interests, income level, consumption habits, health status, social relationships, and fixed-asset status.
  9. An apparatus for acquiring user attribute information, characterized in that the apparatus comprises:
    a first acquiring unit, configured to acquire map log information, positioning log information, and search engine log information;
    a preprocessing unit, configured to preprocess the map log information, the positioning log information, and the search engine log information to obtain related data of a user;
    a second acquiring unit, configured to acquire behavior characteristics of the user based on the related data of the user; and
    a determining unit, configured to determine user attribute information based on the behavior characteristics of the user.
  10. The apparatus according to claim 9, characterized in that the preprocessing unit is configured to preprocess the map log information, the positioning log information, and the search engine log information as follows:
    analyzing the data contained in the map log information, the positioning log information, and the search engine log information; and
    extracting, from the map log information, the positioning log information, and the search engine log information, data related to geographic location and user behavior as the related data of the user.
  11. The apparatus according to claim 10, characterized in that the preprocessing unit is further configured to preprocess the map log information, the positioning log information, and the search engine log information as follows:
    searching over a network for information related to the data contained in the map log information, the positioning log information, and the search engine log information, as the related data of the user.
  12. The apparatus according to any one of claims 9-11, characterized in that the related data of the user comprises at least location retrieval data and/or positioning data;
    wherein the location retrieval data comprises at least one of the following: target location search data; route search data and corresponding line information; and surrounding data of the target location.
  13. The apparatus according to claim 12, characterized in that:
    the target location search data comprises at least one of the following: the searched destination, the time of the search, and the user's current geographic location;
    the route search data comprises at least one of the following: the time at which the user searched for the route, the starting geographic location, the target geographic location, trajectory data, and the corresponding travel mode; and
    the surrounding data of the target location comprises at least one of the following: building data, transit station data, and parking lot data around the target location.
  14. The apparatus according to claim 12 or 13, characterized in that the second acquiring unit is configured to acquire the behavior characteristics of the user in at least one of the following ways:
    performing statistics and analysis on the distribution of the geographic locations where the user stays, based on the positioning data, to determine the locations of the user's fixed activities;
    acquiring point-of-interest information of the user based on the location retrieval data;
    performing statistics and analysis on the user's travel modes, based on the location retrieval data, to determine the user's preferred travel mode; and
    calculating the correlation between users, based on the positioning data, to determine the degree of closeness among multiple users.
  15. The apparatus according to any one of claims 9-14, characterized in that the determining unit determines the user attribute information using a trained model, based on the behavior characteristics of the user.
  16. The apparatus according to any one of claims 9-15, characterized in that the user attribute information comprises at least one of the following: the user's age group, gender, occupation, interests, income level, consumption habits, health status, social relationships, and fixed-asset status.
  17. A server, characterized by comprising the apparatus according to any one of claims 9-16.
  18. A device, characterized by comprising:
    one or more processors;
    a memory; and
    one or more programs stored in the memory which, when executed by the one or more processors:
    acquire map log information, positioning log information, and search engine log information;
    preprocess the map log information, the positioning log information, and the search engine log information to obtain related data of a user;
    acquire behavior characteristics of the user based on the related data of the user; and
    determine user attribute information based on the behavior characteristics of the user.
  19. A non-volatile computer storage medium storing one or more programs which, when executed by a device, cause the device to:
    acquire map log information, positioning log information, and search engine log information;
    preprocess the map log information, the positioning log information, and the search engine log information to obtain related data of a user;
    acquire behavior characteristics of the user based on the related data of the user; and
    determine user attribute information based on the behavior characteristics of the user.
PCT/CN2015/089823 2015-06-26 2015-09-17 Method and device for obtaining user attribute information, and server WO2016206196A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510363062.3A CN104933157A (en) 2015-06-26 2015-06-26 Method and device used for obtaining user attribute information, and server
CN201510363062.3 2015-06-26

Publications (1)

Publication Number Publication Date
WO2016206196A1 (en) 2016-12-29

Family

ID=54120324

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/089823 WO2016206196A1 (en) 2015-06-26 2015-09-17 Method and device for obtaining user attribute information, and server

Country Status (2)

Country Link
CN (1) CN104933157A (en)
WO (1) WO2016206196A1 (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110019916A (en) * 2018-08-17 2019-07-16 平安普惠企业管理有限公司 Event-handling method, device, equipment and storage medium based on user's portrait
CN110300084A (en) * 2018-03-22 2019-10-01 北京京东尚科信息技术有限公司 A kind of IP address-based portrait method and apparatus
CN110781374A (en) * 2018-07-13 2020-02-11 北京字节跳动网络技术有限公司 User data matching method and device, electronic equipment and computer readable medium
CN111178925A (en) * 2018-11-09 2020-05-19 百度在线网络技术(北京)有限公司 User portrait attribute prediction method, device, server and computer readable medium
CN111400567A (en) * 2020-03-11 2020-07-10 北京古杉数据科技有限公司 AI-based user data processing method, device and system
CN111782611A (en) * 2020-06-30 2020-10-16 北京百度网讯科技有限公司 Prediction model modeling method, device, equipment and storage medium
CN112184388A (en) * 2020-10-12 2021-01-05 上海燕汐软件信息科技有限公司 Home decoration service ticket distribution method and device and storage medium
CN112330351A (en) * 2020-02-28 2021-02-05 北京京东振世信息技术有限公司 Method for selecting address, address selecting system and electronic equipment
CN113836195A (en) * 2021-09-02 2021-12-24 国家电网有限公司客户服务中心 Knowledge recommendation method and system based on user portrait

Families Citing this family (39)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105160016A (en) * 2015-09-25 2015-12-16 百度在线网络技术(北京)有限公司 Method and device for acquiring user attributes
CN106557942B (en) * 2015-09-30 2020-07-10 百度在线网络技术(北京)有限公司 User relationship identification method and device
CN106611017B (en) * 2015-10-27 2021-06-29 北京嘀嘀无限科技发展有限公司 User identity identification method and device
CN105488149A (en) * 2015-11-26 2016-04-13 上海晶赞科技发展有限公司 Data processing method and device
CN106855864A (en) * 2015-12-09 2017-06-16 北京秒针信息咨询有限公司 A kind of method and apparatus of extraction information
CN105630951B (en) * 2015-12-23 2019-06-07 北京奇虎科技有限公司 Judge user's vocational distribution method and apparatus of cluster
CN106919579B (en) * 2015-12-24 2020-11-06 腾讯科技(深圳)有限公司 Information processing method, device and equipment
CN105678457A (en) * 2016-01-06 2016-06-15 成都小步创想畅联科技有限公司 Method for evaluating user behavior on the basis of position mining
CN106096653B (en) * 2016-06-12 2019-10-22 中国科学院自动化研究所 Ascribed characteristics of population estimating method based on cross-platform user social contact multimedia behavior
CN107643974B (en) * 2016-07-20 2021-03-02 阿里巴巴集团控股有限公司 Method and device for sending recall information
CN106649524B (en) * 2016-10-20 2019-11-22 天聚地合(苏州)数据股份有限公司 A kind of deep learning intelligent response system of the modified based on computer cloud data
CN106503123A (en) * 2016-10-20 2017-03-15 宁波江东大金佰汇信息技术有限公司 A kind of deep learning intelligent response system based on computer cloud data
CN106682427A (en) * 2016-12-29 2017-05-17 平安科技(深圳)有限公司 Personal health condition assessment method and device based position services
CN108734327A (en) * 2017-04-20 2018-11-02 腾讯科技(深圳)有限公司 A kind of data processing method, device and server
CN107316098B (en) * 2017-05-19 2021-03-30 安徽智博新材料科技有限公司 Automobile leasing point addressing method based on user behavior analysis
CN107274217A (en) * 2017-05-27 2017-10-20 冯小平 Determine user's current behavior and the method and apparatus for predicting user view
CN107316204A (en) * 2017-05-27 2017-11-03 银联智惠信息服务(上海)有限公司 Recognize humanized method, device, computer-readable medium and the system of holding
CN108984555B (en) * 2017-06-01 2021-09-28 腾讯科技(深圳)有限公司 User state mining and information recommendation method, device and equipment
CN107247593B (en) * 2017-06-09 2021-02-12 泰康保险集团股份有限公司 User interface switching method and device, electronic equipment and storage medium
CN107391603B (en) * 2017-06-30 2020-12-18 北京奇虎科技有限公司 User portrait establishing method and device for mobile terminal
CN107748996A (en) * 2017-09-18 2018-03-02 福建凯斯诺物联科技股份有限公司 User's occupation method of discrimination based on Quick Response Code
CN107747947B (en) * 2017-10-23 2021-04-30 电子科技大学 Collaborative travel route recommendation method based on historical GPS (global positioning system) track of user
CN110020211B (en) * 2017-10-23 2021-08-17 北京京东尚科信息技术有限公司 Method and device for evaluating influence of user attributes
CN110737690A (en) * 2018-07-03 2020-01-31 百度在线网络技术(北京)有限公司 User label mining method and device, computer equipment and storage medium
CN109063059B (en) * 2018-07-20 2021-07-27 腾讯科技(深圳)有限公司 Behavior log processing method and device and electronic equipment
CN109086384A (en) * 2018-07-26 2018-12-25 珠海卓邦科技有限公司 Water affairs management method and system based on user's portrait
CN110895587B (en) * 2018-08-23 2022-08-26 百度在线网络技术(北京)有限公司 Method and device for determining target user
CN109255383A (en) * 2018-09-10 2019-01-22 北京唐冠天朗科技开发有限公司 A kind of matching process and system of identity information and displaying information
CN109344253A (en) * 2018-09-18 2019-02-15 平安科技(深圳)有限公司 Add method, apparatus, computer equipment and the storage medium of user tag
CN109410568B (en) * 2018-09-18 2022-04-22 广东中标数据科技股份有限公司 Get-off site presumption method and system based on user portrait and transfer rule
CN111104609B (en) * 2018-10-26 2023-10-10 百度在线网络技术(北京)有限公司 Inter-person relationship prediction method, inter-person relationship prediction device, and storage medium
CN111127064B (en) * 2018-11-01 2023-08-25 百度在线网络技术(北京)有限公司 Method and device for determining social attribute of user and electronic equipment
CN111127065B (en) * 2018-11-01 2023-07-25 百度在线网络技术(北京)有限公司 User job site acquisition method and device
CN109472640A (en) * 2018-11-09 2019-03-15 斑马网络技术有限公司 Client's recognition methods, device, equipment and storage medium
CN109587326B (en) * 2018-11-09 2021-08-10 深圳壹账通智能科技有限公司 Prompting method and device of mobile terminal, storage medium and computer equipment
CN111310882A (en) * 2018-12-11 2020-06-19 北京京东尚科信息技术有限公司 Method and apparatus for generating information
CN110347917A (en) * 2019-06-14 2019-10-18 北京纵横无双科技有限公司 A kind of medical information method for pushing and device
CN110347936A (en) * 2019-06-21 2019-10-18 上海淇馥信息技术有限公司 Data digging method, device, system and recording medium based on LBS information
CN110544378B (en) * 2019-09-02 2020-11-03 上海评驾科技有限公司 Method for judging traffic jam condition of mobile phone user

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100131335A1 (en) * 2008-11-25 2010-05-27 Roh Dong-Hyun User interest mining method based on user behavior sensed in mobile device
CN104239526A (en) * 2014-09-18 2014-12-24 百度在线网络技术(北京)有限公司 POI (Point of Interest) labeling method and device for electronic map
CN104391853A (en) * 2014-09-25 2015-03-04 深圳大学 POI (point of interest) recommending method, POI information processing method and server

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7272489B2 (en) * 2002-07-18 2007-09-18 Alpine Electronics, Inc. Navigation method and system for extracting, sorting and displaying POI information
CN104537027B (en) * 2014-12-19 2019-05-10 百度在线网络技术(北京)有限公司 Information recommendation method and device
CN104732062A (en) * 2015-01-30 2015-06-24 上海语镜汽车信息技术有限公司 On-road user socialization attribute automatic judgment method based on characteristic event, movement behavior, traveling track and geographic position

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110300084A (en) * 2018-03-22 2019-10-01 北京京东尚科信息技术有限公司 A kind of IP address-based portrait method and apparatus
CN110300084B (en) * 2018-03-22 2023-09-01 北京京东尚科信息技术有限公司 IP address-based portrait method and apparatus, electronic device, and readable medium
CN110781374A (en) * 2018-07-13 2020-02-11 北京字节跳动网络技术有限公司 User data matching method and device, electronic equipment and computer readable medium
CN110019916A (en) * 2018-08-17 2019-07-16 平安普惠企业管理有限公司 Event-handling method, device, equipment and storage medium based on user's portrait
CN111178925B (en) * 2018-11-09 2023-07-25 百度在线网络技术(北京)有限公司 Method, apparatus, server and computer readable medium for predicting attribute of user portrait
CN111178925A (en) * 2018-11-09 2020-05-19 百度在线网络技术(北京)有限公司 User portrait attribute prediction method, device, server and computer readable medium
CN112330351B (en) * 2020-02-28 2023-09-26 北京京东振世信息技术有限公司 Method for selecting address, address selecting system and electronic equipment
CN112330351A (en) * 2020-02-28 2021-02-05 北京京东振世信息技术有限公司 Method for selecting address, address selecting system and electronic equipment
CN111400567A (en) * 2020-03-11 2020-07-10 北京古杉数据科技有限公司 AI-based user data processing method, device and system
CN111400567B (en) * 2020-03-11 2023-06-27 北京古杉数据科技有限公司 AI-based user data processing method, device and system
CN111782611A (en) * 2020-06-30 2020-10-16 北京百度网讯科技有限公司 Prediction model modeling method, device, equipment and storage medium
CN111782611B (en) * 2020-06-30 2024-01-23 北京百度网讯科技有限公司 Prediction model modeling method, device, equipment and storage medium
CN112184388A (en) * 2020-10-12 2021-01-05 上海燕汐软件信息科技有限公司 Home decoration service ticket distribution method and device and storage medium
CN112184388B (en) * 2020-10-12 2024-02-13 上海燕汐软件信息科技有限公司 Home decoration service list distribution method, device and storage medium
CN113836195A (en) * 2021-09-02 2021-12-24 国家电网有限公司客户服务中心 Knowledge recommendation method and system based on user portrait
CN113836195B (en) * 2021-09-02 2023-09-01 国家电网有限公司客户服务中心 Knowledge recommendation method and system based on user portrait

Also Published As

Publication number Publication date
CN104933157A (en) 2015-09-23

Similar Documents

Publication Publication Date Title
WO2016206196A1 (en) Method and device for obtaining user attribute information, and server
US11762818B2 (en) Apparatus, systems, and methods for analyzing movements of target entities
CN106570722B (en) Intelligent recommendation system and intelligent recommendation method
US11490220B2 (en) System and method for accurately and efficiently generating ambient point-of-interest recommendations
JP6784308B2 (en) Programs that update facility characteristics, programs that profile facilities, computer systems, and how to update facility characteristics
Xiao et al. Inferring social ties between users with human location history
JP6759844B2 (en) Systems, methods, programs and equipment that associate images with facilities
US8612134B2 (en) Mining correlation between locations using location history
US20150112963A1 (en) Time and location based information search and discovery
US20100153292A1 (en) Making Friend and Location Recommendations Based on Location Similarities
JP5732441B2 (en) Information recommendation method, apparatus and program
CN104298719A (en) Method and system for conducting user category classification and advertisement putting based on social behavior
Zhu et al. SEM-PPA: A semantical pattern and preference-aware service mining method for personalized point of interest recommendation
JP6767952B2 (en) Estimator, estimation method and estimation program
US20160161274A1 (en) Determining top venues from aggregated user activity location data
Boukhechba et al. Online recognition of people's activities from raw GPS data: Semantic Trajectory Data Analysis
WO2015102805A1 (en) Point of interest tagging from social feeds
US9251168B1 (en) Determining information about a location based on travel related to the location
Tiwari et al. Mining popular places in a geo-spatial region based on GPS data using semantic information
CN110968766A (en) Tourist portrait and LBS data-based touring scheme recommendation algorithm
JP6664582B2 (en) Estimation device, estimation method and estimation program
RU2658876C1 (en) Wireless device sensor data processing method and server for the object vector creating connected with the physical position
Ashley-Dejo et al. A context-aware proactive recommender system for tourist
KR102551773B1 (en) Place recommendation method and system
Hanawa et al. Recommendation system for tourist attractions based on Wi-Fi packet sensor data

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 15896091; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 15896091; Country of ref document: EP; Kind code of ref document: A1)