WO2017107416A1 - Big-data-based cross-domain recommendation method and apparatus - Google Patents

Big-data-based cross-domain recommendation method and apparatus

Info

Publication number
WO2017107416A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
search
record
brand
target user
Prior art date
Application number
PCT/CN2016/086407
Other languages
English (en)
French (fr)
Inventor
刘志强
沈志勇
Original Assignee
百度在线网络技术(北京)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 百度在线网络技术(北京)有限公司
Priority to US15/564,323 (patent US10459996B2)
Publication of WO2017107416A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/313 Selection or weighting of terms for indexing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3346 Query execution using probabilistic model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0251 Targeted advertisements
    • G06Q30/0255 Targeted advertisements based on user history
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0282 Rating or review of business operators or products
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising

Definitions

  • The present invention relates to the field of Internet technologies, and in particular to a big-data-based cross-domain recommendation method and apparatus.
  • The technical problem to be solved by the present invention is to provide a big-data-based cross-domain recommendation method and apparatus that give users a more accurate recommendation service.
  • Topic modeling is performed separately on the online input records and the offline behavior records of the users in a specific user set.
  • The users in the specific user set are users who have both online input records and offline behavior records.
  • Offline-behavior content is recommended to the target user.
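  • The offline behavior records include offline consumption records.
  • The online input records include search records made with a search engine, and/or input records entered in a specific application through an input method or a voice receiving device; the specific application is at least one of the following types: chat software, search engines, social software, and online shopping client software.
  • The offline consumption records include offline consumption topics and offline consumption brands.
  • Performing topic modeling separately on the online input records and the offline behavior records of the users in the specific user set includes:
  • Topic modeling is performed on the offline consumption records of the users in the specific user set, yielding P(brand | consumption topic), the probability of consuming each brand within each consumption topic, and P(consumption topic | user), the probability that each user consumes under each consumption topic.
  • LDA topic modeling is performed on the online search records of the users in the specific user set, yielding P(search term | search topic), the probability of entering each search term within each search topic, and P(search topic | user), the probability that each user searches under each search topic.
  • Determining, from the topic-modeling results, the transition probability from each online-input topic to each offline-behavior topic includes:
  • On the basis of each user's P(search topic | user) and the consumption data of the users in the specific user set for each brand, or on the basis of P(brand | consumption topic), each user's P(search topic | user), and that consumption data, a two-layer probabilistic graphical model is applied to obtain P(consumption topic | search topic), the transition probability from each search topic to each consumption topic.
  • Recommending offline-behavior content to the target user includes:
  • A1: For any target user who has search-engine search records, P0(brand | user), the probability that the target user consumes each brand, is determined according to the formula given in the Description.
  • A2: Brands are recommended to the target user according to P0(brand | user).
  • B1: Alternatively, where the target user also has offline consumption records, P0(brand | user) is determined according to the formula given in the Description. In that formula, P0(brand | consumption topic) is the probability, determined from the target user's consumption records, that the target user consumes each brand within each consumption topic; P0(consumption topic | user) is the probability, determined from the target user's consumption records, that the target user consumes under each consumption topic; one summation runs over all consumption topics involved in the offline behavior records of the users in the specific user set; the other summation runs over all search topics in the target user's search records; and P0(search topic | user) is the probability, determined from the target user's search records, that the target user searches under each search topic.
  • B2: Brands are recommended to the target user according to P0(brand | user).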
  • The invention also provides a big-data-based cross-domain recommendation apparatus, comprising:
  • a modeling module configured to perform topic modeling separately on the online input records and the offline behavior records of the users in a specific user set, where the users in the specific user set all have both online input records and offline behavior records;
  • a calculation module configured to determine, from the topic-modeling results, the transition probability from each online-input topic to each offline-behavior topic;
  • a recommendation module configured, for any target user, to recommend offline-behavior content to the target user based on the transition probabilities and the target user's online input records.
  • The offline behavior records include offline consumption records.
  • The online input records include search records made with a search engine, and/or input records entered in a specific application through an input method or a voice receiving device; the specific application is at least one of the following types: chat software, search engines, social software, and online shopping client software.
  • The offline consumption records include offline consumption topics and offline consumption brands.
  • The modeling module is configured to:
  • perform topic modeling on the offline consumption records of the users in the specific user set, yielding P(brand | consumption topic), the probability of consuming each brand within each consumption topic, and P(consumption topic | user), the probability that each user consumes under each consumption topic;
  • perform LDA topic modeling on the online search records of the users in the specific user set, yielding P(search term | search topic), the probability of entering each search term within each search topic, and P(search topic | user), the probability that each user searches under each search topic.
  • The calculation module is specifically configured to apply a two-layer probabilistic graphical model to obtain P(consumption topic | search topic), the transition probability from each search topic to each consumption topic.
  • As one option, the recommendation module is specifically configured to:
  • determine, for any target user who has search-engine search records, P0(brand | user), the probability that the target user consumes each brand, according to the formula given in the Description;
  • recommend brands to the target user according to P0(brand | user).
  • As another option, where the target user also has offline consumption records, the recommendation module is specifically configured to:
  • B1: determine P0(brand | user) according to the formula given in the Description. In that formula, P0(brand | consumption topic) and P0(consumption topic | user) are determined from the target user's consumption records, the latter being the probability that the target user consumes under each consumption topic; one summation runs over all consumption topics involved in the offline behavior records of the users in the specific user set; the other summation runs over all search topics in the target user's search records; and P0(search topic | user) is the probability, determined from the target user's search records, that the target user searches under each search topic.
  • B2: recommend brands to the target user according to P0(brand | user).
  • With the above technical solution, the present invention has at least the following advantages:
  • The big-data-based cross-domain recommendation method and apparatus of the present invention connect and cross-analyze users' data in different domains, such as online input and offline behavior, to obtain the correspondence between user behavior features across domains, and recommend content to users according to the established correspondence.
  • Applying the technical solution of the present invention to accurate recommendation of consumer brands and accurate targeting of a brand's potential customers in the Internet + retail field enables cross-domain user traffic diversion, precision marketing, and accurate targeting of potential customers.
  • The effect is pronounced: in both offline simulation tests and online tests with real consumption, the accuracy of brand recommendation and user targeting improved greatly, and offline retail GMV (Gross Merchandise Volume) also improved considerably.
  • FIG. 1 is a flowchart of the big-data-based cross-domain recommendation method according to the first embodiment of the present invention.
  • FIG. 2 is a flowchart of the big-data-based cross-domain recommendation method according to the second embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of the big-data-based cross-domain recommendation apparatus according to the third embodiment of the present invention.
  • FIG. 4 is a schematic diagram of linking shopping-mall consumption data with Baidu search data for the specific user set according to the fifth embodiment of the present invention.
  • FIG. 5 is a schematic diagram of modeling the shopping-mall consumption data of the users in mall member set A' according to the fifth embodiment of the present invention.
  • FIG. 6 is a schematic diagram of modeling the Baidu search data of the users in mall member set A' according to the fifth embodiment of the present invention.
  • FIG. 7 is a schematic diagram of determining the transition probabilities between search topics and consumption topics by applying a two-layer probabilistic graphical model according to the fifth embodiment of the present invention.
  • FIG. 8 is a schematic diagram of the correspondence matrix between search topics and consumption topics according to the fifth embodiment of the present invention.
  • FIG. 9 is a first search schematic diagram of the fifth embodiment of the present invention.
  • FIG. 10 is a second search schematic diagram of the fifth embodiment of the present invention.
  • FIG. 11 is a block diagram of a computer system suitable for implementing the big-data-based cross-domain recommendation method of an embodiment of the present invention.
  • A first embodiment of the present invention includes the following specific steps:
  • Step S101: a specific user set is studied. The users in the specific user set all have both online input records and offline behavior records, and topic modeling is performed separately on the online input records and the offline behavior records of the users in the set.
  • The offline behavior records include offline consumption records.
  • The online input records include search records made with a search engine, and/or input records entered in a specific application through an input method or a voice receiving device; the specific application is at least one of the following types: chat software, search engines, social software, and online shopping client software.
  • The offline consumption records include offline consumption topics and offline consumption brands.
  • In step S101, where the online input records are search-engine search records, performing topic modeling separately on the online input records and the offline behavior records of the users in the specific user set includes:
  • Topic modeling is performed on the offline consumption records of the users in the specific user set, yielding P(brand | consumption topic), the probability of consuming each brand within each consumption topic, and P(consumption topic | user), the probability that each user consumes under each consumption topic.
  • LDA topic modeling is performed on the online search records of the users in the specific user set, yielding P(search term | search topic), the probability of entering each search term within each search topic, and P(search topic | user), the probability that each user searches under each search topic.
  • Step S102: the transition probability from each online-input topic to each offline-behavior topic is determined from the topic-modeling results.
  • step S102 includes:
  • Step S103: for any target user who has online input records, offline-behavior content is recommended to the target user based on the target user's online input records and the transition probabilities.
  • step S103 includes:
  • A1: P0(brand | user), the probability that the target user consumes each brand, is determined according to the formula given in the Description.
  • A2: Brands are recommended to the target user according to P0(brand | user).
  • The second embodiment of the present invention is a big-data-based cross-domain recommendation method.
  • The method in this embodiment is substantially the same as that of the first embodiment: steps S201-S202 are the same as steps S101-S102 of the first embodiment. The difference lies in step S203, which, as shown in FIG. 2, includes the following specific contents:
  • Where the target user also has offline consumption records, recommending offline-behavior content to the target user based on the target user's online input records and the transition probabilities includes:
  • B1: For any such target user, P0(brand | user), the probability that the target user consumes each brand, is determined according to the formula given in the Description. The formula is expressed in terms of P0(brand | consumption topic), P0(consumption topic | user), and P0(search topic | user).
  • B2: Brands are recommended to the target user according to P0(brand | user).
  • The third embodiment of the present invention corresponds to the first embodiment.
  • This embodiment introduces a big-data-based cross-domain recommendation apparatus which, as shown in FIG. 3, includes the following components:
  • a modeling module 301 configured to perform topic modeling separately on the online input records and the offline behavior records of the users in the specific user set, where the users in the specific user set all have both online input records and offline behavior records;
  • The offline behavior records include offline consumption records.
  • The online input records include search records made with a search engine, and/or input records entered in a specific application through an input method or a voice receiving device; the specific application is at least one of the following types: chat software, search engines, social software, and online shopping client software.
  • The offline consumption records include offline consumption topics and offline consumption brands.
  • The modeling module is configured to:
  • perform topic modeling on the offline consumption records of the users in the specific user set, yielding P(brand | consumption topic), the probability of consuming each brand within each consumption topic, and P(consumption topic | user), the probability that each user consumes under each consumption topic;
  • perform LDA topic modeling on the online search records of the users in the specific user set, yielding P(search term | search topic), the probability of entering each search term within each search topic, and P(search topic | user), the probability that each user searches under each search topic.
  • a calculation module 302 configured to determine, from the topic-modeling results, the transition probability from each online-input topic to each offline-behavior topic;
  • the calculation module 302 is configured to:
  • a recommendation module 303 configured, for any target user who has online input records, to recommend offline-behavior content to the target user based on the target user's online input records and the transition probabilities.
  • The recommendation module 303 is configured to:
  • determine P0(brand | user), the probability that the target user consumes each brand, according to the formula given in the Description;
  • recommend brands to the target user according to P0(brand | user).
  • The fourth embodiment of the present invention is a big-data-based cross-domain recommendation apparatus.
  • The apparatus in this embodiment is substantially the same as that of the third embodiment.
  • The difference is that the recommendation module 303 is specifically configured as follows.
  • Where the target user also has offline consumption records, recommending offline-behavior content to the target user based on the target user's online input records and the transition probabilities includes:
  • B1: For any such target user, P0(brand | user), the probability that the target user consumes each brand, is determined according to the formula given in the Description. The formula is expressed in terms of P0(brand | consumption topic), P0(consumption topic | user), and P0(search topic | user).
  • B2: Brands are recommended to the target user according to P0(brand | user).
  • This embodiment builds on the foregoing embodiments. Taking brand recommendation based on the conversion from Baidu search content to shopping-mall consumption as an example, an application example of the present invention is described with reference to FIGS. 4-10.
  • The main idea of this embodiment is to link and cross-model users' data in domain 1 (shopping-mall consumption) and domain 2 (Baidu search-engine queries). The modeling process is as follows:
  • Step 1: topic modeling is performed on users' shopping-mall consumption data, yielding the clustering features of brands (consumption topics) and each user's consumption weights over the clusters.
  • Step 2: topic modeling is performed on users' Baidu search data, yielding the clustering features of keywords (search topics) and each user's distribution weights over the clusters.
  • Step 3: it is assumed that a probabilistic transition relationship exists between each search topic and each consumption topic, and that a user's search topics are converted into consumption topics through this relationship, leading to brand consumption under the different consumption topics. Under this assumption, the correspondence between search topics and consumption topics is inferred in reverse from the results of steps 1 and 2.
  • Step 4: once the correspondence is obtained, a more accurate recommendation service can be provided to existing users and to new users respectively, and the potential target customers of a given brand can be located more accurately.
  • First, the users to be studied, that is, the specific user set, are determined, and the shopping-mall consumption data and the Baidu search data of the users in that set are linked.
  • A on the left represents the consumption data of mall members.
  • A' on the right represents the Baidu search data corresponding to those mall members.
  • B on the right represents the Baidu search data of the mall's target users.
  • The mall member set A', whose users have both offline consumption data and online search data, is selected as the set of users to be studied.
  • LDA topic modeling is performed on the online search data and the offline consumption data of the users in mall member set A'. The modeling process is as follows:
  • LDA modeling is performed on the shopping-mall consumption data of the users in mall member set A'. From each user's historically consumed brands, the brand clustering feature P(brand | consumption topic) and each user's consumption topic distribution P(consumption topic | user) are obtained.
  • The brand clustering feature is the probability of consuming each brand within each consumption topic.
  • Each user's consumption topic distribution is the probability that the user consumes under each consumption topic.
  • LDA modeling is performed on the Baidu search data of the users in mall member set A', yielding the search-term clustering feature P(search term | search topic) and each user's search topic distribution P(search topic | user).
  • The search-term clustering feature is the probability of entering each search term within each search topic, and each user's search topic distribution is the probability that the user searches under each search topic.
  • The transition probability P(consumption topic | search topic) between search topics and consumption topics is obtained by applying the two-layer probabilistic graphical model.
  • P0(brand | consumption topic) and P0(consumption topic | user) are determined from the online user's consumption records; P0(consumption topic | user) is the probability that the online user consumes under each consumption topic.
  • The target users may be users without real consumption history data, but a target user should at least have Baidu search data from which P0(search topic | user) can be determined.
  • The scheme makes full use of Baidu's big data to improve performance for businesses in other domains, so that big data delivers real value, realizing an intelligent Internet + and connecting online and offline.
  • In the correspondence matrix, each row represents the probability distribution over the 50 consumption topics for one online search topic; there are 50 rows in total, one for each of the 50 search topics.
  • M(26, 41) represents the correspondence between the search topic numbered 26 and the consumption topic numbered 41. As shown in FIG. 8, this correspondence is strong. Specifically:
  • M(26, 41) is 0.3, indicating that for a user who searches the terms on the left of the first search diagram in FIG. 9, consumption falls on the brands on the right with a probability of 30%.
  • As FIG. 9 shows, the search topic on the left is maternity, infant, and child, and the consumption topic on the right is also maternity, infant, and child, so the two correspond well.
  • M(46, 10) represents the correspondence between the search topic numbered 46 and the consumption topic numbered 10.
  • The figure shows that this correspondence is very strong. Specifically:
  • M(46, 10) is 0.2, indicating that for a user who searches the keywords on the left of the second search diagram in FIG. 10, consumption falls on the brands on the right with a probability of 20%.
  • As FIG. 10 shows, the search topic on the left is makeup and skin care, and the consumption topic on the right is also makeup and skin care, so the two correspond well.
  • Referring to FIG. 11, a block diagram of a computer system 1100 suitable for implementing the big-data-based cross-domain recommendation method of an embodiment of the present invention is shown.
  • Computer system 1100 includes a central processing unit (CPU) 1101, which can perform various appropriate actions and processes according to a program stored in read-only memory (ROM) 1102 or a program loaded from storage portion 1108 into random access memory (RAM) 1103.
  • The RAM 1103 also stores various programs and data required for the operation of the system 1100.
  • The CPU 1101, the ROM 1102, and the RAM 1103 are connected to each other through a bus 1104.
  • An input/output (I/O) interface 1105 is also coupled to bus 1104.
  • The following components are connected to the I/O interface 1105: an input portion 1106 including a keyboard, a mouse, and the like; an output portion 1107 including a cathode ray tube (CRT), a liquid crystal display (LCD), and the like, and a speaker; a storage portion 1108 including a hard disk or the like; and a communication portion 1109 including a network interface card such as a LAN card, a modem, or the like.
  • The communication portion 1109 performs communication processing via a network such as the Internet.
  • A drive 1110 is also connected to the I/O interface 1105 as needed.
  • A removable medium 1111, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1110 as needed, so that a computer program read from it can be installed into the storage portion 1108 as needed.
  • An embodiment of the present disclosure includes a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the methods of FIGS. 1 and 2.
  • The computer program can be downloaded and installed from the network via the communication portion 1109, and/or installed from the removable medium 1111.
  • Each block of the flowcharts or block diagrams can represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • The functions noted in the blocks can also occur in a different order than that illustrated in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functionality involved.
  • Each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • The units or modules described in the embodiments of the present application may be implemented by software or by hardware.
  • The described units or modules can also be provided in a processor.
  • The names of these units or modules do not in any way limit the units or modules themselves.
  • The present application further provides a computer-readable storage medium, which may be the computer-readable storage medium included in the apparatus described in the foregoing embodiments, or may be a computer-readable storage medium that exists separately and is not assembled into the device.
  • The computer-readable storage medium stores one or more programs that are used by one or more processors to perform the big-data-based cross-domain recommendation method described in this application.
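
The correspondence matrix M described for FIG. 8 in the list above can be read row by row. The following is a minimal Python sketch of that reading, not code from the patent. Only the matrix size (50 by 50) and the two entries M(26, 41) = 0.3 and M(46, 10) = 0.2 come from the text; the helper names `toy_matrix` and `strongest_topic`, the uniform filling of all other entries, and the use of the topic numbers directly as list indices are assumptions made for illustration.

```python
N_TOPICS = 50  # from the text: 50 search topics (rows) x 50 consumption topics

def toy_matrix(known):
    """Build a row-stochastic matrix containing the given (row, col) -> p entries.

    All other entries in a row share the remaining probability mass equally.
    This filling is hypothetical; the real matrix is learned from data.
    """
    matrix = []
    for s in range(N_TOPICS):
        fixed = {c: p for (row, c), p in known.items() if row == s}
        rest = (1.0 - sum(fixed.values())) / (N_TOPICS - len(fixed))
        matrix.append([fixed.get(c, rest) for c in range(N_TOPICS)])
    return matrix

def strongest_topic(matrix, search_topic):
    """Return (consumption topic, probability) with the largest value in a row."""
    row = matrix[search_topic]
    best = max(range(len(row)), key=row.__getitem__)
    return best, row[best]

M = toy_matrix({(26, 41): 0.3, (46, 10): 0.2})
print(strongest_topic(M, 26))
print(strongest_topic(M, 46))
```

Because each row is a probability distribution over consumption topics, a value such as 0.3 is large relative to the uniform share of 1/50, which is why the text calls these correspondences strong.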

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Development Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Educational Administration (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Algebra (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

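A big-data-based cross-domain recommendation method and apparatus. The method includes: performing topic modeling separately on the online input records and the offline behavior records of the users in a specific user set, all of whom have both online input records and offline behavior records (S101); determining, from the topic-modeling results, the transition probability from each online-input topic to each offline-behavior topic (S102); and, for any target user who has online input records, recommending offline-behavior content to the target user based on the target user's online input records and the transition probabilities (S103). The solution enables cross-domain user traffic diversion, precision marketing, and accurate targeting of potential customers, improves the accuracy of brand recommendation and user targeting, and considerably increases offline retail GMV.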

Description

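Big-data-based cross-domain recommendation method and apparatus
Cross-reference to related applications
This application claims priority to Chinese Patent Application No. 201510979783.7, filed on December 23, 2015, the entire contents of which are incorporated into this application.
Technical field
The present invention relates to the field of Internet technologies, and in particular to a big-data-based cross-domain recommendation method and apparatus.
Background
Existing methods for analyzing multi-domain user behavior are relatively simple. They locate target users and generate recommended content by searching different domains for users who are close or similar to a given target user.
The prior art has the following shortcomings:
Poor interpretability: no correspondence can be established between behavioral features in different domains;
Excessive manual intervention: a distance metric between users and a threshold must be defined;
Poor scalability: the modeling results cannot be used to quickly model new users and serve them recommendations.
Summary of the invention
The technical problem to be solved by the present invention is to provide a big-data-based cross-domain recommendation method and apparatus that give users a more accurate recommendation service.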
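The technical solution adopted by the present invention is a big-data-based cross-domain recommendation method, comprising:
performing topic modeling separately on the online input records and the offline behavior records of the users in a specific user set, where the users in the specific user set all have both online input records and offline behavior records;
determining, from the topic-modeling results, the transition probability from each online-input topic to each offline-behavior topic;
for any target user, recommending offline-behavior content to the target user based on the transition probabilities and the target user's online input records.
Further, the offline behavior records include offline consumption records;
the online input records include search records made with a search engine, and/or input records entered in a specific application through an input method or a voice receiving device; the specific application is at least one of the following types: chat software, search engines, social software, and online shopping client software.
Further, the offline consumption records include offline consumption topics and offline consumption brands;
where the online input records are search records made with a search engine, performing topic modeling separately on the online input records and the offline behavior records of the users in the specific user set includes:
performing topic modeling on the offline consumption records of the users in the specific user set, yielding P(brand | consumption topic), the probability of consuming each brand within each consumption topic, and P(consumption topic | user), the probability that each user consumes under each consumption topic;
performing LDA topic modeling on the online search records of the users in the specific user set, yielding P(search term | search topic), the probability of entering each search term within each search topic, and P(search topic | user), the probability that each user searches under each search topic.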
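
The two modeling steps above share one shape: each user is a "document" and each brand or search term is a "word". The text names LDA but does not say how the model is fitted, so the following is only a minimal sketch that assumes collapsed Gibbs sampling; the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the integer encoding of items are illustrative assumptions, not details from the patent.

```python
import random

def lda_gibbs(docs, n_topics, vocab_size, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling (an assumed inference method).

    docs[u] is the list of item ids (brands or search terms) for user u.
    Returns (theta, phi): theta[u][k] ~ P(topic k | user u),
    phi[k][w] ~ P(item w | topic k).
    """
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]
    topic_item = [[0] * vocab_size for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assign = []
    # Random initial topic assignment for every item occurrence.
    for d, doc in enumerate(docs):
        row = []
        for w in doc:
            k = rng.randrange(n_topics)
            row.append(k)
            doc_topic[d][k] += 1
            topic_item[k][w] += 1
            topic_total[k] += 1
        assign.append(row)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment, then resample it.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_item[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_item[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_item[k][w] += 1
                topic_total[k] += 1
    # Smoothed estimates of the two distributions.
    theta = [
        [(doc_topic[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(topic_item[t][w] + beta) / (topic_total[t] + vocab_size * beta) for w in range(vocab_size)]
        for t in range(n_topics)
    ]
    return theta, phi

# Hypothetical purchase histories: three users, four brand ids (0..3).
purchases = [[0, 0, 1, 1, 0], [2, 3, 3, 2, 2], [0, 1, 0, 1]]
theta, phi = lda_gibbs(purchases, n_topics=2, vocab_size=4, iters=50)
```

Run on consumption records, `phi` plays the role of P(brand | consumption topic) and `theta` that of P(consumption topic | user); run on search records, the same two outputs stand for P(search term | search topic) and P(search topic | user).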
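Further, determining, from the topic-modeling results, the transition probability from each online-input topic to each offline-behavior topic includes:
on the basis of each user's P(search topic | user) and the consumption data of the users in the specific user set for each brand, or on the basis of P(brand | consumption topic), each user's P(search topic | user), and the consumption data of the users in the specific user set for each brand, applying a two-layer probabilistic graphical model to obtain P(consumption topic | search topic), the transition probability from each search topic to each consumption topic.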
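
The passage above names a two-layer probabilistic graphical model without stating how it is estimated. One plausible reading, offered here purely as an assumption, treats each purchase as generated by first drawing a search topic from P(search topic | user), then a consumption topic from the unknown transition matrix, then a brand from P(brand | consumption topic), and fits the transition matrix by expectation-maximization with the other two distributions held fixed. The function name `estimate_transition` and all example numbers are hypothetical.

```python
def estimate_transition(theta, phi, counts, iters=50):
    """EM estimate of T[s][c] ~ P(consumption topic c | search topic s).

    theta[u][s]  = P(search topic s | user u)        (held fixed)
    phi[c][b]    = P(brand b | consumption topic c)   (held fixed)
    counts[u][b] = number of purchases of brand b by user u
    """
    n_s, n_c = len(theta[0]), len(phi)
    T = [[1.0 / n_c] * n_c for _ in range(n_s)]  # uniform starting point
    for _ in range(iters):
        acc = [[0.0] * n_c for _ in range(n_s)]
        for u, row in enumerate(counts):
            for b, n in enumerate(row):
                if n == 0:
                    continue
                # E-step: posterior over (search topic, consumption topic)
                # for a purchase of brand b by user u.
                joint = [
                    [theta[u][s] * T[s][c] * phi[c][b] for c in range(n_c)]
                    for s in range(n_s)
                ]
                z = sum(sum(r) for r in joint)
                if z == 0.0:
                    continue
                for s in range(n_s):
                    for c in range(n_c):
                        acc[s][c] += n * joint[s][c] / z
        # M-step: renormalize each row of expected counts.
        for s in range(n_s):
            total = sum(acc[s])
            if total > 0.0:
                T[s] = [v / total for v in acc[s]]
    return T

# Hypothetical toy data: users who search under topic 0 buy brand 1,
# which belongs mostly to consumption topic 1, and vice versa.
theta = [[1.0, 0.0], [0.0, 1.0]]
phi = [[0.9, 0.1], [0.1, 0.9]]
counts = [[0, 10], [10, 0]]
T = estimate_transition(theta, phi, counts)
```

On this toy data the fitted matrix puts most of the mass of search topic 0 on consumption topic 1 and most of the mass of search topic 1 on consumption topic 0, which is the kind of cross-domain correspondence the method is meant to recover.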
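Further, as one optional technical solution, for any target user, recommending offline-behavior content to the target user based on the transition probabilities and the target user's online input records includes:
A1: for any target user who has search-engine search records, determining P0(brand | user), the probability that the target user consumes each brand, according to the following formula:
Figure PCTCN2016086407-appb-000001
where the summation over consumption topics (Figure PCTCN2016086407-appb-000002) runs over all consumption topics involved in the offline behavior records of the users in the specific user set, the summation over search topics (Figure PCTCN2016086407-appb-000003) runs over all search topics in the target user's search records, and P0(search topic | user) is the probability, determined from the target user's search records, that the target user searches under each search topic;
A2: recommending brands to the target user according to P0(brand | user).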
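
Steps A1 and A2 can be sketched in a few lines of Python. The formula itself survives here only as an image reference, so the sketch assumes the form implied by the surrounding definitions: an outer sum over consumption topics of P(brand | consumption topic), multiplied by an inner sum over search topics of P(consumption topic | search topic) times P0(search topic | user). The function names, brand names, and all numbers are hypothetical.

```python
def brand_scores(p_brand_given_ctopic, p_ctopic_given_stopic, p_stopic_given_user):
    """Return P0(brand | user) for one target user (assumed formula form).

    p_brand_given_ctopic[c][b]  = P(brand b | consumption topic c)
    p_ctopic_given_stopic[s][c] = P(consumption topic c | search topic s)
    p_stopic_given_user[s]      = P0(search topic s | user)
    """
    n_s = len(p_stopic_given_user)
    n_c = len(p_brand_given_ctopic)
    n_b = len(p_brand_given_ctopic[0])
    # Inner sum over search topics: implied weight of each consumption topic.
    ctopic_weight = [
        sum(p_stopic_given_user[s] * p_ctopic_given_stopic[s][c] for s in range(n_s))
        for c in range(n_c)
    ]
    # Outer sum over consumption topics.
    return [
        sum(p_brand_given_ctopic[c][b] * ctopic_weight[c] for c in range(n_c))
        for b in range(n_b)
    ]

def recommend(scores, brands, k=3):
    """Step A2: return the k brands with the highest P0(brand | user)."""
    ranked = sorted(zip(brands, scores), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:k]]

# Hypothetical example: 2 search topics, 2 consumption topics, 3 brands.
brands = ["brand_a", "brand_b", "brand_c"]
phi = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
transition = [[0.9, 0.1], [0.2, 0.8]]
user_search_topics = [0.25, 0.75]
scores = brand_scores(phi, transition, user_search_topics)
print(recommend(scores, brands, k=2))
```

Only the target user's search topic distribution is needed at recommendation time, which is why this variant also works for users with no consumption history.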
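Further, as another optional technical solution, where a target user also has offline consumption records, recommending offline-behavior content to the target user based on the target user's online input records and the transition probabilities includes:
B1: for any such target user, determining P0(brand | user), the probability that the target user consumes each brand, according to the following formula:
Figure PCTCN2016086407-appb-000004
where P0(brand | consumption topic) is the probability, determined from the target user's consumption records, that the target user consumes each brand within each consumption topic; P0(consumption topic | user) is the probability, determined from the target user's consumption records, that the target user consumes under each consumption topic; the summation over consumption topics (Figure PCTCN2016086407-appb-000005) runs over all consumption topics involved in the offline behavior records of the users in the specific user set; the summation over search topics (Figure PCTCN2016086407-appb-000006) runs over all search topics in the target user's search records; and P0(search topic | user) is the probability, determined from the target user's search records, that the target user searches under each search topic;
B2: recommending brands to the target user according to P0(brand | user).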
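The present invention further provides a big data based cross-domain recommendation apparatus, comprising:
a modeling module configured to perform topic modeling separately on the online input records and the offline behavior records of users in a specific user set, wherein every user in the specific user set has both online input records and offline behavior records;
a calculation module configured to determine, from the topic modeling results, transition probabilities from each online input topic to each offline behavior topic; and
a recommendation module configured to recommend, for any target user, offline behavior content to the target user based on the transition probabilities and the target user's online input records.
Further, the offline behavior records comprise offline consumption records;
the online input records comprise search records made with a search engine, and/or input records entered in a specific application through an input method or a voice receiving device; the type of the specific application includes at least one of: chat software, a search engine, social software, and online shopping client software.
Further, the offline consumption records comprise offline consumption topics and offline consumption brands;
where the online input records are search records made with a search engine, the modeling module is configured to:
perform topic modeling on the offline consumption records of the users in the specific user set to obtain the probability P(brand|consumption topic) of consuming each brand within each consumption topic and the probability P(consumption topic|user) of each user consuming within each consumption topic; and
perform topic modeling, using Latent Dirichlet Allocation (LDA), on the online search records of the users in the specific user set to obtain the probability P(search term|search topic) of entering each search term within each search topic and the probability P(search topic|user) of each user searching within each search topic.
Further, the calculation module is specifically configured to:
apply a two-layer probabilistic graphical model to obtain the transition probability P(consumption topic|search topic) from each search topic to each consumption topic, either on the basis of each user's P(search topic|user) and the consumption data of the users in the specific user set for each brand, or on the basis of P(brand|consumption topic), each user's P(search topic|user), and the consumption data of the users in the specific user set for each brand.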
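Further, as one optional technical solution, the recommendation module is specifically configured to:
for any target user who has search records made with a search engine, determine the probability P0(brand|user) of the target user consuming each brand according to the following formula:
$$P_0(\text{brand}\mid\text{user})=\sum_{\text{consumption topic}}\ \sum_{\text{search topic}} P(\text{brand}\mid\text{consumption topic})\,P(\text{consumption topic}\mid\text{search topic})\,P_0(\text{search topic}\mid\text{user})$$
where $\sum_{\text{consumption topic}}$ denotes summation over all consumption topics involved in the offline behavior records of the users in the specific user set, $\sum_{\text{search topic}}$ denotes summation over all search topics in the target user's search records, and P0(search topic|user) is the probability, determined from the target user's search records, of the target user searching within each search topic; and
recommend brands to the target user according to P0(brand|user).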
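Further, as another optional technical solution, where any target user also has offline consumption records, the recommendation module is specifically configured to:
B1: for any such target user, determine the probability P0(brand|user) of the target user consuming each brand according to the following formula:
[formula image PCTCN2016086407-appb-000010]
where P0(brand|consumption topic) is the probability, determined from the target user's consumption records, of the target user consuming each brand within each consumption topic; P0(consumption topic|user) is the probability, determined from the target user's consumption records, of the target user consuming within each consumption topic; $\sum_{\text{consumption topic}}$ denotes summation over all consumption topics involved in the offline behavior records of the users in the specific user set; $\sum_{\text{search topic}}$ denotes summation over all search topics in the target user's search records; and P0(search topic|user) is the probability, determined from the target user's search records, of the target user searching within each search topic;
B2: recommend brands to the target user according to P0(brand|user).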
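With the above technical solution, the present invention has at least the following advantages:
With the big data based cross-domain recommendation method and apparatus of the present invention, a user's behavior in different domains, such as online input and offline behavior, is linked and cross-analyzed to obtain the correspondence between user behavior features across domains, and content is recommended to the user according to the established correspondence. Applied to accurate consumer-brand recommendation and accurate positioning of potential customers for consumer brands in the Internet Plus retail field, the technical solution of the present invention can address a range of problems, including cross-domain user traffic referral, precision marketing, and accurate positioning of potential customers, with marked effect: in both offline simulation tests and online tests with real consumption, the accuracy of brand recommendation and user positioning improved greatly, and offline retail GMV (Gross Merchandise Volume) rose considerably.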
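Brief Description of the Drawings
Fig. 1 is a flowchart of the big data based cross-domain recommendation method according to a first embodiment of the present invention;
Fig. 2 is a flowchart of the big data based cross-domain recommendation method according to a second embodiment of the present invention;
Fig. 3 is a schematic structural diagram of the big data based cross-domain recommendation apparatus according to a third embodiment of the present invention;
Fig. 4 is a schematic diagram of linking the shopping mall consumption data and the Baidu search data of a specific user set according to a fifth embodiment of the present invention;
Fig. 5 is a schematic diagram of modeling the shopping mall consumption data of the users in mall member set A' according to the fifth embodiment of the present invention;
Fig. 6 is a schematic diagram of modeling the Baidu search data of the users in mall member set A' according to the fifth embodiment of the present invention;
Fig. 7 is a schematic diagram of determining the transition probabilities between search topics and consumption topics by applying a two-layer probabilistic graphical model according to the fifth embodiment of the present invention;
Fig. 8 is a schematic diagram of the correspondence matrix between search topics and consumption topics according to the fifth embodiment of the present invention;
Fig. 9 is a first search diagram according to the fifth embodiment of the present invention;
Fig. 10 is a second search diagram according to the fifth embodiment of the present invention; and
Fig. 11 is a schematic structural diagram of a computer system suitable for implementing the big data based cross-domain recommendation method of the embodiments of the present invention.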
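Detailed Description
To further explain the technical means adopted by the present invention to achieve its intended purpose and their effects, the present invention is described in detail below with reference to the accompanying drawings and preferred embodiments.
A first embodiment of the present invention is a big data based cross-domain recommendation method which, as shown in Fig. 1, comprises the following steps:
Step S101: a specific user set is studied, in which every user has both online input records and offline behavior records, and topic modeling is performed separately on the online input records and the offline behavior records of the users in the specific user set;
Specifically, in this embodiment, the offline behavior records comprise offline consumption records;
the online input records comprise search records made with a search engine, and/or input records entered in a specific application through an input method or a voice receiving device; the type of the specific application includes at least one of: chat software, a search engine, social software, online shopping client software, and the like.
Further, the offline consumption records comprise offline consumption topics and offline consumption brands;
In step S101, where the online input records are search records made with a search engine, performing topic modeling separately on the online input records and the offline behavior records of the users in the specific user set comprises:
performing topic modeling on the offline consumption records of the users in the specific user set to obtain the probability P(brand|consumption topic) of consuming each brand within each consumption topic and the probability P(consumption topic|user) of each user consuming within each consumption topic; and
performing topic modeling, using Latent Dirichlet Allocation (LDA), on the online search records of the users in the specific user set to obtain the probability P(search term|search topic) of entering each search term within each search topic and the probability P(search topic|user) of each user searching within each search topic.
Step S102: determining, from the topic modeling results, transition probabilities from each online input topic to each offline behavior topic;
Specifically, step S102 comprises:
applying a two-layer probabilistic graphical model to obtain the transition probability P(consumption topic|search topic) from each search topic to each consumption topic, either on the basis of each user's P(search topic|user) and the consumption data of the users in the specific user set for each brand, or, preferably, on the basis of P(brand|consumption topic), each user's P(search topic|user), and the consumption data of the users in the specific user set for each brand.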
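The text does not say how the two-layer probabilistic graphical model is fitted. One possible reading, offered only as a sketch, treats each purchase of brand b by user u as drawn with probability Σ_s Σ_c P(s|u)·T[s][c]·P(b|c) and estimates the transition matrix T = P(consumption topic|search topic) by expectation-maximization (EM). The function name `fit_transition`, the data layout, and the choice of EM are all assumptions made for illustration, not details taken from the source.

```python
def fit_transition(theta, phi, purchases, n_iters=50):
    """Estimate T[s][c] = P(consumption topic c | search topic s) by EM.

    theta[u][s]     : P(search topic s | user u), from the search-side model
    phi[c][b]       : P(brand b | consumption topic c), held fixed
    purchases[u][b] : how many times user u bought brand b
    (All three layouts are assumptions for this sketch.)
    """
    n_search = len(next(iter(theta.values())))
    n_consume = len(phi)
    # Start from a uniform transition matrix.
    T = [[1.0 / n_consume] * n_consume for _ in range(n_search)]
    for _ in range(n_iters):
        counts = [[0.0] * n_consume for _ in range(n_search)]
        for u, bought in purchases.items():
            for b, n in bought.items():
                # E-step: posterior over (search topic, consumption topic)
                # pairs that could explain this purchase.
                joint = [[theta[u][s] * T[s][c] * phi[c].get(b, 0.0)
                          for c in range(n_consume)]
                         for s in range(n_search)]
                total = sum(map(sum, joint))
                if total == 0.0:
                    continue  # purchase not explainable by any topic pair
                for s in range(n_search):
                    for c in range(n_consume):
                        counts[s][c] += n * joint[s][c] / total
        # M-step: renormalise each row into a probability distribution.
        for s in range(n_search):
            row_sum = sum(counts[s])
            if row_sum > 0.0:
                T[s] = [x / row_sum for x in counts[s]]
    return T


# Toy data: u1 mostly searches topic 0 and buys a brand typical of
# consumption topic 0; u2 is the mirror image.
theta = {"u1": [0.9, 0.1], "u2": [0.1, 0.9]}
phi = [{"baby": 0.9, "cosmetics": 0.1}, {"baby": 0.1, "cosmetics": 0.9}]
purchases = {"u1": {"baby": 5}, "u2": {"cosmetics": 5}}
T = fit_transition(theta, phi, purchases)
```

Holding P(brand|consumption topic) fixed corresponds to the variant the text marks as preferred; the other variant would additionally re-estimate that distribution inside the same loop.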
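Step S103: for any target user who has online input records, recommending offline behavior content to the target user based on the target user's online input records and the transition probabilities.
Specifically, step S103 comprises:
A1: for any target user who has search records made with a search engine, determining the probability P0(brand|user) of the target user consuming each brand according to the following formula:
$$P_0(\text{brand}\mid\text{user})=\sum_{\text{consumption topic}}\ \sum_{\text{search topic}} P(\text{brand}\mid\text{consumption topic})\,P(\text{consumption topic}\mid\text{search topic})\,P_0(\text{search topic}\mid\text{user})$$
where $\sum_{\text{consumption topic}}$ denotes summation over all consumption topics involved in the offline behavior records of the users in the specific user set, $\sum_{\text{search topic}}$ denotes summation over all search topics in the target user's search records, and P0(search topic|user) is the probability, determined from the target user's search records, of the target user searching within each search topic;
A2: recommending brands to the target user according to P0(brand|user).
Further, the brand with the highest consumption probability in P0(brand|user), or the several brands with the highest probabilities, may be selected and recommended to the target user.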
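Steps A1 and A2 can be sketched in a few lines. The sketch assumes that the formula in A1 marginalizes over both topic sets, i.e. that P0(brand|user) is the sum over consumption topics and search topics of P(brand|consumption topic)·P(consumption topic|search topic)·P0(search topic|user). The function names, the brand names, and all numbers are hypothetical.

```python
def brand_scores(p_search_topic_user, transition, p_brand_ctopic):
    """Return {brand: P0(brand | user)} for one target user.

    p_search_topic_user[s] : P0(search topic s | user)
    transition[s][c]       : P(consumption topic c | search topic s)
    p_brand_ctopic[c][b]   : P(brand b | consumption topic c)
    """
    scores = {}
    for c, brands in enumerate(p_brand_ctopic):
        # Probability mass that reaches consumption topic c from the
        # user's search topics.
        reach = sum(transition[s][c] * p
                    for s, p in enumerate(p_search_topic_user))
        for brand, p_brand in brands.items():
            scores[brand] = scores.get(brand, 0.0) + p_brand * reach
    return scores


def recommend(scores, k=3):
    """Step A2: pick the k brands with the highest probability."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical model with two search topics and two consumption topics.
transition = [[0.9, 0.1], [0.2, 0.8]]
p_brand_ctopic = [{"BabyCo": 0.7, "ToyCo": 0.3},
                  {"GlowCo": 0.6, "SkinCo": 0.4}]
scores = brand_scores([0.8, 0.2], transition, p_brand_ctopic)
top_brands = recommend(scores, k=2)
```

Because every input is a probability distribution, the resulting scores also sum to 1, which gives a cheap sanity check on the model tables.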
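A second embodiment of the present invention is a big data based cross-domain recommendation method. The method of this embodiment is substantially the same as that of the first embodiment: steps S201 to S202 are the same as steps S101 to S102 of the first embodiment. The difference is that, as shown in Fig. 2, step S203 of the method of this embodiment comprises the following:
where any target user has both search records made with a search engine and offline consumption records, recommending offline behavior content to the target user based on the target user's online input records and the transition probabilities comprises:
B1: for any such target user, determining the probability P0(brand|user) of the target user consuming each brand according to the following formula:
[formula image PCTCN2016086407-appb-000016]
where P0(brand|consumption topic) is the probability, determined from the target user's consumption records, of the target user consuming each brand within each consumption topic; P0(consumption topic|user) is the probability, determined from the target user's consumption records, of the target user consuming within each consumption topic; $\sum_{\text{consumption topic}}$ denotes summation over all consumption topics involved in the offline behavior records of the users in the specific user set; $\sum_{\text{search topic}}$ denotes summation over all search topics in the target user's search records; and P0(search topic|user) is the probability, determined from the target user's search records, of the target user searching within each search topic;
B2: recommending brands to the target user according to P0(brand|user).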
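A third embodiment of the present invention corresponds to the first embodiment and describes a big data based cross-domain recommendation apparatus which, as shown in Fig. 3, comprises the following components:
1) a modeling module 301 configured to perform topic modeling separately on the online input records and the offline behavior records of users in a specific user set, wherein every user in the specific user set has both online input records and offline behavior records;
Specifically, in this embodiment, the offline behavior records comprise offline consumption records;
the online input records comprise search records made with a search engine, and/or input records entered in a specific application through an input method or a voice receiving device; the type of the specific application includes at least one of: chat software, a search engine, social software, online shopping client software, and the like.
Further, the offline consumption records comprise offline consumption topics and offline consumption brands;
where the online input records are search records made with a search engine, the modeling module is configured to:
perform topic modeling on the offline consumption records of the users in the specific user set to obtain the probability P(brand|consumption topic) of consuming each brand within each consumption topic and the probability P(consumption topic|user) of each user consuming within each consumption topic; and
perform topic modeling, using Latent Dirichlet Allocation (LDA), on the online search records of the users in the specific user set to obtain the probability P(search term|search topic) of entering each search term within each search topic and the probability P(search topic|user) of each user searching within each search topic.
2) a calculation module 302 configured to determine, from the topic modeling results, transition probabilities from each online input topic to each offline behavior topic;
Specifically, the calculation module 302 is configured to:
apply a two-layer probabilistic graphical model to obtain the transition probability P(consumption topic|search topic) from each search topic to each consumption topic, either on the basis of each user's P(search topic|user) and the consumption data of the users in the specific user set for each brand, or, preferably, on the basis of P(brand|consumption topic), each user's P(search topic|user), and the consumption data of the users in the specific user set for each brand.
3) a recommendation module 303 configured to recommend, for any target user who has online input records, offline behavior content to the target user based on the target user's online input records and the transition probabilities.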
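Specifically, the recommendation module 303 is configured to:
for any target user who has search records made with a search engine, determine the probability P0(brand|user) of the target user consuming each brand according to the following formula:
$$P_0(\text{brand}\mid\text{user})=\sum_{\text{consumption topic}}\ \sum_{\text{search topic}} P(\text{brand}\mid\text{consumption topic})\,P(\text{consumption topic}\mid\text{search topic})\,P_0(\text{search topic}\mid\text{user})$$
where $\sum_{\text{consumption topic}}$ denotes summation over all consumption topics involved in the offline behavior records of the users in the specific user set, $\sum_{\text{search topic}}$ denotes summation over all search topics in the target user's search records, and P0(search topic|user) is the probability, determined from the target user's search records, of the target user searching within each search topic; and
recommend brands to the target user according to P0(brand|user). For example, the brand with the highest consumption probability in P0(brand|user), or the several brands with the highest probabilities, may be selected and recommended to the target user.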
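A fourth embodiment of the present invention is a big data based cross-domain recommendation apparatus. The apparatus of this embodiment is substantially the same as that of the third embodiment; the difference is that the recommendation module 303 is specifically configured as follows:
where any target user has both search records made with a search engine and offline consumption records, recommending offline behavior content to the target user based on the target user's online input records and the transition probabilities comprises:
B1: for any such target user, determining the probability P0(brand|user) of the target user consuming each brand according to the following formula:
[formula image PCTCN2016086407-appb-000022]
where P0(brand|consumption topic) is the probability, determined from the target user's consumption records, of the target user consuming each brand within each consumption topic; P0(consumption topic|user) is the probability, determined from the target user's consumption records, of the target user consuming within each consumption topic; $\sum_{\text{consumption topic}}$ denotes summation over all consumption topics involved in the offline behavior records of the users in the specific user set; $\sum_{\text{search topic}}$ denotes summation over all search topics in the target user's search records; and P0(search topic|user) is the probability, determined from the target user's search records, of the target user searching within each search topic;
B2: recommending brands to the target user according to P0(brand|user).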
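A fifth embodiment of the present invention builds on the embodiments above and presents an application example of the present invention with reference to Figs. 4 to 10, taking as its example brand recommendation based on how Baidu search content converts into shopping mall consumption.
The main idea of this embodiment is to link and cross-model a user's data in domain 1 (shopping mall consumption) with the user's data in domain 2 (Baidu search engine queries). The modeling process is as follows:
Step 1: perform topic modeling on the users' mall consumption data to obtain the brand cluster features (consumption topics) and each user's consumption weights over the clusters;
Step 2: perform topic modeling on the users' Baidu search data to obtain the keyword cluster features (search topics) and each user's distribution weights over the clusters;
Step 3: assume that a probabilistic transition relationship exists between every search topic and every consumption topic, through which a user converts search topics into consumption topics and then consumes brands within the different consumption topics. Under this assumption, the results of steps 1 and 2 are used to infer the correspondence between search topics and consumption topics;
Step 4: once the correspondence is obtained, more accurate recommendations can be provided to both existing users and new users, and, for a given brand, its potential target customers can be located more accurately.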
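The brand-side use mentioned in step 4, locating potential customers for a given brand, can be sketched as ranking users by their predicted probability of consuming that brand. The scoring rule below assumes the same marginalization over consumption topics and search topics as the user-side recommendation; `potential_customers`, the user IDs, and the numbers are hypothetical.

```python
def potential_customers(brand, users, transition, p_brand_ctopic, k=2):
    """Rank users by predicted probability of consuming `brand`.

    users[u][s]          : P0(search topic s | user u)
    transition[s][c]     : P(consumption topic c | search topic s)
    p_brand_ctopic[c][b] : P(brand b | consumption topic c)
    """
    def score(p_search):
        return sum(
            brands.get(brand, 0.0)
            * sum(transition[s][c] * p for s, p in enumerate(p_search))
            for c, brands in enumerate(p_brand_ctopic)
        )

    ranked = sorted(users, key=lambda u: score(users[u]), reverse=True)
    return ranked[:k]


# Hypothetical model: search topic 1 mostly leads to consumption topic 1,
# which is where the brand "GlowCo" lives.
transition = [[0.9, 0.1], [0.2, 0.8]]
p_brand_ctopic = [{"BabyCo": 0.7, "ToyCo": 0.3},
                  {"GlowCo": 0.6, "SkinCo": 0.4}]
users = {"u1": [0.8, 0.2], "u2": [0.1, 0.9], "u3": [0.5, 0.5]}
targets = potential_customers("GlowCo", users, transition, p_brand_ctopic)
```

Only search-side distributions are needed for the ranked users, so the same call works for new users who have no consumption history.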
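Based on the above approach, the following describes in detail how brand recommendation is performed based on how Baidu search content converts into mall consumption.
Stage one: determine the users to be studied, i.e. the specific user set, and link the mall consumption data and the Baidu search data of that user set.
As shown in Fig. 4, A on the left represents the consumption data of mall members, A' on the right represents the Baidu search data corresponding to those mall members, and B on the right represents the Baidu search data of the mall's target users. This embodiment therefore first selects mall member set A', whose users have both offline consumption data and online search data, as the users to be studied.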
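Selecting the specific user set amounts to keeping the users who appear in both data sources with non-empty records. The sketch below assumes both sources are already keyed by a shared user ID; how mall accounts are matched to search accounts is not described in the text and is left out. The function name and the sample data are hypothetical.

```python
def build_specific_user_set(consumption, searches):
    """Keep users that have BOTH consumption and search records.

    consumption[u] : list of brands bought by user u (offline data)
    searches[u]    : list of search terms entered by user u (online data)
    Returns {user: (brands, search_terms)} for the overlapping users.
    """
    shared = sorted(
        u for u in consumption.keys() & searches.keys()
        if consumption[u] and searches[u]
    )
    return {u: (consumption[u], searches[u]) for u in shared}


# m3 has no purchases and x9 is not a mall member, so only m1 qualifies.
consumption = {"m1": ["BabyCo"], "m2": ["GlowCo"], "m3": []}
searches = {"m1": ["stroller"], "m3": ["lipstick"], "x9": ["toner"]}
user_set = build_specific_user_set(consumption, searches)
```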
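Stage two: perform LDA topic modeling separately on the online search data and the offline consumption data of the users in mall member set A'. The modeling process is as follows:
As shown in Fig. 5, LDA modeling is performed on the mall consumption data of the users in mall member set A'. From each user's historical consumption brands, it yields the brand cluster features P(brand|consumption topic) and each user's consumption type distribution P(consumption topic|user). The brand cluster features are the probabilities of consuming each brand within each consumption topic, and a user's consumption type distribution is the probability of that user consuming within each consumption topic.
As shown in Fig. 6, LDA modeling is performed on the Baidu search data of the users in mall member set A'. From each user's search terms, it yields the search term cluster features P(search term|search topic) and each user's search topic distribution P(search topic|user). The search term cluster features are the probabilities of entering each search term within each search topic, and a user's search topic distribution is the probability of that user searching within each search topic.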
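The text names LDA but not an inference procedure. Collapsed Gibbs sampling is one standard way to fit LDA and is used below purely as an assumption. Each user is treated as one "document" whose "words" are either purchased brands or search terms, so the same routine serves both data sets. The function name, the hyperparameters, and the toy data are illustrative only.

```python
import random


def lda_gibbs(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs[u] : list of tokens for user u (brands or search terms)
    Returns (phi, theta):
      phi[k][w]   ~ P(token w | topic k)
      theta[u][k] ~ P(topic k | user u)
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs.values() for w in doc})
    w_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), n_topics
    n_uk = {u: [0] * K for u in docs}      # topic counts per user
    n_kw = [[0] * V for _ in range(K)]     # token counts per topic
    n_k = [0] * K                          # total tokens per topic
    z = {}                                 # topic assignment per token
    for u, doc in docs.items():
        z[u] = []
        for w in doc:
            k = rng.randrange(K)
            z[u].append(k)
            n_uk[u][k] += 1
            n_kw[k][w_id[w]] += 1
            n_k[k] += 1
    for _ in range(n_iters):
        for u, doc in docs.items():
            for i, w in enumerate(doc):
                v, k = w_id[w], z[u][i]
                # Remove the token, then resample its topic.
                n_uk[u][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                weights = [(n_uk[u][t] + alpha) * (n_kw[t][v] + beta)
                           / (n_k[t] + V * beta) for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[u][i] = k
                n_uk[u][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1
    phi = [{w: (n_kw[k][w_id[w]] + beta) / (n_k[k] + V * beta)
            for w in vocab} for k in range(K)]
    theta = {u: [(n_uk[u][k] + alpha) / (len(doc) + K * alpha)
                 for k in range(K)] for u, doc in docs.items()}
    return phi, theta


# Toy search logs; the same call on purchase lists would give
# P(brand | consumption topic) and P(consumption topic | user).
search_docs = {"u1": ["stroller", "diaper", "stroller"],
               "u2": ["lipstick", "toner"],
               "u3": ["diaper", "toner"]}
phi, theta = lda_gibbs(search_docs, n_topics=2, n_iters=50)
```

In practice a library implementation of LDA would normally replace this loop; the sketch only makes the two outputs named in the text concrete.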
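Stage three: make recommendations according to the modeling results.
As shown in Fig. 7, on the basis of the brand cluster features, the users' search topic distributions, and the users' actual brand consumption histories obtained above, a two-layer probabilistic graphical model is applied to obtain the transition probability P(consumption topic|search topic) between search topics and consumption topics.
For any target user, the user's recommended content is generated according to the following formula:
[formula image PCTCN2016086407-appb-000025]
where P0(brand|consumption topic) is the probability, determined from the target user's consumption records, of the target user consuming each brand within each consumption topic, and P0(consumption topic|user) is the probability, determined from the target user's consumption records, of the target user consuming within each consumption topic;
The target user may be a user with no real consumption history, but must at least have Baidu search data, which is used to determine P0(search topic|user). If there is no real consumption history, P0(consumption topic|user) in the above formula is simply set to 0.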
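The technical effects of this embodiment are briefly described below.
First, the solution makes full use of Baidu's big data to deliver substantial performance gains for businesses in other domains, letting big data realize its value, achieving genuinely intelligent Internet Plus, and truly linking online and offline.
Second, combining offline consumption data helps Baidu better understand visiting users and build more accurate, more complete user profiles. This closes the data loop and helps target Internet advertising more precisely.
The following uses an offline simulation test and an online real-world test to illustrate how Baidu's big data improves offline retail efficiency:
Taking an offline retail commercial property as an example, its users' consumption data is linked with their Baidu search data, and the modeling method of this embodiment yields the correspondence matrix M between search topics and consumption topics. As shown in Fig. 8, each row gives the probability distribution of one online search topic over 50 consumption topics; there are 50 rows in total, one for each of the 50 search topics.
The larger a value in Fig. 8, the stronger the correspondence. M(26,41) and M(46,10) are taken as examples to show that these correspondences are reasonable.
M(26,41) denotes the correspondence between the search topic numbered 26 and the consumption topic numbered 41. Fig. 8 shows that this correspondence is strong, specifically:
In the matrix, M(26,41) is 0.3, meaning that the consumption of users who enter the search terms on the left of the first search diagram, Fig. 9, has a 30% probability of falling on the brands on the right. Fig. 9 shows that the search topic on the left concerns pregnancy, infants, and children, and the consumption topic on the right does as well, so the two correspond well.
M(46,10) denotes the correspondence between the search topic numbered 46 and the consumption topic numbered 10. Fig. 8 shows that this correspondence is strong, specifically:
In the matrix, M(46,10) is 0.2, meaning that the consumption of users who enter the keywords on the left of the second search diagram, Fig. 10, has a 20% probability of falling on the brands on the right. Fig. 10 shows that the search topic on the left concerns cosmetics and skincare, and the consumption topic on the right does as well, so the two correspond well.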
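Reading strong correspondences off a matrix such as M can be automated by listing the entries at or above a chosen threshold. The sketch below uses a small made-up 3×3 matrix rather than the 50×50 matrix of Fig. 8, and it uses 0-based indices; whether the topic numbers in the text are 0-based or 1-based is not stated. The function name and the threshold are hypothetical.

```python
def strongest_links(M, threshold=0.2):
    """List (search topic, consumption topic, probability) entries of M
    whose probability is at least `threshold`, strongest first.

    M[s][c] = P(consumption topic c | search topic s); each row is a
    probability distribution over consumption topics.
    """
    links = [(s, c, p)
             for s, row in enumerate(M)
             for c, p in enumerate(row)
             if p >= threshold]
    return sorted(links, key=lambda link: link[2], reverse=True)


# Made-up matrix: three search topics (rows), three consumption topics.
M = [[0.7, 0.2, 0.1],
     [0.1, 0.1, 0.8],
     [0.3, 0.4, 0.3]]
strong = strongest_links(M, threshold=0.5)
```

With 50 consumption topics a uniform row would put 0.02 on each entry, which is why values such as 0.3 or 0.2 stand out as strong links.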
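In addition, in the offline simulation experiment, applying the above modeling results to one shopping mall gave a considerable improvement over the 6.1% brand recommendation accuracy that the prior art achieves for mall members with a purely offline model: after Baidu search data was added, brand recommendation accuracy rose to 11.1%.
In the online real-world experiment, potential target customers were sought for targeted delivery of a mall's June 1 Children's Day promotion, and the technical effect was measured by the rate at which those target users came to the mall and made purchases. Compared with the 7.49% accuracy of the prior-art brand recommendation made to mall members based on member consumption history alone, there was again a considerable improvement: with Baidu search data added, the users' in-store purchase rate rose to 11.6%, a relative increase of 54.8%.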
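The relative increases behind these figures can be checked with direct arithmetic. The rates are the ones quoted in the text; the helper name `relative_lift` is hypothetical.

```python
def relative_lift(baseline, improved):
    """Relative improvement of `improved` over `baseline`."""
    return (improved - baseline) / baseline


# Online experiment: 7.49% -> 11.6% in-store purchase rate.
online_lift = relative_lift(7.49, 11.6)
# Offline simulation: 6.1% -> 11.1% brand recommendation accuracy.
offline_lift = relative_lift(6.1, 11.1)
```

The online figure works out to roughly 0.549, in line with the quoted 54.8% up to rounding, and the offline simulation corresponds to a relative increase of roughly 82%.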
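In summary, adding Baidu search data to the mall consumption data markedly improved both the accuracy of consumption recommendations to mall members and the precision with which potential customers are located during brand promotion, which indirectly raises the mall's GMV.
Reference is now made to Fig. 11, which shows a schematic structural diagram of a computer system 1100 suitable for implementing the big data based cross-domain recommendation method of the embodiments of the present invention.
As shown in Fig. 11, the computer system 1100 includes a central processing unit (CPU) 1101, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 1102 or a program loaded from a storage section 1108 into a random access memory (RAM) 1103. The RAM 1103 also stores the various programs and data required for the operation of the system 1100. The CPU 1101, the ROM 1102, and the RAM 1103 are connected to one another via a bus 1104. An input/output (I/O) interface 1105 is also connected to the bus 1104.
The following components are connected to the I/O interface 1105: an input section 1106 including a keyboard, a mouse, and the like; an output section 1107 including a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like; a storage section 1108 including a hard disk and the like; and a communication section 1109 including a network interface card such as a LAN card or a modem. The communication section 1109 performs communication processing via a network such as the Internet. A drive 1110 is also connected to the I/O interface 1105 as needed. A removable medium 1111, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1110 as needed so that a computer program read from it can be installed into the storage section 1108 as needed.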
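In particular, according to embodiments of the present disclosure, the processes described above with reference to Fig. 1 and Fig. 2 may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for performing the methods of Fig. 1 and Fig. 2. In such embodiments, the computer program may be downloaded and installed from a network via the communication section 1109, and/or installed from the removable medium 1111.
The flowcharts and block diagrams in the drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units or modules described in the embodiments of the present application may be implemented in software or in hardware. The described units or modules may also be provided in a processor. The names of these units or modules do not, in some cases, limit the units or modules themselves.
In another aspect, the present application further provides a computer-readable storage medium, which may be the computer-readable storage medium included in the apparatus of the above embodiments, or a computer-readable storage medium that exists separately and is not assembled into a device. The computer-readable storage medium stores one or more programs, which are used by one or more processors to perform the big data based cross-domain recommendation method described in the present application.
The description of the specific embodiments should give a deeper and more concrete understanding of the technical means adopted by the present invention to achieve its intended purpose and of their effects; the accompanying drawings, however, are provided for reference and illustration only and are not intended to limit the present invention.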

Claims (14)

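  1. A big data based cross-domain recommendation method, characterized by comprising:
    performing topic modeling separately on the online input records and the offline behavior records of users in a specific user set, wherein every user in the specific user set has both online input records and offline behavior records;
    determining, from the topic modeling results, transition probabilities from each online input topic to each offline behavior topic; and
    for any target user, recommending offline behavior content to the target user based on the transition probabilities and the target user's online input records.
  2. The big data based cross-domain recommendation method according to claim 1, characterized in that the offline behavior records comprise offline consumption records;
    the online input records comprise search records made with a search engine, and/or input records entered in a specific application through an input method or a voice receiving device; the type of the specific application includes at least one of: chat software, a search engine, social software, and online shopping client software.
  3. The big data based cross-domain recommendation method according to claim 2, characterized in that the offline consumption records comprise offline consumption topics and offline consumption brands;
    where the online input records are search records made with a search engine, performing topic modeling separately on the online input records and the offline behavior records of the users in the specific user set comprises:
    performing topic modeling on the offline consumption records of the users in the specific user set to obtain the probability P(brand|consumption topic) of consuming each brand within each consumption topic and the probability P(consumption topic|user) of each user consuming within each consumption topic; and
    performing topic modeling on the online search records of the users in the specific user set to obtain the probability P(search term|search topic) of entering each search term within each search topic and the probability P(search topic|user) of each user searching within each search topic.
  4. The big data based cross-domain recommendation method according to claim 3, characterized in that determining, from the topic modeling results, the transition probabilities from each online input topic to each offline behavior topic comprises:
    applying a two-layer probabilistic graphical model to obtain the transition probability P(consumption topic|search topic) from each search topic to each consumption topic, either on the basis of each user's P(search topic|user) and the consumption data of the users in the specific user set for each brand, or on the basis of P(brand|consumption topic), each user's P(search topic|user), and the consumption data of the users in the specific user set for each brand.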
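  5. The big data based cross-domain recommendation method according to claim 4, characterized in that recommending offline behavior content to any target user based on the transition probabilities and the target user's online input records comprises:
    A1: for any target user, determining the probability P0(brand|user) of the target user consuming each brand according to the following formula:
    $$P_0(\text{brand}\mid\text{user})=\sum_{\text{consumption topic}}\ \sum_{\text{search topic}} P(\text{brand}\mid\text{consumption topic})\,P(\text{consumption topic}\mid\text{search topic})\,P_0(\text{search topic}\mid\text{user})$$
    where $\sum_{\text{consumption topic}}$ denotes summation over all consumption topics involved in the offline behavior records of the users in the specific user set, $\sum_{\text{search topic}}$ denotes summation over all search topics in the target user's search records, and P0(search topic|user) is the probability, determined from the target user's search records, of the target user searching within each search topic;
    A2: recommending brands to the target user according to P0(brand|user).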
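  6. The big data based cross-domain recommendation method according to claim 4, characterized in that, where any target user also has offline consumption records, recommending offline behavior content to the target user based on the transition probabilities and the target user's online input records comprises:
    B1: for any such target user, determining the probability P0(brand|user) of the target user consuming each brand according to the following formula:
    [formula image PCTCN2016086407-appb-100004]
    where P0(brand|consumption topic) is the probability, determined from the target user's consumption records, of the target user consuming each brand within each consumption topic; P0(consumption topic|user) is the probability, determined from the target user's consumption records, of the target user consuming within each consumption topic; $\sum_{\text{consumption topic}}$ denotes summation over all consumption topics involved in the offline behavior records of the users in the specific user set; $\sum_{\text{search topic}}$ denotes summation over all search topics in the target user's search records; and P0(search topic|user) is the probability, determined from the target user's search records, of the target user searching within each search topic;
    B2: recommending brands to the target user according to P0(brand|user).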
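  7. A big data based cross-domain recommendation apparatus, characterized by comprising:
    a modeling module configured to perform topic modeling separately on the online input records and the offline behavior records of users in a specific user set, wherein every user in the specific user set has both online input records and offline behavior records;
    a calculation module configured to determine, from the topic modeling results, transition probabilities from each online input topic to each offline behavior topic; and
    a recommendation module configured to recommend, for any target user, offline behavior content to the target user based on the transition probabilities and the target user's online input records.
  8. The big data based cross-domain recommendation apparatus according to claim 7, characterized in that the offline behavior records comprise offline consumption records;
    the online input records comprise search records made with a search engine, and/or input records entered in a specific application through an input method or a voice receiving device; the type of the specific application includes at least one of: chat software, a search engine, social software, and online shopping client software.
  9. The big data based cross-domain recommendation apparatus according to claim 8, characterized in that the offline consumption records comprise offline consumption topics and offline consumption brands;
    where the online input records are search records made with a search engine, the modeling module is configured to:
    perform topic modeling on the offline consumption records of the users in the specific user set to obtain the probability P(brand|consumption topic) of consuming each brand within each consumption topic and the probability P(consumption topic|user) of each user consuming within each consumption topic; and
    perform topic modeling on the online search records of the users in the specific user set to obtain the probability P(search term|search topic) of entering each search term within each search topic and the probability P(search topic|user) of each user searching within each search topic.
  10. The big data based cross-domain recommendation apparatus according to claim 9, characterized in that the calculation module is specifically configured to:
    apply a two-layer probabilistic graphical model to obtain the transition probability P(consumption topic|search topic) from each search topic to each consumption topic, either on the basis of each user's P(search topic|user) and the consumption data of the users in the specific user set for each brand, or on the basis of P(brand|consumption topic), each user's P(search topic|user), and the consumption data of the users in the specific user set for each brand.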
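  11. The big data based cross-domain recommendation apparatus according to claim 10, characterized in that the recommendation module is specifically configured to:
    for any target user, determine the probability P0(brand|user) of the target user consuming each brand according to the following formula:
    $$P_0(\text{brand}\mid\text{user})=\sum_{\text{consumption topic}}\ \sum_{\text{search topic}} P(\text{brand}\mid\text{consumption topic})\,P(\text{consumption topic}\mid\text{search topic})\,P_0(\text{search topic}\mid\text{user})$$
    where $\sum_{\text{consumption topic}}$ denotes summation over all consumption topics involved in the offline behavior records of the users in the specific user set, $\sum_{\text{search topic}}$ denotes summation over all search topics in the target user's search records, and P0(search topic|user) is the probability, determined from the target user's search records, of the target user searching within each search topic; and
    recommend brands to the target user according to P0(brand|user).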
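  12. The big data based cross-domain recommendation apparatus according to claim 10, characterized in that, where any target user also has offline consumption records, the recommendation module is specifically configured to:
    B1: for any such target user, determine the probability P0(brand|user) of the target user consuming each brand according to the following formula:
    [formula image PCTCN2016086407-appb-100010]
    where P0(brand|consumption topic) is the probability, determined from the target user's consumption records, of the target user consuming each brand within each consumption topic; P0(consumption topic|user) is the probability, determined from the target user's consumption records, of the target user consuming within each consumption topic; $\sum_{\text{consumption topic}}$ denotes summation over all consumption topics involved in the offline behavior records of the users in the specific user set; $\sum_{\text{search topic}}$ denotes summation over all search topics in the target user's search records; and P0(search topic|user) is the probability, determined from the target user's search records, of the target user searching within each search topic;
    B2: recommend brands to the target user according to P0(brand|user).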
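  13. A device, comprising:
    a processor; and
    a memory,
    the memory storing computer-readable instructions executable by the processor, wherein, when the computer-readable instructions are executed, the processor performs a big data based cross-domain recommendation method, the method comprising:
    performing topic modeling separately on the online input records and the offline behavior records of users in a specific user set, wherein every user in the specific user set has both online input records and offline behavior records;
    determining, from the topic modeling results, transition probabilities from each online input topic to each offline behavior topic; and
    for any target user, recommending offline behavior content to the target user based on the transition probabilities and the target user's online input records.
  14. A non-volatile computer storage medium storing computer-readable instructions executable by a processor, wherein, when the computer-readable instructions are executed by the processor, the processor performs a big data based cross-domain recommendation method, the method comprising:
    performing topic modeling separately on the online input records and the offline behavior records of users in a specific user set, wherein every user in the specific user set has both online input records and offline behavior records;
    determining, from the topic modeling results, transition probabilities from each online input topic to each offline behavior topic; and
    for any target user, recommending offline behavior content to the target user based on the transition probabilities and the target user's online input records.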
PCT/CN2016/086407 2015-12-23 2016-06-20 Big data based cross-domain recommendation method and apparatus WO2017107416A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/564,323 US10459996B2 (en) 2015-12-23 2016-06-20 Big data based cross-domain recommendation method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510979783.7 2015-12-23
CN201510979783.7A CN105630946B (zh) 2015-12-23 2015-12-23 Big data based cross-domain recommendation method and apparatus

Publications (1)

Publication Number Publication Date
WO2017107416A1 true WO2017107416A1 (zh) 2017-06-29

Family

ID=56045879

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/086407 WO2017107416A1 (zh) 2015-12-23 2016-06-20 Big data based cross-domain recommendation method and apparatus

Country Status (3)

Country Link
US (1) US10459996B2 (zh)
CN (1) CN105630946B (zh)
WO (1) WO2017107416A1 (zh)


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005222390A (ja) * 2004-02-06 2005-08-18 Matsushita Electric Ind Co Ltd Recommended information providing device
US20110093361A1 (en) * 2009-10-20 2011-04-21 Lisa Morales Method and System for Online Shopping and Searching For Groups Of Items
CN102479366A (zh) * 2010-11-25 2012-05-30 阿里巴巴集团控股有限公司 Commodity recommendation method and system
CN104281622A (zh) * 2013-07-11 2015-01-14 华为技术有限公司 Information recommendation method and apparatus in social media
CN104317945A (zh) * 2014-10-31 2015-01-28 亚信科技(南京)有限公司 Search behavior based commodity recommendation method for e-commerce websites
CN105117418A (zh) * 2015-07-30 2015-12-02 百度在线网络技术(北京)有限公司 Search based service information management system and method
CN105630946A (zh) * 2015-12-23 2016-06-01 百度在线网络技术(北京)有限公司 Big data based cross-domain recommendation method and apparatus



Also Published As

Publication number Publication date
US20190050484A1 (en) 2019-02-14
US10459996B2 (en) 2019-10-29
CN105630946B (zh) 2019-03-19
CN105630946A (zh) 2016-06-01


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16877229

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16877229

Country of ref document: EP

Kind code of ref document: A1