CN114741595A - Method and device for pushing information - Google Patents
- Publication number
- CN114741595A (application CN202210378006.7A)
- Authority
- CN
- China
- Prior art keywords
- user
- aligned
- business
- data
- user identifier
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0241—Advertisements
- G06Q30/0251—Targeted advertisements
- G06Q30/0269—Targeted advertisements based on user profile or attribute
- G06Q30/0271—Personalized advertisement
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- Finance (AREA)
- Strategic Management (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Accounting & Taxation (AREA)
- Development Economics (AREA)
- General Engineering & Computer Science (AREA)
- Entrepreneurship & Innovation (AREA)
- Game Theory and Decision Science (AREA)
- Data Mining & Analysis (AREA)
- Economics (AREA)
- Marketing (AREA)
- General Business, Economics & Management (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
Description
Technical Field
The present invention relates to the field of computer technology, and in particular to a method and device for pushing information.
Background
In current business district marketing scenarios (for example, advertisement placement and product recommendation), marketing focuses on two things: repeat purchases by existing customers and activation of potential new customers. At present, potential new customers are mostly mined by exploring crowds in the district's own long-term and short-term user behavior data.
In the course of realizing the present invention, the inventors found at least the following problems in the prior art:
Business district operators explore crowds using only the district's local long-term and short-term user behavior data, and lack a detailed understanding of customer flow outside the district and of consumption at stores inside it. In addition, because of commercial interest protection and personal data privacy and security, a district cannot effectively draw on data from external organizations, so its data has a single dimension and marketing cannot be carried out effectively. The consumption features of users in a customer group portrait are affected by market sentiment, and feature values often differ considerably over time. Existing business district insight methods do not take the real-time nature of user data into account, so the resulting customer group portraits are inaccurate and cannot support real-time dynamic queries in business analysis. When matching target new customers against existing visiting users, the prior art merely expands similar crowds from a single user feature attribute, so the target new customers obtained are often inaccurate, which hurts marketing effectiveness.
In summary, when business districts currently mine potential new customers, the data dimension is single, customer group portraits are not real-time, and target new customer matching is inaccurate, all of which reduce the district's marketing effectiveness.
Summary of the Invention
In view of this, embodiments of the present invention provide a method and device for pushing information that can combine data from multiple parties for user portrait characterization and information pushing. This compensates for the single-dimension user data features in business district operation scenarios, allows customer group portraits to be characterized better, closer to real time, and more accurately, identifies the district's potential customers more accurately, and helps raise the success rate of information pushing.
To achieve the above object, according to one aspect of the embodiments of the present invention, a method for pushing information is provided.
A method for pushing information, comprising:
acquiring a set of user identifiers to be aligned, and performing identifier alignment between that set and the business district's user identifier set to obtain aligned user identifiers;
acquiring movement trajectory data for the aligned user identifiers from business data, and determining the dense source areas of visiting users based on the movement trajectory data;
calculating, from the feature data of the visiting users, a customer group portrait of the visiting users for each dense source area;
comparing the similarity of the customer group portraits with the business district's historical user portraits to determine target users, and pushing information to the target users.
Optionally, the identifier alignment processing includes:
intersection processing, union processing, left outer join processing, or right outer join processing.
Optionally, performing identifier alignment between the set of user identifiers to be aligned and the business district's user identifier set includes:
sending the set of user identifiers to be aligned to a third-party arbitration institution, so that the third-party arbitration institution performs the identifier alignment between that set and the business district's user identifier set.
Optionally, before performing identifier alignment between the set of user identifiers to be aligned and the business district's user identifier set, the method further includes:
acquiring the user identifier processing rules agreed in advance with the business district, the processing rules including the data format and the encryption method of the user identifiers;
processing the set of user identifiers to be aligned according to the processing rules, and using the processed set as the set of user identifiers to be aligned.
Optionally, determining the dense source areas of visiting users based on the movement trajectory data includes:
determining the place of residence and the place of work of each visiting user based on the movement trajectory data;
clustering the places of residence and places of work with a spatial clustering algorithm to obtain multiple cluster regions;
selecting, from the multiple cluster regions, a specified number of regions in which visiting users are most densely distributed in space as the dense source areas of visiting users.
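The selection step can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: it assumes a spatial clustering algorithm has already assigned each visiting user's residence or workplace a cluster label, that the label `-1` marks noise points, and that the number of users in a cluster is an acceptable proxy for spatial density. The function name `densest_sources` and its parameters are hypothetical.

```python
from collections import Counter


def densest_sources(cluster_labels, top_n=3, noise_label=-1):
    """Return the labels of the top_n clusters holding the most visiting users.

    cluster_labels: one label per user location, as produced by some
    density-based spatial clustering step (assumed, not shown here).
    """
    # Count users per cluster, ignoring points flagged as noise.
    counts = Counter(label for label in cluster_labels if label != noise_label)
    # most_common sorts by count, largest first.
    return [label for label, _ in counts.most_common(top_n)]
```

If clusters differ widely in area, dividing each count by the cluster's area would be a closer match to "most densely distributed in space".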
Optionally, comparing the similarity of the customer group portraits with the business district's historical user portraits to determine target users includes:
category-encoding the structured portrait data in the customer group portraits and in the business district's historical user portraits, respectively, and obtaining portrait representation vectors through dimensionality reduction;
extracting social network representations from the unstructured portrait data in the customer group portraits and in the business district's historical user portraits, respectively, to obtain network representation vectors;
concatenating the portrait representation vector and the network representation vector to obtain the feature vectors of the customer group portraits and of the business district's historical user portraits, respectively;
comparing the customer group portraits with the business district's historical user portraits based on the similarity between their feature vectors, and determining as target users the visiting users corresponding to the customer group portraits whose similarity satisfies a preset threshold.
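The concatenation and threshold steps can be sketched as below. This is only an illustration under stated assumptions: the portrait and network representation vectors are taken as already computed, cosine similarity is used as the similarity measure (the text above only says "similarity between feature vectors"), and the threshold value 0.8 and the names `feature_vector` and `select_targets` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def feature_vector(portrait_vec, network_vec):
    """Concatenate the portrait and network representation vectors."""
    return list(portrait_vec) + list(network_vec)


def select_targets(group_vectors, historical_vector, threshold=0.8):
    """Return the IDs of customer groups whose similarity meets the threshold.

    group_vectors: dict mapping a customer group ID to its feature vector.
    """
    return [
        group_id
        for group_id, vec in group_vectors.items()
        if cosine(vec, historical_vector) >= threshold
    ]
```

For example, with `hist = feature_vector([1.0, 0.0], [0.5])`, a group whose vector equals `hist` is selected, while a group with vector `[0.0, 1.0, 0.0]` is not, because the two vectors are orthogonal.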
Optionally, pushing information to the target users includes:
acquiring a first dense source area of the target users and returning the first dense source area to the business district, so that the business district pushes information according to the first dense source area.
According to another aspect of the embodiments of the present invention, a device for pushing information is provided.
A device for pushing information, comprising:
a user identifier alignment module, configured to acquire a set of user identifiers to be aligned and to perform identifier alignment between that set and the business district's user identifier set to obtain aligned user identifiers;
a user source area determination module, configured to acquire movement trajectory data for the aligned user identifiers from business data and to determine the dense source areas of visiting users based on the movement trajectory data;
a customer group portrait calculation module, configured to calculate, from the feature data of the visiting users, a customer group portrait of the visiting users for each dense source area;
a target user determination module, configured to compare the similarity of the customer group portraits with the business district's historical user portraits to determine target users, and to push information to the target users.
According to yet another aspect of the embodiments of the present invention, an electronic device for pushing information is provided.
An electronic device for pushing information, comprising: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for pushing information provided by the embodiments of the present invention.
According to still another aspect of the embodiments of the present invention, a computer-readable medium is provided.
A computer-readable medium storing a computer program which, when executed by a processor, implements the method for pushing information provided by the embodiments of the present invention.
An embodiment of the above invention has the following advantages or beneficial effects. A set of user identifiers to be aligned is acquired and aligned with the business district's user identifier set to obtain aligned user identifiers; movement trajectory data for the aligned user identifiers is acquired from business data, and the dense source areas of visiting users are determined from it; a customer group portrait of the visiting users is calculated for each dense source area from the users' feature data; and the customer group portraits are compared for similarity with the business district's historical user portraits to determine target users, to whom information is then pushed. In this way, data from multiple parties can be combined for user portrait characterization and information pushing, which compensates for the single-dimension user data features in business district operation scenarios and allows the source areas of users who have not yet visited the district to be mined thoroughly, so that advertisements and other information can be pushed more effectively when attracting new customers. Based on the spatiotemporal attributes of user trajectory information, precise dense source areas are obtained, so that customer group portraits can be characterized better, closer to real time, and more accurately, the district's potential customers can be identified more accurately, and the success rate of information pushing can be raised.
Further effects of the above non-conventional optional approaches are described below in conjunction with specific embodiments.
Brief Description of the Drawings
The accompanying drawings are provided for a better understanding of the present invention and do not unduly limit it. In the drawings:
FIG. 1 is a schematic diagram of the main steps of a method for pushing information according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the implementation flow of an embodiment of the present invention;
FIG. 3 is a schematic diagram of the customer group portrait construction and new customer mining process according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the main modules of a device for pushing information according to an embodiment of the present invention;
FIG. 5 is a diagram of an exemplary system architecture to which an embodiment of the present invention may be applied;
FIG. 6 is a schematic structural diagram of a computer system suitable for implementing a terminal device or server according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Those of ordinary skill in the art will therefore recognize that various changes and modifications can be made to the embodiments described here without departing from the scope and spirit of the invention. Likewise, descriptions of well-known functions and structures are omitted below for clarity and conciseness.
In current business district marketing scenarios, marketing focuses on two things: repeat purchases by existing customers and activation of potential new customers. However, the district's own data is severely lacking in both volume and feature dimensions, and is always limited to data generated within the district itself. For example, a company operating a business district may want to know the source-area clusters and crowd portraits of potential customers who have never visited the district, so that it can analyze the source areas of target new customers and attract new customers for each business format. To fully extend the district's marketing reach, it is necessary to mine the crowd feature data of customer flow outside the district, analyze and compare data features and travel features across many dimensions, pin down where existing customer flow is concentrated, and find potential customer sources, thereby providing data support for the new-customer insight part of business insight. But district operators do not hold crowd feature data or consumption data for customers outside the district, and even the specific consumption data of shops inside the district is hard to obtain because it involves trade secrets. This creates strong resistance to mining and attracting target new customers outside the district and to the corresponding marketing. If a business district could access online customer flow data, it could analyze target new customers and attract new customers for each business format, thereby increasing revenue.
The main defects of current new customer mining in business districts are as follows:
1. Sparse feature dimensions: business district operators explore crowds using only the district's local long-term and short-term user behavior data, and lack a detailed understanding of customer flow outside the district and of consumption at stores inside it. In addition, because of commercial interest protection and personal data privacy and security, a district cannot effectively draw on data from external organizations, so its data has a single dimension and marketing cannot be carried out effectively.
2. Customer group portraits are not real-time: the consumption features of users in a customer group portrait are affected by market sentiment, and feature values often differ considerably over time. Existing business district insight methods do not take the real-time nature of user data into account, so the resulting customer group portraits are inaccurate and cannot support real-time dynamic queries in business analysis.
3. Inaccurate matching of target new customers: when matching target new customers against existing visiting users, the prior art merely uses the TF-IDF method to expand similar crowds from a single user feature attribute, so the target new customers obtained are often inaccurate, which hurts marketing effectiveness.
To address the above defects in the prior art, the present invention mainly adopts the following technical means:
1. To solve the operator's lack of data dimensions, data from one or more external partners is combined so that multi-party, cross-domain statistical queries can be performed securely, with individual users' data privacy protected and no trade secrets disclosed. This fills in the data features of users inside and outside the district and identifies where target new customers gather. Targeted measures such as advertisement placement, shuttle buses, and business format optimization can then be used to attract them, supporting commercial marketing and extending the district's catchment area.
2. Based on the spatiotemporal attributes of user trajectory information, spatial clustering is performed over the periods in which different users stay in one place for a long time, yielding the users' precise source areas. Using the dynamic features among the users' multi-party features, customer group portraits of visiting users and target new customers are built from short-term data (in units of hours and minutes) and long-term data (in units of days and weeks), so that customer group features are reflected dynamically in real time and target new customers are recommended in real time based on visiting users. The business district also cooperates with a social platform: using the social graph network and the LINE graph embedding method, representations reflecting the first-order and second-order similarity of the users' social graph are extracted as features, improving the matching accuracy for target new customers.
The technical terms used in describing the embodiments of the present invention, and their definitions, are as follows.
1. OPTICS: OPTICS stands for Ordering Points To Identify the Clustering Structure; its goal is to cluster spatial data according to density distribution. The idea is very similar to DBSCAN, but DBSCAN's input parameters are hard to choose, that is, DBSCAN is sensitive to them. Unlike DBSCAN, OPTICS arranges the density-based clustering structure in a particular order. The structure corresponding to this order contains the clustering information at every level and is easy to analyze, and a slight change in the neighborhood radius does not affect the clustering result. OPTICS has several advantages:
1) The number of clusters does not need to be specified in advance, and clusters of arbitrary shape can be found;
2) It is insensitive to outliers and can identify them automatically during clustering;
3) The clustering result does not depend on the order in which nodes are traversed;
4) It is insensitive to the neighborhood radius parameter, so the clustering result is more stable.
2. TF-IDF: TF-IDF (Term Frequency-Inverse Document Frequency) is a statistical method for keywords, used to assess how important a word is to a document set or corpus. A word's importance is proportional to the number of times it appears in a document and inversely proportional to how often it appears across the corpus. This effectively limits the influence of common words on keywords and improves the relevance between keywords and documents. TF is the number of times a word appears in a document; it is usually normalized as TF = (number of occurrences of the word in the document) / (total number of words in the document), which prevents the result from being biased toward long documents (the same word usually has a higher raw count in a long document than in a short one). IDF is the inverse document frequency: the fewer documents contain a word, the larger its IDF, indicating that the word discriminates well. IDF = ln(total number of documents in the corpus / (number of documents containing the word + 1)), where 1 is added so that the denominator is never 0. TF-IDF = TF × IDF; the larger the TF-IDF value, the more important the word is to the text.
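The definition above translates directly into a few lines of Python. The sketch below is illustrative only: documents are assumed to be pre-tokenized lists of words, and the function name `tf_idf` is hypothetical.

```python
import math


def tf_idf(term, document, corpus):
    """TF-IDF of `term` in `document`, using the formulas given in the text.

    document: list of tokens; corpus: list of such documents.
    TF  = occurrences of term in document / total tokens in document
    IDF = ln(number of documents / (documents containing term + 1))
    """
    if not document:
        return 0.0
    tf = document.count(term) / len(document)
    containing = sum(1 for doc in corpus if term in doc)
    idf = math.log(len(corpus) / (containing + 1))
    return tf * idf
```

With this particular smoothing, a word that appears in every document gets a slightly negative IDF, because the denominator then exceeds the number of documents; some variants add 1 to the numerator as well to avoid this.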
3. Cosine similarity: cosine similarity evaluates how similar two vectors are by computing the cosine of the angle between them. The vectors are placed in a vector space according to their coordinates, the angle between them is found, and the cosine of that angle characterizes their similarity. The smaller the angle, the closer the cosine is to 1, the more closely the directions agree, and the more similar the vectors are.
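In symbols, for two n-dimensional vectors a and b (the notation is introduced here for illustration), the standard definition is:

```latex
\cos\theta \;=\; \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}
\;=\; \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^{2}}\;\sqrt{\sum_{i=1}^{n} b_i^{2}}}
```

As a worked example, for a = (1, 0) and b = (1, 1) the dot product is 1 and the norms are 1 and √2, so cos θ = 1/√2 ≈ 0.707, corresponding to an angle of 45°.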
4. LINE: LINE (Large-scale Information Network Embedding) was proposed by Jian Tang et al. in 2015. It is a node embedding algorithm that can be applied to large networks with arbitrary edge types, and it embeds the network by considering first-order proximity (local structure) and second-order proximity (global structure). Compared with earlier graph/network embedding methods, LINE has the following advantages: (1) It applies to any type of network, where "any type" mainly means that edge weight and direction are arbitrary: directed, undirected, weighted, or unweighted. (LINE does not consider heterogeneous networks with different node and edge types, which is a limitation and led to later research on heterogeneous and homogeneous networks.) (2) LINE proposes an edge-sampling algorithm for optimizing the objective function, which overcomes the limitations of conventional stochastic gradient descent.
5. Homomorphic encryption: homomorphic encryption (HE) is a class of encryption methods with a special natural property. The concept was first proposed by Rivest et al. in the 1970s. Compared with ordinary encryption algorithms, homomorphic encryption supports not only the basic encryption operations but also various computations on ciphertexts; that is, computing first and then decrypting is equivalent to decrypting first and then computing.
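The "compute then decrypt equals decrypt then compute" property can be seen in a toy example. The sketch below uses textbook RSA, which is multiplicatively homomorphic, with tiny and completely insecure parameters chosen only for illustration; it is not the scheme used by the invention, which the text does not specify.

```python
# Textbook RSA with toy parameters (insecure; for illustration only).
P, Q = 61, 53
N = P * Q          # modulus, 3233
E = 17             # public exponent
D = 2753           # private exponent: E * D ≡ 1 (mod (P - 1) * (Q - 1))


def encrypt(m):
    """Encrypt an integer 0 <= m < N."""
    return pow(m, E, N)


def decrypt(c):
    """Decrypt a ciphertext back to the integer plaintext."""
    return pow(c, D, N)


a, b = 7, 6
# Multiply the two ciphertexts without ever decrypting the inputs ...
product_cipher = (encrypt(a) * encrypt(b)) % N
# ... and the decrypted result equals the product of the plaintexts.
assert decrypt(product_cipher) == a * b
```

The equality holds only while the product of the plaintexts stays below `N`; practical homomorphic schemes such as Paillier (additive) handle ranges and security properly.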
FIG. 1 is a schematic diagram of the main steps of a method for pushing information according to an embodiment of the present invention. In this embodiment, the method is executed by a partner of the business district. The partner may be a mobile app, an online shopping website, an offline shopping platform, an online or offline travel product sales platform, and so on, depending mainly on the business needs of the business district and on the information the partner can actually provide. As shown in FIG. 1, the method for pushing information according to the embodiment of the present invention mainly includes the following steps S101 to S104.
Step S101: acquire a set of user identifiers to be aligned, and perform identifier alignment between that set and the business district's user identifier set to obtain aligned user identifiers;
Step S102: acquire movement trajectory data for the aligned user identifiers from business data, and determine the dense source areas of visiting users based on the movement trajectory data;
Step S103: calculate, from the feature data of the visiting users, a customer group portrait of the visiting users for each dense source area;
Step S104: compare the similarity of the customer group portraits with the business district's historical user portraits to determine target users, and push information to the target users.
According to one embodiment of the present invention, the identifier alignment processing includes: intersection processing, union processing, left outer join processing, or right outer join processing. The way identifiers (IDs) are aligned is not fixed, and several methods are possible depending on security considerations, but the core operation is taking the intersection. If the actual business requires it, a union, a left outer join, or a right outer join may be used instead.
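The four alignment modes can be sketched with plain Python sets. This is an illustrative sketch rather than the embodiment's implementation: it assumes both parties' IDs have already been brought into the same encrypted form, and the function name `align_ids` and the mode names are hypothetical.

```python
def align_ids(partner_ids, district_ids, how="inner"):
    """Align two collections of (already encrypted) user IDs.

    how: "inner" (intersection), "union", "left" (all partner IDs)
         or "right" (all district IDs).
    Returns a sorted list of (id, in_partner, in_district) tuples.
    """
    partner, district = set(partner_ids), set(district_ids)
    if how == "inner":
        keys = partner & district
    elif how == "union":
        keys = partner | district
    elif how == "left":
        keys = partner
    elif how == "right":
        keys = district
    else:
        raise ValueError(f"unknown alignment mode: {how}")
    # The two flags record on which side each ID was present.
    return [(k, k in partner, k in district) for k in sorted(keys)]
```

With `align_ids(["a", "b"], ["b", "c"])` only `"b"` is returned, while `how="left"` keeps `"a"` as well and marks it as absent from the district's set.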
According to another embodiment of the present invention, performing identifier alignment between the set of user identifiers to be aligned and the business district's user identifier set includes: sending the set of user identifiers to be aligned to a third-party arbitration institution, so that the third-party arbitration institution performs the identifier alignment between that set and the business district's user identifier set. Here, the third-party arbitration institution is an independent, neutral, secure third-party node. In a specific implementation, however, besides gathering the user identifiers at a third-party arbitration institution for alignment, if the merchant party and the business district party trust each other, the partner may send the set of user identifiers to be aligned directly to the business district operator, which completes the alignment and sends the result back to the partner, or the other way round, depending on which approach satisfies the business, security, and compliance requirements. That is, the third-party arbitration institution may be a neutral secure node distinct from both the merchant party and the business district party, or it may be whichever of the two satisfies the business, security, and compliance requirements.
According to yet another embodiment of the present invention, before performing identifier alignment processing on the to-be-aligned user identifier set and the business district user identifier set, the method further includes: obtaining a processing rule for user identifiers agreed with the business district in advance, the processing rule including the data format and the encryption method of the user identifiers; and processing the to-be-aligned user identifier set using the processing rule, and taking the processed set as the to-be-aligned user identifier set. Before alignment, the to-be-aligned user identifier set and the business district user identifier set must be processed in a uniform way. Specifically, the business district party and the partner agree on a unified data format for user identifiers so that the IDs can be aligned; any ID that uniquely identifies a user is acceptable, and the type of ID used for alignment may differ between marketing scenarios. A unified ID encryption method must also be agreed: IDs must not be transmitted in plaintext, only as ciphertext produced by the agreed method. The encryption may take various forms, such as MD5 or another hash, or a combination of several methods, chosen so that the encrypted IDs cannot be reversed, thereby better protecting the security and privacy of the data.
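A processing rule of this kind, consisting of a shared data format and a shared one-way transformation, might be sketched as follows. This is an assumption-laden illustration: the embodiment mentions MD5 or other hashing, whereas the sketch substitutes a keyed HMAC-SHA-256, because an unkeyed hash of a low-entropy ID such as a phone number can be reversed by enumeration. The names `normalize_id`, `protect_id`, and `shared_key` are hypothetical.

```python
import hashlib
import hmac


def normalize_id(raw_id: str) -> str:
    """Apply the agreed data format (assumed here: trimmed, lower-case)."""
    return raw_id.strip().lower()


def protect_id(raw_id: str, shared_key: bytes) -> str:
    """Return a hex digest that both parties can compute but not easily invert.

    shared_key is a secret agreed in advance by both parties (an assumption;
    the source only says that the encryption method is agreed).
    """
    message = normalize_id(raw_id).encode("utf-8")
    return hmac.new(shared_key, message, hashlib.sha256).hexdigest()
```

Because both parties apply the same normalization and the same key, equal raw IDs yield equal digests, so alignment can be carried out on the digests alone.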
After identifier alignment, for the aligned user identifiers, the partner queries the movement trajectory information of the aligned users over the past year from its own business data (e.g., mobile phone signaling data). Signaling data is data on the current time and current location of a mobile terminal, collected by the mobile operator at a fixed frequency; it can be obtained after the mobile operator has applied security processing and granted authorization. Alternatively, if the partner is an e-commerce platform, the user's movement trajectory information can be queried from, for example, the delivery addresses in the aligned users' order data. For different business scenarios, the movement trajectory information of aligned users can be obtained in different ways.
According to yet another embodiment of the present invention, determining the dense source areas of visiting users based on the movement trajectory data includes: determining the place of residence and the workplace of each visiting user based on the movement trajectory data; clustering the places of residence and workplaces using a spatial clustering algorithm to obtain a plurality of cluster regions; and selecting from the plurality of cluster regions a specified number of cluster regions in which the visiting users are most densely distributed in space as the dense source areas of the visiting users. In one embodiment of the present invention, assume that the movement trajectory information of the aligned users is obtained from mobile phone signaling data. First, after obtaining the trajectory information of the aligned users, the partner determines each user's place of residence and workplace from the trajectory information. The decision logic is: the user's high-frequency stay point between 9:00 and 17:00 on working days is the workplace, and the user's high-frequency stay point between 22:00 and 8:00 is the place of residence (a stay point is a location where the sample's trajectory remains for a long time). Then, once the places of residence and workplaces of all users have been computed, they are clustered using a spatial clustering algorithm to obtain cluster regions. Finally, a specified number (for example, 10) of cluster regions in which the visiting users are most densely distributed in space are selected from the clustering results and displayed on a map; these are the top ten dense source areas of the business district's visiting users.
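The time-window rule and the final top-N selection can be sketched as below. Several points are assumptions rather than part of the embodiment: stay points are taken to be already extracted and mapped to a hashable location such as a grid-cell ID; "high-frequency" is read as "most frequent"; the clustering itself is assumed to have been run elsewhere and to label noise points as `-1`; and member count is used as a simple proxy for spatial density. The function names are hypothetical.

```python
from collections import Counter
from datetime import datetime


def infer_home_and_work(stay_points):
    """stay_points: iterable of (datetime, location) pairs for one user.

    Returns (home, work); either may be None if no stay point qualifies.
    """
    work, home = Counter(), Counter()
    for ts, loc in stay_points:
        if ts.weekday() < 5 and 9 <= ts.hour < 17:
            work[loc] += 1          # working day, 9:00-17:00
        elif ts.hour >= 22 or ts.hour < 8:
            home[loc] += 1          # 22:00-8:00, any day
    most_frequent = lambda c: c.most_common(1)[0][0] if c else None
    return most_frequent(home), most_frequent(work)


def top_dense_areas(cluster_labels, n=10):
    """Pick the n cluster labels with the most members, ignoring noise (-1)."""
    counts = Counter(label for label in cluster_labels if label != -1)
    return [label for label, _ in counts.most_common(n)]
```

A stay point on a working day at 12:00 counts toward the workplace, while one at 23:00 or 03:00 counts toward the place of residence; points outside both windows are ignored.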
Next, after the dense source areas of the visiting users are determined, a customer-group profile of the visiting users is computed for each dense source area from the feature data of the visiting users. The dimensions of the customer-group profile include: the number of sample visits in a fixed time period, the time distribution of visits (e.g., the number of visiting samples in different periods), composition analysis, social graph analysis, and so on. Composition analysis may cover structured features such as gender, age, whether the user has children, spending power, and shopping preferences. Social graph analysis mainly uses data from social software: the LINE algorithm, a graph embedding method, is applied to the relationships between visiting users and their relatives, friends, and colleagues to extract representations reflecting first-order and second-order similarity, which later serve as the basis for matching the customer-group profile of visiting users against the profile of the target new customers. In processing the structured features, user tags are handled differently according to their temporal attributes. For static features, the representations are stored in a content library for later use. For dynamic features, such as user preferences and interests that change frequently over time, a long-term representation vector can be computed in units of days or weeks and a short-term representation vector in units of hours or minutes; the two are combined by weighting and used later to compute the degree of match with target new customers, enabling real-time user recommendation. The customer-group profile here does not determine the specific profile or tags of any individual.
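The weighting of short-term and long-term representation vectors is not specified further. One simple reading is a convex combination, sketched below; the function name `blend_profiles`, the parameter `short_weight`, and its default value are assumptions made purely for illustration.

```python
def blend_profiles(long_term, short_term, short_weight=0.3):
    """Combine two equal-length representation vectors by convex weighting.

    short_weight is the share given to the short-term vector; the remaining
    (1 - short_weight) goes to the long-term vector. The value 0.3 is an
    arbitrary illustrative default, not a value from the source.
    """
    if len(long_term) != len(short_term):
        raise ValueError("vectors must have the same length")
    if not 0.0 <= short_weight <= 1.0:
        raise ValueError("short_weight must lie in [0, 1]")
    return [
        (1.0 - short_weight) * l + short_weight * s
        for l, s in zip(long_term, short_term)
    ]
```

A larger `short_weight` makes the blended vector react faster to recent behaviour, at the cost of more noise.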
According to yet another embodiment of the present invention, comparing the similarity between the customer-group profile and the historical user profile of the business district to determine the target users includes:
performing category encoding on the structured profile data in the customer-group profile and in the historical user profile of the business district respectively, and obtaining profile representation vectors through dimensionality reduction;
performing social network representation extraction on the unstructured profile data in the customer-group profile and in the historical user profile of the business district respectively, to obtain network representation vectors;
concatenating the profile representation vector and the network representation vector to obtain the feature vector of the customer-group profile and the feature vector of the historical user profile of the business district respectively;
comparing the similarity between the customer-group profile and the historical user profile of the business district based on the similarity between the feature vectors, and determining the visiting users corresponding to a customer-group profile whose similarity satisfies a preset threshold as the target users.
Here, the historical user profile of the business district is built by extracting features from historical users who have visited the business district, and the similarity comparison between the customer-group profile and the historical user profile of the business district is carried out by a similar-user mining algorithm. The similar-user mining algorithm is as follows. First, structured profile data (e.g., age, gender, spending power) is category-encoded (e.g., male encoded as 0001 and female as 1000), and dimensionality reduction is applied to obtain a more compact profile representation vector. Then, for unstructured relationships (such as the social graph), the LINE graph embedding algorithm is used to extract a social network representation, yielding a network representation vector, which is concatenated with the profile representation vector. Finally, the similarity between samples is computed from the concatenated representation vectors, and the samples whose similarity exceeds a certain threshold form the target new-customer group. A sample here is one user.
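The encode, concatenate, and compare steps of the similar-user mining algorithm can be sketched as follows. This is a simplified illustration under stated assumptions: one-hot encoding stands in for the category encoding, the dimensionality reduction step is omitted, the network representation vectors are assumed to have been produced elsewhere (LINE is not implemented here), cosine similarity is used as the similarity measure, and the threshold of 0.8 is arbitrary. All function names are hypothetical.

```python
import math


def one_hot(value, categories):
    """Category-encode a single value against a fixed category list."""
    return [1.0 if value == c else 0.0 for c in categories]


def feature_vector(profile_vec, network_vec):
    """Concatenate the profile and network representation vectors."""
    return list(profile_vec) + list(network_vec)


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_targets(candidates, reference, threshold=0.8):
    """Return the IDs whose feature vector is similar enough to the reference.

    candidates: dict mapping user ID -> feature vector.
    """
    return [
        uid for uid, vec in candidates.items()
        if cosine_similarity(vec, reference) >= threshold
    ]
```

In practice the reference vector would come from the historical user profile, and the candidates from users who have not yet visited the business district.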
According to a further embodiment of the present invention, pushing information to the target users includes: obtaining a first dense source area of the target users, and returning the first dense source area to the business district, so that the business district pushes information according to the first dense source area. After the target users are determined, because their movement trajectories are not confined to any one business district, the dense source areas of the target users can also be computed using the method of determining users' dense source areas in step S102 above, so as to better determine the scope and content of the information pushed by the business district.
In addition, to protect the security and privacy of the data, the similarity comparison between the customer-group profile and the historical user profile of the business district used to determine the target users may also be performed through the third-party arbitrator. Likewise, after the first dense source area of the target users is determined, it may be returned to the third-party arbitrator for aggregation, and the third-party arbitrator then sends it to the business district operator for subsequent marketing planning. There may be one partner or several, depending on the content requirements of the marketing. When the customer-group profile is constructed, the user features come from the business district's own tag features, e-commerce tag features, the social graph features of social platforms, and so on, and the feature vector data of the various parties cannot be aggregated at a single party to compute cosine similarity. Homomorphic encryption is therefore used: the results computed from the different feature vectors of the three platforms are encrypted with a public key and sent to the third-party arbitrator, which decrypts them with the private key and then sends the cosine similarity result to the business district party.
FIG. 2 is a schematic diagram of the implementation flow of an embodiment of the present invention. FIG. 2 shows how, in one embodiment of the present invention, information is pushed on the basis of the business district party, the partner, and the third-party arbitrator. The flow mainly includes the following steps:
1. Both parties (the business district party and the partner) agree on the processing rules for user IDs: at the start of the project, the business district party and the partner agree on a unified user data ID format so that the IDs can be aligned; any ID that uniquely identifies a user is acceptable, and the type of ID used for alignment may differ between marketing scenarios. A unified ID encryption method must also be agreed: IDs must not be transmitted in plaintext, only as ciphertext produced by the agreed method. The encryption may take various forms, such as MD5 or another hash, or a combination of several methods, chosen so that the encrypted IDs cannot be reversed;
2. Both parties (the business district party and the partner) prepare their ID sets and feature sets: the business district party provides the IDs of users who visited in the past year, based on consumption data from its own businesses (e.g., catering, accommodation, tourism, travel). Likewise, the partner provides the IDs of its users over the past year based on its own business data;
3. ID alignment preparation: the operator of the business district party can now initiate an ID alignment task and send it to the partner, which may accept or reject the task;
4. ID alignment processing: both parties (the business district party and the partner) encrypt the IDs to be aligned and send them separately to the third-party arbitrator (an independent, neutral third-party security node). The arbitrator performs the ID alignment (for example, taking the intersection) and passes the aligned result to the partner. In a specific implementation, besides aggregating the IDs at the arbitrator, if the business district party and the partner trust each other, the partner may send its IDs directly to the business district party, which completes the alignment and sends the result back to the partner, or vice versa; the choice depends mainly on which approach satisfies the business and security compliance requirements. The method of ID alignment is not fixed, and several methods are possible depending on security considerations, but the core operation is taking the intersection. If the actual business requires it, a union, a left outer join, or a right outer join may also be used. The partner here may be a mobile app, an online shopping website, an offline shopping platform, an online or offline travel product sales platform, and so on, depending mainly on the business needs of the business district party and the information the partner can actually provide;
5. Obtaining trajectory information for the aligned IDs: after obtaining the encrypted aligned IDs (that is, the data samples common to both parties), the partner queries the samples' movement trajectory information over the past year from its own business data (e.g., mobile phone signaling). Signaling data is data on the current time and current location of a mobile terminal, collected by the mobile operator at a fixed frequency;
6. Computing the dense source areas of visiting users: first, after obtaining the trajectory information of the aligned samples, the partner determines each sample's place of residence and workplace from the trajectory information. The decision logic is: the sample's high-frequency stay point between 9:00 and 17:00 on working days is the workplace, and the sample's high-frequency stay point between 22:00 and 8:00 is the place of residence (a stay point is a location where the sample's trajectory remains for a long time). Then, once the places of residence and workplaces of all samples have been computed, they are clustered using a spatial clustering algorithm to obtain cluster regions. Finally, the 10 cluster regions in which the visiting users are most densely distributed in space are selected from the clustering results and displayed on a map; these are the top ten dense source areas of the business district's visiting users;
7. Building the customer-group profile of visiting users: after the top ten dense source areas of the business district's visiting users are obtained, the group profile of the samples in each source area is computed. The dimensions of the group profile include: the number of sample visits in a fixed time period, the time distribution of visits (e.g., the number of visiting samples in different periods), composition analysis, social graph analysis, and so on. Composition analysis may cover structured features such as gender, age, whether the user has children, spending power, and shopping preferences. Social graph analysis mainly uses data from social software: the LINE algorithm, a graph embedding method, is applied to the relationships between visiting users and their relatives, friends, and colleagues to extract representations reflecting first-order and second-order similarity, which later serve as the basis for matching the customer-group profile of visiting users against the profile of the target new customers. In processing the structured features, user tags are handled differently according to their temporal attributes. For static features, the representations are stored in a content library for later use. For dynamic features, such as user preferences and interests that change frequently over time, a long-term representation vector can be computed in units of days or weeks and a short-term representation vector in units of hours or minutes; the two are combined by weighting and used later to compute the degree of match with target new customers, enabling real-time user recommendation. The group profile here does not determine the specific profile or tags of any individual;
8. Obtaining the target new-customer group: the partner applies the similar-user mining algorithm to the obtained user profile of the business district's visiting users and the historical user profile of the business district, to find the IDs of users who have not visited but are highly similar to visiting users; these are the target new-customer IDs. The similar-user mining algorithm is as follows. First, structured profile data (e.g., age, gender, spending power) is category-encoded (e.g., male encoded as 0001 and female as 1000), and dimensionality reduction is applied to obtain a more compact profile representation vector. Then, for unstructured relationships (such as the social graph), the LINE graph embedding algorithm is used to extract a social network representation, yielding a network representation vector, which is concatenated with the profile representation vector. Finally, the similarity between samples is computed from the samples' representation vectors, and the samples whose similarity exceeds a certain threshold form the target new-customer group;
9. Obtaining the dense source areas of the target new-customer group: once the target new-customer group is obtained, the method of step 6 is used to compute its top ten dense source areas;
10. Returning the result: the result is returned to the third-party arbitrator for aggregation, and the arbitrator then sends it to the business district operator for subsequent marketing planning. There may be one partner or several, depending on the content of the marketing;
11. The business district party carries out the actual marketing activities.
FIG. 3 is a schematic diagram of the customer-group profile construction and new-customer mining process of an embodiment of the present invention. As shown in FIG. 3, the business district party and the partner each perform user feature extraction and dimensionality reduction to obtain group profiles; user i and user j in the figure both denote group profiles. An intermediate result is then obtained by comparing the similarity of the group profiles, and the intermediate result is homomorphically encrypted and sent to the third-party arbitrator. Throughout the process, none of the third-party arbitrator, the business district party, or the partner can directly obtain another party's raw data; each obtains only encrypted or privacy-processed data, which ensures the security and privacy of the data.
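FIG. 4 is a schematic diagram of the main modules of an information pushing apparatus according to an embodiment of the present invention. As shown in FIG. 4, the information pushing apparatus 400 of the embodiment of the present invention mainly includes a user identifier alignment module 401, a user source area determination module 402, a customer-group profile calculation module 403, and a target user determination module 404.
The user identifier alignment module 401 is configured to obtain a to-be-aligned user identifier set, and to perform identifier alignment processing on the to-be-aligned user identifier set and the business district user identifier set to obtain aligned user identifiers;
the user source area determination module 402 is configured to obtain the movement trajectory data of the aligned user identifiers from business data, and to determine the dense source areas of visiting users based on the movement trajectory data;
the customer-group profile calculation module 403 is configured to compute, from the feature data of the visiting users, the customer-group profile of the visiting users of each dense source area;
the target user determination module 404 is configured to compare the similarity between the customer-group profile and the historical user profile of the business district to determine the target users, and to push information to the target users.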
According to one embodiment of the present invention, the identifier alignment processing includes intersection, union, left outer join, or right outer join processing.
According to another embodiment of the present invention, the user identifier alignment module 401 may further be configured to: send the to-be-aligned user identifier set to a third-party arbitrator, so that the third-party arbitrator performs identifier alignment processing on the to-be-aligned user identifier set and the business district user identifier set.
According to yet another embodiment of the present invention, the information pushing apparatus 400 further includes an identifier set preprocessing module (not shown in the figure), configured to:
before identifier alignment processing is performed on the to-be-aligned user identifier set and the business district user identifier set, obtain a processing rule for user identifiers agreed with the business district in advance, the processing rule including the data format and the encryption method of the user identifiers;
process the to-be-aligned user identifier set using the processing rule, and take the processed set as the to-be-aligned user identifier set.
According to yet another embodiment of the present invention, the user source area determination module 402 may further be configured to:
determine the place of residence and the workplace of each visiting user based on the movement trajectory data;
cluster the places of residence and workplaces using a spatial clustering algorithm to obtain a plurality of cluster regions;
select from the plurality of cluster regions a specified number of cluster regions in which the visiting users are most densely distributed in space as the dense source areas of the visiting users.
According to yet another embodiment of the present invention, the target user determination module 404 may further be configured to:
perform category encoding on the structured profile data in the customer-group profile and in the historical user profile of the business district respectively, and obtain profile representation vectors through dimensionality reduction;
perform social network representation extraction on the unstructured profile data in the customer-group profile and in the historical user profile of the business district respectively, to obtain network representation vectors;
concatenate the profile representation vector and the network representation vector to obtain the feature vector of the customer-group profile and the feature vector of the historical user profile of the business district respectively;
compare the similarity between the customer-group profile and the historical user profile of the business district based on the similarity between the feature vectors, and determine the visiting users corresponding to a customer-group profile whose similarity satisfies a preset threshold as the target users.
According to a further embodiment of the present invention, the target user determination module 404 may further be configured to:
obtain a first dense source area of the target users, and return the first dense source area to the business district, so that the business district pushes information according to the first dense source area.
According to the technical solution of the embodiments of the present invention, a to-be-aligned user identifier set is obtained and aligned with the business district user identifier set to obtain aligned user identifiers; the movement trajectory data of the aligned user identifiers is obtained from business data, and the dense source areas of visiting users are determined from that data; the customer-group profile of the visiting users of each dense source area is computed from the feature data of the visiting users; and the customer-group profile is compared for similarity with the historical user profile of the business district to determine the target users, to whom information is then pushed. This makes it possible to combine data from multiple parties for user profiling and information pushing, compensates for the single-dimensional user data features available in business district operation scenarios, and fully mines the source areas of users who have not yet visited the business district, so that advertisements and other information can be pushed more effectively when attracting new customers to the business district. Based on the spatio-temporal attributes of user trajectory information, the users' dense source areas are obtained precisely, so that customer-group profiles can be built better, closer to real time, and more accurately, the potential customers of the business district can be identified more accurately, and the success rate of information pushing can be improved.
This technical solution first uses the temporal attributes of the trajectories to select precise stay locations as the users' source areas. It then uses the spatial attributes, applying the OPTICS density clustering algorithm and the similar-user mining algorithm to existing customer flow analysis in order to explore and mine the source areas of target new customers and pin down information on potential target customers. This addresses the questions of whom to target, where to advertise, how to advertise, and when to advertise, and greatly improves the efficiency of customer-acquisition marketing planning. When the customer-group profile is constructed, the user features come from the business district's own tag features, e-commerce tag features, and the social graph features of social platforms, and the feature vector data of the various parties cannot be aggregated at a single party to compute cosine similarity. Homomorphic encryption is therefore used: the results computed from the different feature vectors of the three platforms are encrypted with a public key and sent to the third-party arbitrator, which decrypts them with the private key and then sends the cosine similarity result to the business district.
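FIG. 5 shows an exemplary system architecture 500 to which the information pushing method or the information pushing apparatus of the embodiments of the present invention can be applied.
As shown in FIG. 5, the system architecture 500 may include terminal devices 501, 502, and 503, a network 504, and a server 505. The network 504 is the medium that provides communication links between the terminal devices 501, 502, 503 and the server 505. The network 504 may include various connection types, such as wired or wireless communication links or fiber optic cables.
A user may use the terminal devices 501, 502, 503 to interact with the server 505 through the network 504 to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 501, 502, 503, such as shopping applications, web browser applications, search applications, advertisement pushing tools, and social platform software (examples only).
The terminal devices 501, 502, 503 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers.
The server 505 may be a server that provides various services, such as a back-end management server (example only) that supports shopping websites browsed by users with the terminal devices 501, 502, 503. For data such as a received information pushing request, the back-end management server may: obtain a to-be-aligned user identifier set and align it with the business district user identifier set to obtain aligned user identifiers; obtain the movement trajectory data of the aligned user identifiers from business data and determine the dense source areas of visiting users from that data; compute the customer-group profile of the visiting users of each dense source area from the feature data of the visiting users; compare the similarity between the customer-group profile and the historical user profile of the business district to determine the target users and push information to them; and feed the processing result (for example, an information pushing result or a product information pushing result, examples only) back to the terminal device.
It should be noted that the information pushing method provided by the embodiments of the present invention is generally executed by the server 505, and accordingly the information pushing apparatus is generally provided in the server 505.
It should be understood that the numbers of terminal devices, networks, and servers in FIG. 5 are merely illustrative. There may be any number of terminal devices, networks, and servers as required by the implementation.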
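Referring now to FIG. 6, a schematic structural diagram is shown of a computer system 600 suitable for implementing a terminal device or server of an embodiment of the present invention. The terminal device or server shown in FIG. 6 is only an example and should not limit the functionality or scope of use of the embodiments of the present invention.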
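As shown in FIG. 6, the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage section 608 into a random access memory (RAM) 603. The RAM 603 also stores the various programs and data required for the operation of the system 600. The CPU 601, the ROM 602, and the RAM 603 are connected to one another via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.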
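The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, and the like; an output section 607 including a display such as a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage section 608 as needed.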
In particular, according to the disclosed embodiments of the present invention, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, the disclosed embodiments include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 609, and/or installed from the removable medium 611. When the computer program is executed by the central processing unit (CPU) 601, the above-described functions defined in the system of the present invention are performed.
It should be noted that the computer-readable medium of the present invention may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example and without limitation, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of these. More specific examples of computer-readable storage media include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of these. In the present invention, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by, or in connection with, an instruction execution system, apparatus, or device. A computer-readable signal medium, by contrast, may include a data signal propagated in baseband or as part of a carrier wave and carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of these. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transport a program for use by, or in connection with, an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of these.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the figures. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams or flowcharts, and combinations of blocks in the block diagrams or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units or modules described in the embodiments of the present invention may be implemented in software or in hardware. The described units or modules may also be provided in a processor; for example, a processor may be described as including a user identifier alignment module, a user place-of-origin determination module, a customer group portrait calculation module, and a target user determination module. The names of these units or modules do not, in some cases, limit the units or modules themselves. For example, the user identifier alignment module may also be described as "a module for obtaining a set of user identifiers to be aligned and aligning that set against the business district's user identifier set to obtain aligned user identifiers".
In another aspect, the present invention also provides a computer-readable medium, which may be included in the device described in the above embodiments or may exist separately without being assembled into the device. The computer-readable medium carries one or more programs which, when executed by the device, cause the device to: obtain a set of user identifiers to be aligned and align it against the business district's user identifier set to obtain aligned user identifiers; obtain movement trajectory data for the aligned user identifiers from business data, and determine the dense places of origin of visiting users based on the movement trajectory data; compute, from the visiting users' feature data, a customer group portrait of the visiting users for each dense place of origin; and compare the customer group portraits with the business district's historical user portrait for similarity to determine target users, and push information to the target users.
According to the technical solution of the embodiments of the present invention, a set of user identifiers to be aligned is obtained and aligned against the business district's user identifier set to obtain aligned user identifiers; movement trajectory data for the aligned user identifiers is obtained from business data, and the dense places of origin of visiting users are determined from it; a customer group portrait of the visiting users is computed for each dense place of origin from the visiting users' feature data; and the customer group portraits are compared with the business district's historical user portrait for similarity to determine target users, to whom information is then pushed. This makes it possible to combine data from multiple parties for user profiling and information pushing, compensates for the single-dimensional user data features available in business district operations, and fully mines the places of origin of users who have not yet visited the district, so that advertisements and other information can be pushed more effectively when the district is attracting new customers. Because precise dense places of origin are derived from the spatiotemporal attributes of user trajectory information, customer group portraits can be drawn more accurately and in closer to real time, potential customers of the district can be identified more accurately, and the success rate of information pushing can be improved.
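The two-stage use of trajectory data summarized above (time first, then space) can be sketched as follows. This is a simplified illustration under stated assumptions, not the patent's algorithm: the description names OPTICS density clustering for the spatial stage, whereas this sketch swaps in plain grid-cell counting so that it stays short and dependency-free. The record layout, thresholds, and function names are all hypothetical.

```python
from collections import Counter, defaultdict

def user_origins(stays, min_dwell_minutes=120):
    """Temporal stage: pick each user's longest-dwell location as the origin.

    stays: iterable of (user_id, x, y, dwell_minutes) records.
    Users whose longest total dwell is below the threshold are dropped.
    """
    totals = defaultdict(Counter)
    for user_id, x, y, minutes in stays:
        totals[user_id][(x, y)] += minutes
    origins = {}
    for user_id, places in totals.items():
        place, minutes = places.most_common(1)[0]
        if minutes >= min_dwell_minutes:
            origins[user_id] = place
    return origins

def dense_origins(origins, cell_size=1.0, min_users=2):
    """Spatial stage (grid stand-in for OPTICS): keep cells with enough users."""
    cells = defaultdict(list)
    for user_id, (x, y) in origins.items():
        cell = (int(x // cell_size), int(y // cell_size))
        cells[cell].append(user_id)
    return {
        cell: sorted(users) for cell, users in cells.items() if len(users) >= min_users
    }

# Hypothetical dwell records in arbitrary planar coordinates.
stays = [
    ("u1", 0.2, 0.3, 300), ("u1", 5.0, 5.0, 30),
    ("u2", 0.7, 0.1, 240),
    ("u3", 9.5, 9.5, 200),
    ("u4", 0.4, 0.4, 20),
]
origins = user_origins(stays)
dense = dense_origins(origins)
```

A density-based method such as OPTICS would avoid the fixed grid boundaries used here, which can split one real cluster across neighbouring cells.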
The specific embodiments described above do not limit the scope of protection of the present invention. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may occur depending on design requirements and other factors. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210378006.7A CN114741595A (en) | 2022-04-12 | 2022-04-12 | Method and device for pushing information |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210378006.7A CN114741595A (en) | 2022-04-12 | 2022-04-12 | Method and device for pushing information |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114741595A true CN114741595A (en) | 2022-07-12 |
Family
ID=82281956
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210378006.7A Pending CN114741595A (en) | 2022-04-12 | 2022-04-12 | Method and device for pushing information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114741595A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106651437A (en) * | 2016-11-15 | 2017-05-10 | 武汉璞华大数据技术有限公司 | Method for marketing promotion based on big data |
CN112819544A (en) * | 2021-02-25 | 2021-05-18 | 平安普惠企业管理有限公司 | Advertisement putting method, device, equipment and storage medium based on big data |
CN113763085A (en) * | 2020-09-23 | 2021-12-07 | 京东城市(北京)数字科技有限公司 | Information pushing method and system, electronic equipment and storage medium |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117876015A (en) * | 2024-03-11 | 2024-04-12 | 南京数策信息科技有限公司 | User behavior data analysis method and device and related equipment |
CN117876015B (en) * | 2024-03-11 | 2024-05-07 | 南京数策信息科技有限公司 | User behavior data analysis method and device and related equipment |
CN117896626A (en) * | 2024-03-15 | 2024-04-16 | 深圳市瀚晖威视科技有限公司 | Method, device, equipment and storage medium for detecting motion trajectory with multiple cameras |
CN117896626B (en) * | 2024-03-15 | 2024-05-14 | 深圳市瀚晖威视科技有限公司 | Method, device, equipment and storage medium for detecting motion trajectory with multiple cameras |
CN118312672A (en) * | 2024-04-18 | 2024-07-09 | 兰州大学 | Big data intelligent cloud guest acquisition system based on dimension compaction |
CN119052736A (en) * | 2024-10-25 | 2024-11-29 | 长沙数智科技集团有限公司 | Targeting short message touch method based on dynamic position tracking and crowd density analysis |
CN119052736B (en) * | 2024-10-25 | 2025-02-11 | 长沙数智科技集团有限公司 | Targeted SMS reach method based on dynamic location tracking and crowd density analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||