WO2021068681A1 - Tag analysis method, device, and computer-readable storage medium - Google Patents

Tag analysis method, device, and computer-readable storage medium

Info

Publication number
WO2021068681A1
Authority
WO
WIPO (PCT)
Prior art keywords
tag
user
label
similarity
interaction data
Prior art date
Application number
PCT/CN2020/112333
Other languages
English (en)
French (fr)
Inventor
付昌林
罗滢川
陈少梅
肖良清
石文富
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021068681A1 (zh)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to a tag analysis method, device and computer-readable storage medium based on user behavior.
  • As a technical tool that provides decision support, tag analysis based on user behavior has been applied in all areas of society, including many important industries such as retail, finance, and telecommunications.
  • Current mainstream tag analysis based on user behavior relies mainly on collaborative filtering recommendation algorithms. Although its analysis of user tags is fairly accurate, its high computational intensity and extensive data collection waste a large amount of computing resources, and its scalability also needs improvement, so there is an urgent need for a tag analysis method based on user behavior that is computationally simple and highly scalable.
  • This application provides a label analysis method based on user behavior, including:
  • In addition, this application provides an electronic device that includes a memory and a processor. The memory stores a tag analysis program based on user behavior that can run on the processor, and when the program is executed by the processor, the following steps are implemented:
  • In addition, this application provides a computer-readable storage medium that stores a tag analysis program based on user behavior. The program can be executed by one or more processors to implement the following steps of the tag analysis method based on user behavior:
  • the present application also provides a tag analysis device based on user behavior, the device includes:
  • the data receiving and processing module is used to receive a pre-built tag set, collect the user's original interaction data set, and preprocess the original interaction data set to obtain a standard interaction data set.
  • the tag relationship establishment module is configured to establish the tag relationship of the user according to the standard interaction data set and the tag set.
  • the similarity calculation and tag ranking module is used to perform similarity calculation based on the tag relationship and the pre-built user tag model to obtain a similarity set, calculate the tag scores in the tag relationship according to the similarity set, and sort the tags by tag score to obtain a tag ranking set.
  • the label analysis result output module is configured to select labels from the label sorting set according to the preset number of labels to obtain the label analysis result of the user, and output the label analysis result.
  • FIG. 1 is a schematic flowchart of a label analysis method based on user behavior provided by an embodiment of this application;
  • FIG. 2 is a schematic diagram of the user-tag relationship in an embodiment;
  • FIG. 3 is a schematic diagram of the internal structure of an electronic device provided by an embodiment of the application.
  • FIG. 4 is a schematic diagram of modules of a tag analysis device based on user behavior provided by an embodiment of the application.
  • This application provides a label analysis method based on user behavior.
  • Referring to FIG. 1, a schematic flowchart of a tag analysis method based on user behavior provided by an embodiment of this application is shown.
  • the method can be executed by a device, and the device can be implemented by software and/or hardware.
  • the label analysis method based on user behavior includes:
  • the tag set is pre-built and includes multiple types of tags, such as age type: post-70s, post-80s, post-90s, etc.; pets: dislikes pets, dog lover, cat lover, etc.; drama type: martial arts, urban youth, horror, etc.; game type: does not play games, role-playing, fighting, shooting, etc.; music type: DJ, folk, pop, etc.; living area: first-tier developed area, second-tier area, underdeveloped area, etc.
  • the original interaction data set includes regular interaction data and request interaction data.
  • the regular interaction data includes the user's geographic location, the user platform operating system, the user platform version, and which application processes or web pages the user has started. For example, when the user performs a series of operations on a mobile phone, the regular interaction data includes the mobile phone system (such as iOS, MIUI, or Flyme), the version of the mobile phone system, and the application processes the user has started, such as WeChat, QQ, Taobao, JD, and NetEase Cloud Music.
  • the request interaction data is the series of data generated when the user makes requests through the platform operating system and the application processes, including the number of times an application (web page) is started, the number of user logins, and the content the user searches for and browses in the application (web page). For example, if the user opens a shopping website and searches it for Jin Yong's martial arts novels, that search is request interaction data.
  • to save computing resources, the regular interaction data does not need to be collected frequently, and a timed collection method may be adopted, for example, collecting the regular interaction data once every 12 hours.
  • the request interaction data can be monitored in real time.
  • the collection of the interaction data set can be implemented by pre-built code embedded in the user platform operating system, for example, by calling the AlarmManager facility that comes with the Android platform operating system.
  • the preprocessing cleans up abnormal original interaction data that arises during collection, including blank data and garbled characters. For example, if the record of the search for Jin Yong's martial arts novels on the shopping website is collected as garbled characters, the preprocessing can transcode it to the correct format or discard it.
  • establishing the tag relationship of the user includes: extracting keywords from the standard interaction data set and deduplicating them to obtain a keyword set, and extracting the related tags from the tag set according to the keyword set to obtain the tag relationship of the user.
  • the standard interaction data set includes regular interaction data such as the user's geographic location, the user platform operating system, and application processes, as well as various request interaction data.
  • this application extracts the user's geographic location, the user platform operating system, the user's frequently used application processes, etc. from the regular interaction data.
  • the user's frequently used application processes are selected according to a preset usage-count threshold. For example, if the number of times an application process has been opened within one week exceeds the preset usage-count threshold, that application process is set as one of the user's frequently used application processes.
  • this application extracts keywords that the user frequently searches for and browses from the request interaction data. For example, if the user often searches for the Chinese Pastoral Dog and watches martial arts movies and TV series, keywords such as Chinese Pastoral Dog and martial arts can be extracted.
  • this application forms a keyword set from the extracted keywords and deduplicates it. For example, if the user searches for the Chinese Pastoral Dog in both Program A and Program B, the same keyword appears twice and only one copy is kept.
  • the related tags are extracted from the tag set according to the keyword set to obtain the tag relationship of the user.
  • for example, if the keyword set includes Chinese Pastoral Dog, martial arts, and Shanghai, the corresponding tags extracted from the tag set are pets: dog lover; drama type: martial arts; living area: first-tier developed area, which establishes the user's tag relationship.
  • the similarity calculation method includes: building a user-tag bipartite graph from the user's tag relationship and the user tag model, calculating user similarity and tag similarity on the user-tag bipartite graph, and combining the user similarity and the tag similarity according to the correspondence between users and tags to obtain the similarity set.
  • the pre-built user tag model is a set of user-tag correspondences that have been verified in advance as correct. For example, the tags of user A are pets: cat lover; drama type: urban youth; living area: second-tier area, and these tags have been confirmed as correct with user A.
  • the user-tag bipartite graph is shown in FIG. 2.
  • one tag corresponds to one or more users at the same time, and one user corresponds to one or more tags at the same time, where user a and user b are users in the pre-built user tag model and user c is the user in the tag relationship described in this application.
  • the user similarity is defined in terms of the following quantities:
  • $S_{m+1}(u, u')$ represents the similarity between the user and a user in the user tag model
  • $u$ is the interaction data of the user, and $u'$ is the interaction data in the user tag model
  • $m$ is the number of iterations
  • $\mathrm{Trust}(u, u')$ is the trust degree between $u$ and $u'$
  • $O(u)$ represents the tag set of the user
  • $O(u')$ represents the tag set of user $u'$ in the user tag model
  • $S_{m+1}(O_i(u), O_j(u'))$ represents the similarity between tag $i$ of the user and tag $j$ of user $u'$ in the user tag model
  • $C_1$ is a constant in $[0, 1]$.
  • the tag similarity is defined in terms of the following quantities:
  • $S_{m+1}(t, t')$ represents the tag similarity between tag $t$ of the user and tag $t'$ in the user tag model
  • $I(t)$ represents the similar-tag set of tag $t$
  • $I(t')$ represents the similar-tag set of tag $t'$
  • $S_{m+1}(I_i(t), I_j(t'))$ represents the similarity between similar-tag set $i$ of tag $t$ and similar-tag set $j$ of tag $t'$
  • $C_2$ is a constant in $[0, 1]$.
  • the tag score is calculated from the following quantities:
  • r S(u,t) represents the tag score
  • S(u,t) represents the similarity set
  • $u$ is the user's interaction data
  • $t$ is the user's tag
  • $u'$ is the interaction data in the user tag model
  • the symbol shown as formula image PCTCN2020112333-appb-000006 represents the similarity between a user of the user tag model and the user
  • $r_{u,t}$ represents the filter value of the user and the user's tag.
  • the quantity shown as formula image PCTCN2020112333-appb-000007 and $r_{u,t}$ can be solved with SimRank-family algorithms based on collaborative filtering and with Markov-chain-family algorithms.
  • for example, the user's final tag scores are: Tag A, 75 points; Tag B, 93 points; Tag C, 61 points; Tag D, 32 points; Tag E, 88 points. If the preset number of tags is 3, Tag B, Tag E, and Tag A are extracted as the user's tag analysis result.
  • this application also provides a tag analysis device based on user behavior.
  • Referring to FIG. 3, a schematic diagram of the internal structure of an electronic device provided by an embodiment of this application is shown.
  • the electronic device 1 may be a PC (Personal Computer, personal computer), or a terminal device such as a smart phone, a tablet computer, or a portable computer, or a server.
  • the electronic device 1 at least includes a memory 11, a processor 12, a communication bus 13, and a network interface 14.
  • the memory 11 includes at least one type of readable storage medium, and the readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), magnetic memory, magnetic disk, optical disk, and the like.
  • the memory 11 may be an internal storage unit of the electronic device 1 in some embodiments, such as a hard disk of the electronic device 1.
  • the memory 11 may also be an external storage device of the electronic device 1, such as a plug-in hard disk equipped on the electronic device 1, a smart media card (SMC), or a secure digital (SD) Card, Flash Card, etc.
  • the memory 11 may also include both an internal storage unit of the electronic device 1 and an external storage device.
  • the memory 11 can be used not only to store application software and various data installed in the electronic device 1, such as the code of the tag analysis program 01 based on user behavior, etc., but also to temporarily store data that has been output or will be output.
  • the processor 12 may be a central processing unit (CPU), controller, microcontroller, microprocessor, or other data processing chip, used to run program code or process data stored in the memory 11, for example to execute the tag analysis program 01 based on user behavior.
  • the communication bus 13 is used to realize the connection and communication between these components.
  • the network interface 14 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface), and is usually used to establish a communication connection between the apparatus 1 and other electronic devices.
  • the device 1 may also include a user interface.
  • the user interface may include a display (Display) and an input unit such as a keyboard (Keyboard).
  • optionally, the user interface may also include a standard wired interface and a wireless interface.
  • the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode, organic light-emitting diode) touch device, etc.
  • the display can also be appropriately called a display screen or a display unit, which is used to display the information processed in the electronic device 1 and to display a visualized user interface.
  • FIG. 3 only shows the electronic device 1 with components 11-14 and the label analysis program 01 based on user behavior.
  • the structure shown in FIG. 3 does not constitute a limitation on the electronic device 1, which may include fewer or more components than shown, a combination of certain components, or a different arrangement of components.
  • the memory 11 stores a label analysis program 01 based on user behavior; the processor 12 implements the following steps when executing the label analysis program 01 based on user behavior stored in the memory 11:
  • Step 1 Receive a pre-built tag set, collect a user's original interaction data set, and preprocess the original interaction data set to obtain a standard interaction data set.
  • Step 2 Establish the tag relationship of the user according to the standard interaction data set and the tag set.
  • Step 3 Perform similarity calculation according to the label relationship and the pre-built user label model to obtain a similarity set.
  • Step 4 Calculate the tag scores in the tag relationship according to the similarity set, and perform tag sorting according to the tag scores to obtain a tag ranking set.
  • Step 5 Select tags from the tag sorting set according to the preset number of tags to obtain the tag analysis result of the user, and output the tag analysis result.
  • the tag analysis program based on user behavior may also be divided into one or more modules, which are stored in the memory 11 and executed by one or more processors (in this embodiment, the processor 12) to carry out the present invention.
  • a module in the present invention refers to a series of computer program instruction segments capable of performing a specific function, and is used to describe the execution process of the tag analysis program based on user behavior in the electronic device.
  • by way of example, the tag analysis device based on user behavior includes a data receiving and processing module 10, a tag relationship establishing module 20, a similarity calculation and tag ranking module 30, and a tag analysis result output module 40:
  • the data receiving and processing module 10 is configured to receive a pre-built tag set, collect a user's original interaction data set, and preprocess the original interaction data set to obtain a standard interaction data set.
  • the tag relationship establishing module 20 is configured to establish the tag relationship of the user according to the standard interaction data set and the tag set.
  • the similarity calculation and tag ranking module 30 is configured to perform similarity calculation based on the tag relationship and a pre-built user tag model to obtain a similarity set, calculate the tag scores in the tag relationship according to the similarity set, and sort the tags by tag score to obtain a tag ranking set.
  • the tag analysis result output module 40 is configured to select tags from the tag sorting set according to the preset number of tags to obtain the tag analysis result of the user, and output the tag analysis result.
  • the functions or operation steps implemented when program modules such as the data receiving and processing module 10, the tag relationship establishing module 20, the similarity calculation and tag ranking module 30, and the tag analysis result output module 40 are executed are substantially the same as those in the foregoing embodiment and are not repeated here.
  • in addition, an embodiment of this application also provides a computer-readable storage medium that stores a tag analysis program based on user behavior, and the program can be executed by one or more processors to implement the following operations:
  • receiving a pre-built tag set, collecting the user's original interaction data set, and preprocessing the original interaction data set to obtain a standard interaction data set.
  • the computer-readable storage medium may be non-volatile or volatile.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

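A tag analysis method based on user behavior, comprising: receiving a pre-built tag set, collecting a user's original interaction data set, and preprocessing the original interaction data set to obtain a standard interaction data set (S1); establishing the user's tag relationship according to the standard interaction data set and the tag set (S2); performing similarity calculation according to the tag relationship and a pre-built user tag model to obtain a similarity set (S3); calculating the tag scores in the tag relationship according to the similarity set, and sorting the tags by tag score to obtain a tag ranking set (S4); and selecting tags from the tag ranking set according to a preset number of tags to obtain the user's tag analysis result, and outputting the tag analysis result (S5). A tag analysis device based on user behavior and a computer-readable storage medium are also provided. The method enables fast tag analysis based on user behavior.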

Description

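Tag analysis method, device, and computer-readable storage medium
This application claims priority to Chinese patent application No. 201910975812.0, entitled "Tag analysis method, device, and computer-readable storage medium", filed with the China Patent Office on October 12, 2019, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of artificial intelligence technology, and in particular to a tag analysis method based on user behavior, a device, and a computer-readable storage medium.
Background
As a technical tool that provides decision support, tag analysis based on user behavior has been applied in all areas of society, including many important industries such as retail, finance, and telecommunications. The inventors realized that current mainstream tag analysis based on user behavior relies mainly on collaborative filtering recommendation algorithms. Although such analysis of user tags is fairly accurate, its high computational intensity and extensive data collection waste a large amount of computing resources, and its scalability also needs improvement. There is therefore an urgent need for a tag analysis method based on user behavior that is computationally simple and highly scalable.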
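Summary
This application provides a tag analysis method based on user behavior, including:
receiving a pre-built tag set, collecting a user's original interaction data set, and preprocessing the original interaction data set to obtain a standard interaction data set;
establishing the tag relationship of the user according to the standard interaction data set and the tag set;
performing similarity calculation according to the tag relationship and a pre-built user tag model to obtain a similarity set;
calculating the tag scores in the tag relationship according to the similarity set, and sorting the tags by tag score to obtain a tag ranking set;
selecting tags from the tag ranking set according to a preset number of tags to obtain the tag analysis result of the user, and outputting the tag analysis result.
In addition, this application provides an electronic device that includes a memory and a processor. The memory stores a tag analysis program based on user behavior that can run on the processor, and when the program is executed by the processor, the following steps are implemented:
receiving a pre-built tag set, collecting a user's original interaction data set, and preprocessing the original interaction data set to obtain a standard interaction data set;
establishing the tag relationship of the user according to the standard interaction data set and the tag set;
performing similarity calculation according to the tag relationship and a pre-built user tag model to obtain a similarity set;
calculating the tag scores in the tag relationship according to the similarity set, and sorting the tags by tag score to obtain a tag ranking set;
selecting tags from the tag ranking set according to a preset number of tags to obtain the tag analysis result of the user, and outputting the tag analysis result.
In addition, this application provides a computer-readable storage medium that stores a tag analysis program based on user behavior. The program can be executed by one or more processors to implement the following steps of the tag analysis method based on user behavior:
receiving a pre-built tag set, collecting a user's original interaction data set, and preprocessing the original interaction data set to obtain a standard interaction data set;
establishing the tag relationship of the user according to the standard interaction data set and the tag set;
performing similarity calculation according to the tag relationship and a pre-built user tag model to obtain a similarity set;
calculating the tag scores in the tag relationship according to the similarity set, and sorting the tags by tag score to obtain a tag ranking set;
selecting tags from the tag ranking set according to a preset number of tags to obtain the tag analysis result of the user, and outputting the tag analysis result.
In addition, this application provides a tag analysis device based on user behavior, the device including:
a data receiving and processing module, configured to receive a pre-built tag set, collect a user's original interaction data set, and preprocess the original interaction data set to obtain a standard interaction data set;
a tag relationship establishing module, configured to establish the tag relationship of the user according to the standard interaction data set and the tag set;
a similarity calculation and tag ranking module, configured to perform similarity calculation according to the tag relationship and a pre-built user tag model to obtain a similarity set, calculate the tag scores in the tag relationship according to the similarity set, and sort the tags by tag score to obtain a tag ranking set;
a tag analysis result output module, configured to select tags from the tag ranking set according to a preset number of tags to obtain the tag analysis result of the user, and output the tag analysis result.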
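Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a tag analysis method based on user behavior provided by an embodiment of this application;
FIG. 2 is a schematic diagram of the user-tag relationship in an embodiment;
FIG. 3 is a schematic diagram of the internal structure of an electronic device provided by an embodiment of this application;
FIG. 4 is a schematic diagram of the modules of a tag analysis device based on user behavior provided by an embodiment of this application.
The realization of the objectives, functional features, and advantages of this application will be further described with reference to the embodiments and the accompanying drawings.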
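Detailed Description
It should be understood that the specific embodiments described here are only intended to explain this application and are not intended to limit it.
This application provides a tag analysis method based on user behavior. Referring to FIG. 1, a schematic flowchart of a tag analysis method based on user behavior provided by an embodiment of this application is shown. The method can be executed by a device, and the device can be implemented in software and/or hardware.
In this embodiment, the tag analysis method based on user behavior includes:
S1. Receive a pre-built tag set, collect a user's original interaction data set, and preprocess the original interaction data set to obtain a standard interaction data set.
Preferably, the tag set is pre-built and includes multiple types of tags, such as age type: post-70s, post-80s, post-90s, etc.; pets: dislikes pets, dog lover, cat lover, etc.; drama type: martial arts, urban youth, horror, etc.; game type: does not play games, role-playing, fighting, shooting, etc.; music type: DJ, folk, pop, etc.; living area: first-tier developed area, second-tier area, underdeveloped area, etc.
Preferably, the original interaction data set includes regular interaction data and request interaction data. The regular interaction data includes the user's geographic location, the user platform operating system, the user platform version, and which application processes or web pages the user has started. For example, when the user performs a series of operations on a mobile phone, the regular interaction data includes the mobile phone system (such as iOS, MIUI, or Flyme), the version of the mobile phone system, and the application processes the user has started, such as WeChat, QQ, Taobao, JD, and NetEase Cloud Music.
Further, the request interaction data is the series of data generated when the user makes requests through the platform operating system and the application processes, including the number of times an application (web page) is started, the number of user logins, and the content the user searches for and browses in the application (web page). For example, if the user opens a shopping website and searches it for Jin Yong's martial arts novels, that search is request interaction data.
Preferably, to save computing resources, the regular interaction data does not need to be collected frequently, and a timed collection method may be adopted, for example, collecting the regular interaction data once every 12 hours. The request interaction data can be monitored in real time.
Further, the collection of the interaction data set can be implemented by pre-built code embedded in the user platform operating system, for example, by calling the AlarmManager facility that comes with the Android platform operating system.
Preferably, the preprocessing cleans up abnormal original interaction data that arises during collection, including blank data and garbled characters. For example, if the record of the search for Jin Yong's martial arts novels on the shopping website is collected as garbled characters, the preprocessing can transcode it to the correct format or discard it.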
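The cleaning step can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that raw records arrive as byte strings and that "garbled" means the bytes do not decode under any expected encoding; the function names and the candidate encodings (`utf-8`, `gbk`) are hypothetical choices, not taken from the application.

```python
from typing import Iterable, List, Optional, Sequence


def decode_record(raw: bytes, encodings: Sequence[str] = ("utf-8", "gbk")) -> Optional[str]:
    """Try each candidate encoding in turn; return None if none decodes cleanly."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    return None


def preprocess(raw_records: Iterable[bytes]) -> List[str]:
    """Turn raw interaction records into a cleaned ("standard") list of strings."""
    cleaned = []
    for raw in raw_records:
        text = decode_record(raw)
        if text is None:
            # Garbled beyond repair under the assumed encodings: discard.
            continue
        text = text.strip()
        if not text:
            # Blank record: discard.
            continue
        cleaned.append(text)
    return cleaned
```

A record that was written in GBK but expected as UTF-8 is transcoded rather than lost, while blank or undecodable records are dropped.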
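S2. Establish the tag relationship of the user according to the standard interaction data set and the tag set.
Preferably, establishing the tag relationship of the user includes: extracting keywords from the standard interaction data set and deduplicating them to obtain a keyword set, and extracting the related tags from the tag set according to the keyword set to obtain the tag relationship of the user.
The standard interaction data set includes regular interaction data such as the user's geographic location, the user platform operating system, and application processes, as well as various request interaction data. This application extracts the user's geographic location, the user platform operating system, the user's frequently used application processes, etc. from the regular interaction data. In a preferred embodiment, the user's frequently used application processes are selected according to a preset usage-count threshold: if the number of times an application process has been opened within one week exceeds the preset usage-count threshold, that application process is set as one of the user's frequently used application processes. Further, this application extracts keywords that the user frequently searches for and browses from the request interaction data. For example, if the user often searches for the Chinese Pastoral Dog and watches martial arts movies and TV series, keywords such as Chinese Pastoral Dog and martial arts can be extracted.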
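The threshold rule for frequently used application processes can be sketched as follows. This is a minimal illustration that assumes the week's launches are available as a flat list of application names; `frequent_apps` and its parameters are hypothetical names.

```python
from collections import Counter
from typing import Iterable, List


def frequent_apps(launch_log: Iterable[str], threshold: int) -> List[str]:
    """Return apps whose launch count within the window exceeds the threshold."""
    counts = Counter(launch_log)
    # "Exceeds" is read strictly here: a count equal to the threshold does not qualify.
    return sorted(app for app, n in counts.items() if n > threshold)
```

For a one-week log with five WeChat launches, three QQ launches, and two Taobao launches, a threshold of 2 selects QQ and WeChat.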
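Further, this application forms a keyword set from the extracted keywords and deduplicates it. For example, if the user searches for Chinese rural dogs in program A and again in program B, the same keyword is extracted twice, and deduplication keeps only one copy.
Preferably, the related tags are extracted from the tag set according to the keyword set to obtain the tag relationship of the user. For example, if the keyword set contains "Chinese rural dog", "wuxia", and "Shanghai", the corresponding tags extracted from the tag set are pets: dog lover; drama genre: wuxia; region of residence: first-tier developed region. These tags form the tag relationship of the user.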
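A minimal sketch of the deduplication and keyword-to-tag lookup is given below. The text does not say how a keyword is matched to a tag, so the `KEYWORD_TO_TAG` dictionary is a hypothetical stand-in for that matching, populated with the example from the text.

```python
# Hypothetical keyword-to-tag mapping, filled in from the example in the text.
KEYWORD_TO_TAG = {
    "Chinese rural dog": ("pets", "dog lover"),
    "wuxia": ("drama genre", "wuxia"),
    "Shanghai": ("region of residence", "first-tier developed region"),
}


def dedupe(keywords):
    """Remove duplicate keywords while keeping first-seen order."""
    return list(dict.fromkeys(keywords))


def build_tag_relationship(keywords, keyword_to_tag=KEYWORD_TO_TAG):
    """Map the deduplicated keyword set to {tag type: tag}."""
    relationship = {}
    for kw in dedupe(keywords):
        if kw in keyword_to_tag:
            category, tag = keyword_to_tag[kw]
            relationship[category] = tag
    return relationship
```

Keywords with no entry in the mapping are simply ignored in this sketch.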
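S3. Perform a similarity calculation according to the tag relationship and a pre-built user tag model to obtain a similarity set.
Preferably, the similarity calculation includes: building a user-tag bipartite graph from the tag relationship of the user and the user tag model, calculating the user similarity and the tag similarity from the user-tag bipartite graph, and assembling the user similarity and the tag similarity into the similarity set according to the correspondence between users and tags.
Preferably, the pre-built user tag model is a set of user-to-tag correspondences that have been verified in advance. For example, the tags of user A are pets: cat lover; drama genre: urban youth; region of residence: second-tier region, and these tags have been confirmed as correct with user A.
Preferably, the user-tag bipartite graph is as shown in Figure 2: one tag corresponds to one or more users, and one user corresponds to one or more tags. Users a and b are users in the pre-built user tag model, and user c is the user in the tag relationship described in this application.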
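The bipartite structure can be held as two adjacency maps, one from users to tags and one from tags to users. The sketch below assumes the model users and the new user are supplied in a single dictionary; the function name is hypothetical.

```python
from collections import defaultdict


def build_bipartite(user_tags):
    """Build both sides of a user-tag bipartite graph.

    user_tags maps each user (model users and the user being analyzed)
    to an iterable of tags. Returns (user -> set of tags, tag -> set of users).
    """
    user_to_tags = {u: set(ts) for u, ts in user_tags.items()}
    tag_to_users = defaultdict(set)
    for user, tags in user_to_tags.items():
        for tag in tags:
            tag_to_users[tag].add(user)
    return user_to_tags, dict(tag_to_users)
```

For instance, with users a and b taken from the model and user c from the tag relationship, a tag shared by b and c appears once on the tag side with both users attached to it.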
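Preferably, the user similarity is:
Figure PCTCN2020112333-appb-000001
where $S_{m+1}(u,u')$ denotes the similarity between the user and a user in the user tag model, $u$ is the interaction data of the user, $u'$ is the interaction data in the user tag model, $m$ is the number of iterations, $\mathrm{Trust}(u,u')$ is the trust between $u$ and $u'$, $O(u)$ denotes the tag set of the user, $O(u')$ denotes the tag set of user $u'$ in the user tag model, $S_{m+1}(O_i(u),O_j(u'))$ denotes the similarity between tag $i$ of the user and tag $j$ of user $u'$ in the user tag model, and $C_1$ is a constant in the interval [0, 1].
Further, the tag similarity is:
Figure PCTCN2020112333-appb-000002
where $S_{m+1}(t,t')$ denotes the tag similarity between tag $t$ of the user and tag $t'$ in the user tag model, $I(t)$ denotes the set of tags similar to tag $t$, $I(t')$ denotes the set of tags similar to tag $t'$, $S_{m+1}(I_i(t),I_j(t'))$ denotes the similarity between element $i$ of the similar-tag set of $t$ and element $j$ of the similar-tag set of $t'$, and $C_2$ is a constant in the interval [0, 1].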
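The two formulas themselves are only available as images and are not reproduced here. As a hedged illustration of the kind of iteration the symbol definitions suggest, the sketch below implements the standard bipartite SimRank recurrence, with an optional `trust` callable multiplying the user-side update. The placement of the trust factor, the treatment of $I(t)$ as the users attached to tag $t$, and all names and default values are assumptions, not the formulas of this application.

```python
from itertools import product


def bipartite_simrank(user_tags, c1=0.8, c2=0.8, iterations=5, trust=None):
    """Standard bipartite SimRank; trust(u, v) optionally scales user scores.

    user_tags maps each user to a set of tags. Returns two dicts keyed by
    pairs: (user similarity, tag similarity).
    """
    users = sorted(user_tags)
    tags = sorted({t for ts in user_tags.values() for t in ts})
    tag_users = {t: [u for u in users if t in user_tags[u]] for t in tags}
    s_user = {(a, b): float(a == b) for a, b in product(users, users)}
    s_tag = {(a, b): float(a == b) for a, b in product(tags, tags)}
    for _ in range(iterations):
        new_user, new_tag = {}, {}
        for a, b in product(users, users):
            oa, ob = sorted(user_tags[a]), sorted(user_tags[b])
            if a == b:
                new_user[a, b] = 1.0
            elif not oa or not ob:
                new_user[a, b] = 0.0
            else:
                total = sum(s_tag[i, j] for i, j in product(oa, ob))
                weight = trust(a, b) if trust else 1.0
                new_user[a, b] = weight * c1 * total / (len(oa) * len(ob))
        for a, b in product(tags, tags):
            ia, ib = tag_users[a], tag_users[b]
            if a == b:
                new_tag[a, b] = 1.0
            else:
                total = sum(s_user[i, j] for i, j in product(ia, ib))
                new_tag[a, b] = c2 * total / (len(ia) * len(ib))
        s_user, s_tag = new_user, new_tag
    return s_user, s_tag
```

Both sides are updated from the previous iteration's scores, so two users become similar when they hold similar tags, and two tags become similar when they are held by similar users.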
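S4. Calculate tag scores for the tags in the tag relationship according to the similarity set, and sort the tags by tag score to obtain a ranked tag set.
Preferably, the tag score is calculated as:
Figure PCTCN2020112333-appb-000003
where $r_S(u,t)$ denotes the tag score, $S(u,t)$ denotes the similarity set, $u$ is the interaction data of the user, $t$ is a tag of the user, $u'$ is the interaction data in the user tag model,
Figure PCTCN2020112333-appb-000004
is the total number of users in the user tag model,
Figure PCTCN2020112333-appb-000005
is the total number of tags in the user tag model,
Figure PCTCN2020112333-appb-000006
denotes the similarity between a user in the user tag model and the user, and $r_{u,t}$ denotes the filtering value between the user and the user's tag.
Preferably,
Figure PCTCN2020112333-appb-000007
and $r_{u,t}$ can be solved with SimRank-family algorithms based on collaborative filtering and with Markov-chain-family algorithms.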
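S5. Select tags from the ranked tag set according to a preset number of tags to obtain a tag analysis result of the user, and output the tag analysis result.
For example, suppose the final tag scores of the user are: tag A, 75; tag B, 93; tag C, 61; tag D, 32; tag E, 88. If the preset number of tags is 3, tags B, E, and A are selected as the tag analysis result of the user.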
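The ranking and selection in S4 and S5 amount to a descending sort followed by a cut at the preset number. A minimal sketch, with a hypothetical function name and the scores from the example above:

```python
def top_n_tags(scores, n):
    """Sort tags by score in descending order and keep the first n."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [tag for tag, _ in ranked[:n]]


scores = {"A": 75, "B": 93, "C": 61, "D": 32, "E": 88}
result = top_n_tags(scores, 3)  # ["B", "E", "A"]
```

Ties are not discussed in the text; Python's stable sort keeps tied tags in their original insertion order.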
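This application also provides a tag analysis device based on user behavior. Figure 3 is a schematic diagram of the internal structure of an electronic device according to an embodiment of this application.
In this embodiment, the electronic device 1 may be a PC (Personal Computer), a terminal device such as a smartphone, tablet computer, or portable computer, or a server. The electronic device 1 includes at least a memory 11, a processor 12, a communication bus 13, and a network interface 14.
The memory 11 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), magnetic memory, magnetic disk, and optical disc. In some embodiments the memory 11 may be an internal storage unit of the electronic device 1, such as its hard disk. In other embodiments the memory 11 may be an external storage device of the electronic device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the electronic device 1. Further, the memory 11 may include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 can be used not only to store application software installed on the electronic device 1 and various kinds of data, such as the code of the user-behavior-based tag analysis program 01, but also to temporarily store data that has been or will be output.
In some embodiments the processor 12 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data processing chip, used to run the program code stored in the memory 11 or to process data, for example to execute the user-behavior-based tag analysis program 01.
The communication bus 13 is used to implement connection and communication between these components.
The network interface 14 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface), and is typically used to establish a communication connection between the electronic device 1 and other electronic devices.
Optionally, the electronic device 1 may also include a user interface. The user interface may include a display and an input unit such as a keyboard, and may optionally also include a standard wired interface and a wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, or an OLED (Organic Light-Emitting Diode) touch display. The display may also be called a display screen or display unit, and is used to show the information processed in the electronic device 1 and to present a visual user interface.
Figure 3 shows only the electronic device 1 with components 11 to 14 and the user-behavior-based tag analysis program 01. Those skilled in the art will understand that the structure shown in Figure 3 does not limit the electronic device 1, which may include fewer or more components than shown, combine certain components, or arrange the components differently.
In the embodiment of the electronic device 1 shown in Figure 3, the memory 11 stores the user-behavior-based tag analysis program 01, and the processor 12 implements the following steps when it executes the user-behavior-based tag analysis program 01 stored in the memory 11: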
Step 1. Receive a pre-built tag set, collect a raw interaction data set of a user, and preprocess the raw interaction data set to obtain a standard interaction data set.
Step 2. Establish a tag relationship of the user according to the standard interaction data set and the tag set.
Step 3. Perform a similarity calculation according to the tag relationship and a pre-built user tag model to obtain a similarity set.
Step 4. Calculate tag scores for the tags in the tag relationship according to the similarity set, and sort the tags by tag score to obtain a ranked tag set.
Step 5. Select tags from the ranked tag set according to a preset number of tags to obtain a tag analysis result of the user, and output the tag analysis result.
The preferred implementation of each of these steps is identical to that described for steps S1 to S5 of the method embodiment. This covers the composition of the tag set, the collection and preprocessing of the routine and request interaction data, the keyword extraction and deduplication, the user-tag bipartite graph, the user similarity and tag similarity formulas, the tag score calculation, and the selection of the top-ranked tags.
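Optionally, in other embodiments, the user-behavior-based tag analysis program may be divided into one or more modules, which are stored in the memory 11 and executed by one or more processors (the processor 12 in this embodiment) to carry out this application. A module, as the term is used in this application, is a series of computer program instruction segments capable of performing a specific function, and is used to describe the execution of the user-behavior-based tag analysis program in the electronic device.
Figure 4 is a schematic diagram of the program modules in an embodiment of the user-behavior-based tag analysis device of this application. In this embodiment, the user-behavior-based tag analysis device includes a data receiving and processing module 10, a tag relationship establishing module 20, a similarity calculation and tag ranking module 30, and a tag analysis result output module 40. By way of example:
The data receiving and processing module 10 is configured to receive a pre-built tag set, collect a raw interaction data set of a user, and preprocess the raw interaction data set to obtain a standard interaction data set.
The tag relationship establishing module 20 is configured to establish a tag relationship of the user according to the standard interaction data set and the tag set.
The similarity calculation and tag ranking module 30 is configured to perform a similarity calculation according to the tag relationship and a pre-built user tag model to obtain a similarity set, calculate tag scores for the tags in the tag relationship according to the similarity set, and sort the tags by tag score to obtain a ranked tag set.
The tag analysis result output module 40 is configured to select tags from the ranked tag set according to a preset number of tags to obtain a tag analysis result of the user, and output the tag analysis result.
The functions or operation steps implemented when the data receiving and processing module 10, the tag relationship establishing module 20, the similarity calculation and tag ranking module 30, and the tag analysis result output module 40 are executed are substantially the same as those of the above embodiment and are not repeated here.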
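In addition, an embodiment of this application provides a computer-readable storage medium storing a user-behavior-based tag analysis program, which can be executed by one or more processors to implement the following operations:
receiving a pre-built tag set, collecting a raw interaction data set of a user, and preprocessing the raw interaction data set to obtain a standard interaction data set;
establishing a tag relationship of the user according to the standard interaction data set and the tag set;
performing a similarity calculation according to the tag relationship and a pre-built user tag model to obtain a similarity set, calculating tag scores for the tags in the tag relationship according to the similarity set, and sorting the tags by tag score to obtain a ranked tag set;
selecting tags from the ranked tag set according to a preset number of tags to obtain a tag analysis result of the user, and outputting the tag analysis result.
The computer-readable storage medium may be non-volatile or volatile.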
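It should be noted that the serial numbers of the above embodiments of this application are for description only and do not indicate the relative merits of the embodiments. The terms "include" and "comprise", and any other variants of them, are intended to cover non-exclusive inclusion, so that a process, device, article, or method that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to the process, device, article, or method. Without further limitation, an element defined by the phrase "including a..." does not exclude the presence of additional identical elements in the process, device, article, or method that includes the element.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software together with the necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. On this understanding, the technical solution of this application, in essence or in the part that contributes over the prior art, can be embodied as a software product. The computer software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disc) and includes a number of instructions that cause a terminal device (which may be a mobile phone, a computer, a server, or a network device) to execute the methods described in the embodiments of this application.
The above are only preferred embodiments of this application and do not limit the patent scope of this application. Any equivalent structure or equivalent process transformation made using the content of the description and drawings of this application, or any direct or indirect application in other related technical fields, is likewise included in the scope of patent protection of this application.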

Claims (20)

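  1. A tag analysis method based on user behavior, wherein the method comprises:
    receiving a pre-built tag set, collecting a raw interaction data set of a user, and preprocessing the raw interaction data set to obtain a standard interaction data set;
    establishing a tag relationship of the user according to the standard interaction data set and the tag set;
    performing a similarity calculation according to the tag relationship and a pre-built user tag model to obtain a similarity set;
    calculating tag scores for the tags in the tag relationship according to the similarity set, and sorting the tags by tag score to obtain a ranked tag set;
    selecting tags from the ranked tag set according to a preset number of tags to obtain a tag analysis result of the user, and outputting the tag analysis result.
  2. The tag analysis method based on user behavior according to claim 1, wherein establishing the tag relationship of the user under tag analysis according to the standard interaction data set and the tag set comprises:
    extracting keywords from the standard interaction data set and deduplicating the keywords to obtain a keyword set;
    extracting, from the tag set and according to the keyword set, the tags related to the keyword set to obtain the tag relationship of the user.
  3. The tag analysis method based on user behavior according to claim 1 or 2, wherein performing the similarity calculation according to the tag relationship and the pre-built user tag model to obtain the similarity set comprises:
    building a user-tag bipartite graph according to the tag relationship and the pre-built user tag model;
    calculating a user similarity and a tag similarity according to the user-tag bipartite graph;
    calculating the similarity set from the user similarity and the tag similarity according to the correspondence between users and tags.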
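  4. The tag analysis method based on user behavior according to claim 3, wherein the user similarity is calculated as:
    Figure PCTCN2020112333-appb-100001
    where $S_{m+1}(u,u')$ denotes the user similarity, $u$ is the interaction data of the user, $u'$ is the interaction data in the user tag model, $m$ is the number of iterations, $\mathrm{Trust}(u,u')$ is the trust between $u$ and $u'$, $O(u)$ denotes the tag set of the user under tag analysis, $O(u')$ denotes the tag set of user $u'$ in the user tag model, $S_{m+1}(O_i(u),O_j(u'))$ denotes the similarity between tag $i$ of the user under tag analysis and tag $j$ of user $u'$ in the user tag model, and $C_1$ is a constant in the interval [0, 1];
    the tag similarity is:
    Figure PCTCN2020112333-appb-100002
    where $S_{m+1}(t,t')$ denotes the tag similarity between tag $t$ of the user under tag analysis and tag $t'$ in the user tag model, $I(t)$ denotes the set of tags similar to tag $t$, $I(t')$ denotes the set of tags similar to tag $t'$, $S_{m+1}(I_i(t),I_j(t'))$ denotes the similarity between element $i$ of the similar-tag set of $t$ and element $j$ of the similar-tag set of $t'$, and $C_2$ is a constant in the interval [0, 1].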
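  5. The tag analysis method based on user behavior according to claim 1, wherein the tag scores for the tags in the tag relationship are calculated according to the similarity set as:
    Figure PCTCN2020112333-appb-100003
    where $r_S(u,t)$ denotes the tag score, $S(u,t)$ denotes the similarity set, $u$ is the interaction data of the user, $t$ is a tag of the user, $u'$ is the interaction data in the user tag model,
    Figure PCTCN2020112333-appb-100004
    is the total number of users in the user tag model,
    Figure PCTCN2020112333-appb-100005
    is the total number of tags in the user tag model,
    Figure PCTCN2020112333-appb-100006
    denotes the similarity between a user in the user tag model and the user under tag analysis, and $r_{u,t}$ denotes the filtering value between the user under tag analysis and the tag of the user under tag analysis.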
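  6. The tag analysis method based on user behavior according to claim 1, wherein the raw interaction data set includes routine interaction data and request interaction data, the routine interaction data being obtained by scheduled collection and the request interaction data being obtained by real-time monitoring.
  7. The tag analysis method based on user behavior according to claim 1, wherein the interaction data set is collected by pre-built code embedded in the operating system of the user's platform.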
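  8. An electronic device, wherein the electronic device comprises a memory and a processor, the memory stores a user-behavior-based tag analysis program that can run on the processor, and the user-behavior-based tag analysis program, when executed by the processor, implements the following steps:
    receiving a pre-built tag set, collecting a raw interaction data set of a user, and preprocessing the raw interaction data set to obtain a standard interaction data set;
    establishing a tag relationship of the user according to the standard interaction data set and the tag set;
    performing a similarity calculation according to the tag relationship and a pre-built user tag model to obtain a similarity set;
    calculating tag scores for the tags in the tag relationship according to the similarity set, and sorting the tags by tag score to obtain a ranked tag set;
    selecting tags from the ranked tag set according to a preset number of tags to obtain a tag analysis result of the user, and outputting the tag analysis result.
  9. The electronic device according to claim 8, wherein establishing the tag relationship of the user under tag analysis according to the standard interaction data set and the tag set comprises:
    extracting keywords from the standard interaction data set and deduplicating the keywords to obtain a keyword set;
    extracting, from the tag set and according to the keyword set, the tags related to the keyword set to obtain the tag relationship of the user.
  10. The electronic device according to claim 8 or 9, wherein performing the similarity calculation according to the tag relationship and the pre-built user tag model to obtain the similarity set comprises:
    building a user-tag bipartite graph according to the tag relationship and the pre-built user tag model;
    calculating a user similarity and a tag similarity according to the user-tag bipartite graph;
    calculating the similarity set from the user similarity and the tag similarity according to the correspondence between users and tags.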
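  11. The electronic device according to claim 10, wherein the user similarity is calculated as:
    Figure PCTCN2020112333-appb-100007
    where $S_{m+1}(u,u')$ denotes the user similarity, $u$ is the interaction data of the user, $u'$ is the interaction data in the user tag model, $m$ is the number of iterations, $\mathrm{Trust}(u,u')$ is the trust between $u$ and $u'$, $O(u)$ denotes the tag set of the user under tag analysis, $O(u')$ denotes the tag set of user $u'$ in the user tag model, $S_{m+1}(O_i(u),O_j(u'))$ denotes the similarity between tag $i$ of the user under tag analysis and tag $j$ of user $u'$ in the user tag model, and $C_1$ is a constant in the interval [0, 1];
    the tag similarity is:
    Figure PCTCN2020112333-appb-100008
    where $S_{m+1}(t,t')$ denotes the tag similarity between tag $t$ of the user under tag analysis and tag $t'$ in the user tag model, $I(t)$ denotes the set of tags similar to tag $t$, $I(t')$ denotes the set of tags similar to tag $t'$, $S_{m+1}(I_i(t),I_j(t'))$ denotes the similarity between element $i$ of the similar-tag set of $t$ and element $j$ of the similar-tag set of $t'$, and $C_2$ is a constant in the interval [0, 1].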
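  12. The electronic device according to claim 8, wherein the tag scores for the tags in the tag relationship are calculated according to the similarity set as:
    Figure PCTCN2020112333-appb-100009
    where $r_S(u,t)$ denotes the tag score, $S(u,t)$ denotes the similarity set, $u$ is the interaction data of the user, $t$ is a tag of the user, $u'$ is the interaction data in the user tag model,
    Figure PCTCN2020112333-appb-100010
    is the total number of users in the user tag model,
    Figure PCTCN2020112333-appb-100011
    is the total number of tags in the user tag model,
    Figure PCTCN2020112333-appb-100012
    denotes the similarity between a user in the user tag model and the user under tag analysis, and $r_{u,t}$ denotes the filtering value between the user under tag analysis and the tag of the user under tag analysis.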
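  13. The electronic device according to claim 8, wherein the raw interaction data set includes routine interaction data and request interaction data, the routine interaction data being obtained by scheduled collection and the request interaction data being obtained by real-time monitoring.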
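  14. A computer-readable storage medium, wherein the computer-readable storage medium stores a user-behavior-based tag analysis program, and the user-behavior-based tag analysis program can be executed by one or more processors to implement the following steps of the tag analysis method based on user behavior:
    receiving a pre-built tag set, collecting a raw interaction data set of a user, and preprocessing the raw interaction data set to obtain a standard interaction data set;
    establishing a tag relationship of the user according to the standard interaction data set and the tag set;
    performing a similarity calculation according to the tag relationship and a pre-built user tag model to obtain a similarity set;
    calculating tag scores for the tags in the tag relationship according to the similarity set, and sorting the tags by tag score to obtain a ranked tag set;
    selecting tags from the ranked tag set according to a preset number of tags to obtain a tag analysis result of the user, and outputting the tag analysis result.
  15. The computer-readable storage medium according to claim 14, wherein establishing the tag relationship of the user under tag analysis according to the standard interaction data set and the tag set comprises:
    extracting keywords from the standard interaction data set and deduplicating the keywords to obtain a keyword set;
    extracting, from the tag set and according to the keyword set, the tags related to the keyword set to obtain the tag relationship of the user.
  16. The computer-readable storage medium according to claim 14 or 15, wherein performing the similarity calculation according to the tag relationship and the pre-built user tag model to obtain the similarity set comprises:
    building a user-tag bipartite graph according to the tag relationship and the pre-built user tag model;
    calculating a user similarity and a tag similarity according to the user-tag bipartite graph;
    calculating the similarity set from the user similarity and the tag similarity according to the correspondence between users and tags.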
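  17. The computer-readable storage medium according to claim 16, wherein the user similarity is calculated as:
    Figure PCTCN2020112333-appb-100013
    where $S_{m+1}(u,u')$ denotes the user similarity, $u$ is the interaction data of the user, $u'$ is the interaction data in the user tag model, $m$ is the number of iterations, $\mathrm{Trust}(u,u')$ is the trust between $u$ and $u'$, $O(u)$ denotes the tag set of the user under tag analysis, $O(u')$ denotes the tag set of user $u'$ in the user tag model, $S_{m+1}(O_i(u),O_j(u'))$ denotes the similarity between tag $i$ of the user under tag analysis and tag $j$ of user $u'$ in the user tag model, and $C_1$ is a constant in the interval [0, 1];
    the tag similarity is:
    Figure PCTCN2020112333-appb-100014
    where $S_{m+1}(t,t')$ denotes the tag similarity between tag $t$ of the user under tag analysis and tag $t'$ in the user tag model, $I(t)$ denotes the set of tags similar to tag $t$, $I(t')$ denotes the set of tags similar to tag $t'$, $S_{m+1}(I_i(t),I_j(t'))$ denotes the similarity between element $i$ of the similar-tag set of $t$ and element $j$ of the similar-tag set of $t'$, and $C_2$ is a constant in the interval [0, 1].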
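  18. The computer-readable storage medium according to claim 14, wherein the tag scores for the tags in the tag relationship are calculated according to the similarity set as:
    Figure PCTCN2020112333-appb-100015
    where $r_S(u,t)$ denotes the tag score, $S(u,t)$ denotes the similarity set, $u$ is the interaction data of the user, $t$ is a tag of the user, $u'$ is the interaction data in the user tag model,
    Figure PCTCN2020112333-appb-100016
    is the total number of users in the user tag model,
    Figure PCTCN2020112333-appb-100017
    is the total number of tags in the user tag model,
    Figure PCTCN2020112333-appb-100018
    denotes the similarity between a user in the user tag model and the user under tag analysis, and $r_{u,t}$ denotes the filtering value between the user under tag analysis and the tag of the user under tag analysis.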
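  19. The computer-readable storage medium according to claim 14, wherein the raw interaction data set includes routine interaction data and request interaction data, the routine interaction data being obtained by scheduled collection and the request interaction data being obtained by real-time monitoring.
  20. A tag analysis device based on user behavior, wherein the device comprises:
    a data receiving and processing module, configured to receive a pre-built tag set, collect a raw interaction data set of a user, and preprocess the raw interaction data set to obtain a standard interaction data set;
    a tag relationship establishing module, configured to establish a tag relationship of the user according to the standard interaction data set and the tag set;
    a similarity calculation and tag ranking module, configured to perform a similarity calculation according to the tag relationship and a pre-built user tag model to obtain a similarity set, calculate tag scores for the tags in the tag relationship according to the similarity set, and sort the tags by tag score to obtain a ranked tag set;
    a tag analysis result output module, configured to select tags from the ranked tag set according to a preset number of tags to obtain a tag analysis result of the user, and output the tag analysis result.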
PCT/CN2020/112333 2019-10-12 2020-08-30 Tag analysis method, device, and computer-readable storage medium WO2021068681A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910975812.0A CN110889045B (zh) 2019-10-12 2019-10-12 Tag analysis method, device, and computer-readable storage medium
CN201910975812.0 2019-10-12

Publications (1)

Publication Number Publication Date
WO2021068681A1 true WO2021068681A1 (zh) 2021-04-15

Family

ID=69746182

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/112333 WO2021068681A1 (zh) 2019-10-12 2020-08-30 Tag analysis method, device, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN110889045B (zh)
WO (1) WO2021068681A1 (zh)

Also Published As

Publication number Publication date
CN110889045B (zh) 2024-04-23
CN110889045A (zh) 2020-03-17

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20875246

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20875246

Country of ref document: EP

Kind code of ref document: A1