WO2021159448A1 - General network compression framework and compression method based on sequential recommendation system - Google Patents

General network compression framework and compression method based on sequential recommendation system

Info

Publication number
WO2021159448A1
WO2021159448A1 (PCT/CN2020/075220)
Authority
WO
WIPO (PCT)
Prior art keywords
layer
block
cluster
embedding
blocks
Prior art date
Application number
PCT/CN2020/075220
Other languages
English (en)
French (fr)
Inventor
杨敏
原发杰
孙洋
Original Assignee
中国科学院深圳先进技术研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院深圳先进技术研究院 filed Critical 中国科学院深圳先进技术研究院
Priority to PCT/CN2020/075220 priority Critical patent/WO2021159448A1/zh
Publication of WO2021159448A1 publication Critical patent/WO2021159448A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2457Query processing with adaptation to user needs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Definitions

  • the present invention relates to the technical field of sequence recommendation, and more specifically, to a general network compression framework and compression method based on a sequence recommendation system.
  • Sequential (also called session-based) recommendation systems have become a research hotspot in the recommendation field. This is because user interactions in real life usually occur in the form of time series. For example, after purchasing a phone on Amazon, the user is likely to buy a phone case, earphones and a screen protector within the same session. Another example comes from the popular short video sharing application TikTok: users can watch hundreds of videos within an hour, and these videos naturally form a video playback sequence. In this case, sequence recommendation models based on Recurrent Neural Networks (RNN) or Convolutional Neural Networks (CNN), usually using dilated convolutions, achieve the best recommendation performance, because these deep learning models are more powerful at capturing the sequential dependencies in user-item interaction sequences.
  • RNN: Recurrent Neural Network
  • CNN: Convolutional Neural Network
  • the modern sequence recommendation model based on deep neural networks is divided into three main modules: the input embedding layer used to represent the interaction sequence, the output softmax layer used to generate the probability distribution of the next item, and one or more hidden layers (recurrent or convolutional) sandwiched between them.
  • in practice, to increase model capacity, the usual approach is a larger model size with more model parameters.
  • although model parallelization can be applied to larger networks, the communication overhead is still proportional to the number of parameters in the model.
  • existing research shows that, beyond a certain point, further increasing the model size may lead to overfitting or unexpected model performance degradation. Therefore, model compression is essential for realizing recommendation models that can respond in real time and have better generalization capabilities.
  • in small and medium-sized recommendation systems, the parameters from the middle layers cannot be ignored.
  • the memory consumption may be dominated by the middle layers and the embedding matrices.
  • An object of the present invention is to solve the problem of model compression in the field of sequence recommendation, and to provide a general compression framework and compression method based on a sequence recommendation system.
  • a general network compression framework based on a sequence recommendation system including:
  • Input embedding layer based on block adaptive decomposition: divides the recommended item set into multiple clusters according to the frequency of the recommended items and partitions the input embedding matrix into corresponding blocks, where the blocks of different clusters are assigned different dimensions;
  • Intermediate layers with hierarchical parameter sharing: connected to the input embedding layer, formed by stacking multiple residual blocks, and sharing parameters through a hierarchical parameter sharing mechanism;
  • Output layer based on block adaptive decomposition: uses the same block embedding clustering configuration as the input embedding layer, represents the blocks of each cluster with a tree structure, obtains the probability distribution of the output sequence, and then predicts the expected recommendation item.
  • dividing the recommended item set into multiple clusters according to the frequency of the recommended items and partitioning the input embedding matrix into corresponding blocks includes:
  • the input embedding layer further includes:
  • decomposing the block E^j into two low-rank matrices A^j ∈ R^{k_j×d_j} and B^j ∈ R^{d_j×d}, where d_j is the factorization dimension of the j-th block
  • the layered parameter sharing mechanism includes:
  • Cross-block parameter sharing means that all higher layers reuse the parameters of the first residual block
  • Adjacent layer parameter sharing means that two separate layers in each residual block share the same parameter set
  • Adjacent block parameter sharing means that parameters are shared between every two adjacent residual blocks.
  • the block representing each cluster in a tree structure includes:
  • each tree node represents a cluster
  • the embedding matrix of the first cluster is stored in the root node of the tree
  • the embedding matrix of other clusters is stored in the leaf nodes of the second layer of the tree
  • for the first cluster, each recommendation item is represented as a distinct class, while for the other clusters, two nodes are assigned to each recommendation item: a root node that uses its cluster position as the parent class of the recommendation item, and a leaf node that represents its specific position within the cluster.
  • the output layer also executes:
  • expanding the first block matrix, where n-1 represents the number of parent classes to which the leaf nodes belong
  • the framework is configured to include multiple intermediate layers, where every two intermediate layers are connected by a residual connection to form a residual block, and an exponentially growing receptive field is obtained by doubling the dilation factor of each layer.
  • a compression method using the universal network compression framework based on the sequence recommendation system provided by the present invention including:
  • the recommended item set is divided into multiple clusters and the input embedding matrix is divided into corresponding multiple blocks, wherein different dimensions are assigned to the blocks of each cluster;
  • the middle layer is formed by stacking multiple residual blocks, and adopts a layered parameter sharing mechanism for parameter sharing;
  • the output layer uses the same block embedding clustering configuration as the input embedding layer, and adopts a tree structure to represent the blocks of each cluster, obtains the probability distribution of the output sequence, and then predicts the expected recommendation items.
  • the advantages of the present invention are: based on the characteristics of the sequence recommendation field and using deep learning models, a new joint compression framework for sequence recommendation is proposed, which considers model compression comprehensively from the input embedding layer, the output softmax layer and the intermediate layers; it effectively solves the problem of the huge number of model parameters, improves the training and inference efficiency of sequence recommendation models, and alleviates model overfitting.
  • Fig. 1 is a schematic diagram of a block embedding decomposition method according to an embodiment of the present invention
  • Fig. 2 is a schematic diagram of cross-block and adjacent layer/block parameter sharing according to an embodiment of the present invention
  • Fig. 3 is a schematic diagram of a general network compression framework according to an embodiment of the present invention.
  • in the drawings, the labels "block" and "layer" denote a residual block and a network layer, respectively
  • two general model compression mechanisms are proposed to reduce the memory consumption of the sequence recommendation system, namely the block adaptive decomposition method and the hierarchical parameter sharing method.
  • a block adaptive decomposition method is proposed to obtain block embedding matrices that approximate the original embedding matrices (in the following description, the input embedding matrix and the output softmax matrix are also collectively referred to as embedding matrices); cross-block parameter sharing, adjacent layer parameter sharing and adjacent block parameter sharing are introduced to reduce the parameters of the middle layers. Since these two model compression mechanisms are orthogonal, in the embodiment of the present invention they can naturally be combined into a joint model compression framework to achieve a higher compression rate.
  • the proposed joint model compression framework is named CpRec, i.e. a general network compression framework based on a sequence recommendation system.
  • this general network compression framework combines the characteristics of the sequence recommendation field and compresses the sequence recommendation model from three aspects: the input layer, the output layer and the intermediate deep architecture. While preserving recommendation accuracy, it improves the working efficiency of the model, alleviates model overfitting and reduces the storage required by the model.
  • in the following, block adaptive decomposition (covering the input embedding layer and the output softmax layer) and the middle layers with hierarchical parameter sharing are introduced in turn, and the CpRec provided by the present invention is described based on the NextItNet architecture.
  • the frequency distribution of recommended items obeys the long-tailed distribution. Only a few items may contain rich information due to their high frequency, while other items may only contain limited information. For example, some “head” (or popular) recommendation items have a large number of user interactions, but for “tail” recommendation items there are only a few interactions. In view of this, assigning a fixed embedding dimension to all recommended items is suboptimal and unnecessary, which may lead to poor performance. Intuitively, recommendation items with higher frequency may contain more information than rare recommendation items, so more capacity should be allocated to them during the training process. In other words, the embedding dimension of more frequent (or popular) recommendations should be larger than the embedding dimension of unpopular recommendations.
  • first rank all recommended items by frequency, S = {x_1, x_2, …, x_K}, where x_1 and x_K are the recommended items with the highest and the lowest frequency, respectively.
  • the input embedding matrix E ∈ R^{K×d} (as shown in Figure 1(a)) can be divided into n blocks E^1, E^2, …, E^n (as shown in Figure 2(b)), where d is the embedding size.
  • the output softmax matrix P ∈ R^{d×K} is divided into n blocks P^1, P^2, …, P^n
  • the block E^j is decomposed into two low-rank matrices A^j ∈ R^{k_j×d_j} and B^j ∈ R^{d_j×d}, where d_j is the factorization dimension (also called the rank) of the j-th block. Since high-frequency recommendation items should have higher expressive power, the corresponding d_j decreases as the index of the cluster increases. Correspondingly, the embedding representation of each recommendation item differs from the embedding representation obtained by the original look-up operation. Given a recommended item label ID x, the embedding vector v_x ∈ R^d is expressed as v_x = (A^j)_g · B^j, where (A^j)_g is the g-th row of A^j and g is the index of x within its cluster.
  • the input embedding layer based on block adaptive decomposition in the embodiment of the present invention divides all recommended items into multiple clusters according to their frequency, and the embedding matrix of each cluster (referred to as a block) is decomposed into two low-rank matrices, where the rank value is also determined by the frequency of the recommended items in the cluster, that is, a larger rank value is assigned to a block with more frequent items.
  • block: the embedding matrix of each cluster
  • through block adaptive decomposition, different dimensions can be assigned to the blocks of each cluster, thereby greatly reducing the number of parameters of the input embedding layer.
  • these blocks are constructed by a two-level tree, where each tree node represents a cluster.
  • Figure 2(d) is an example of the block embedding of the output softmax layer.
  • the embedding matrix of the first cluster (the first block) is stored in the root node of the tree, and the other blocks are stored in the leaf nodes of the second level of the tree.
  • for the first cluster, each recommendation item is represented as a distinct class; for the other clusters, each recommendation item is assigned two nodes: a root node that uses its cluster position as the parent class of the recommendation item, and a leaf node that represents its specific position within the cluster. In this way, the recommended items in the same cluster share the same parent class. More specifically, a block embedding clustering configuration similar to that of the input embedding layer is used in the output softmax layer. One major difference is that the first block matrix of the output layer is expanded to P^1 ∈ R^{d×(k_1+n-1)}, where n-1 represents the number of parent classes to which the leaf nodes belong.
  • the other block matrices in the output softmax layer are factorized into two low-rank matrices of sizes d×d_j and d_j×k_j; compared with the original softmax layer, the number of parameters of the output softmax layer is reduced from O(K×d) to O(d×(k_1+n-1) + Σ_{j=2}^{n}(d×d_j + d_j×k_j))
  • the embedding matrix of each cluster (called a block) is composed of two low-rank matrices, and the rank value is also determined by the frequency of the recommended items in the cluster: a higher rank value is assigned to blocks with more frequent items.
  • with the block adaptive decomposition method, different dimensions can be allocated to the blocks of each cluster to reduce the model size.
  • the training process includes two steps.
  • the first step is to calculate the logits of the first cluster, which takes O(k_1+n-1) time.
  • in the second step, if the recommended item label x belongs to one of the clusters on the leaf nodes, the logits of that cluster are calculated, which requires O(k_j) time.
  • the training time of the present invention using block embedding is reduced from O(K) to between O(k_1+n-1) and O(k_1+k_j+n-1).
  • p(x|h,S_1), p(x|c(x),h) and p(c(x)|h,S_1) can all be calculated by formula (2).
  • the top-N recommended items (the N items with the highest scores) are obtained based on p(x).
  • if the top-N probability scores all lie in the first cluster, the probability scores of recommended items in other clusters, i.e. p(x|c(x),h), need not be calculated,
  • because p(x|c(x),h)·p(c(x)|h,S_1) (where p(c(x)|h,S_1) < 1) is always smaller than the top-N scores of the first cluster.
  • the output softmax layer uses a block embedding clustering structure similar to that of the input embedding layer, including the number of blocks and the size of each block. In the output softmax layer, a probability approximation method based on a tree representation is designed to replace the original softmax, which significantly reduces the number of parameters, the training time and the inference time of the output softmax layer.
  • the present invention proposes a block adaptive decomposition method for the input embedding and output softmax layers to obtain block embedding matrices that approximate the original embedding matrices of the input and output layers in the sequence recommendation system, and designs, in the output softmax layer, a probability approximation method based on a tree representation to replace the original softmax.
  • this method greatly reduces the number of embedding-matrix parameters in the input/output layers and effectively improves the training and inference speed of the sequence recommendation model, thereby overcoming the drawback of existing recommendation systems in which an excessive number of recommendation items makes the parameters of the input embedding layer and output softmax layer huge and training and inference slow.
  • the compression method proposed in the embodiment of the present invention mainly focuses on sequence recommendation models that have multiple intermediate layers with every two intermediate layers connected by a residual connection (see the ResNet structure: He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 770-778.), such as NextItNet.
  • the cross-layer parameter sharing method proposed by ALBERT (Lan Z, Chen M, Goodman S, et al. Albert: A lite bert for self-supervised learning of language representations[J]. arXiv preprint arXiv:1909.11942, 2019.) is shown in Figure 2(a). While removing a large number of redundant parameters, it also restricts the expressiveness of the neural network model to a certain extent; in fact, with this cross-layer sharing scheme, the performance of the model drops significantly on the recommendation task. Therefore, the embodiment of the present invention proposes cross-block parameter sharing, as shown in Figure 2(b), in which all higher layers reuse the parameters of the first residual block (i.e., the two bottom layers).
  • the embodiment of the present invention also proposes two other layered parameter sharing methods: adjacent layer parameter sharing (as shown in Figure 2c) and adjacent block parameters Sharing (as shown in Figure 2(d)).
  • adjacent layer parameter sharing means that two separate layers in each residual block share the same parameter set.
  • Adjacent block parameter sharing means that parameters are shared between every two adjacent residual blocks.
  • This parameter sharing strategy has two main advantages: as a regularization method, it can stabilize the training process and improve the generalization ability of the model; it can significantly reduce the amount of parameters without degrading performance like cross-layer parameter sharing. In particular, through parameter sharing between adjacent blocks, the recommended accuracy is always better than the benchmark model.
  • the present invention proposes three different layered parameter sharing methods (ie, cross-block parameter sharing, adjacent layer parameter sharing, and adjacent block parameter sharing) to reduce redundant parameters in the middle layer.
  • three different layered parameter sharing methods ie, cross-block parameter sharing, adjacent layer parameter sharing, and adjacent block parameter sharing
  • the number of parameters is effectively limited, thereby solving the shortcoming of a large number of parameters caused by the need to build a deep architecture for extracting long-distance dependent information in the user-recommendation interaction sequence in the sequence recommendation field.
  • the present invention instantiates CpRec by using the NextItNet architecture; see the overall neural network model architecture shown in Figure 3, which is divided into three modules: the block-adaptively decomposed input embedding layer (corresponding to the left part of Figure 3), the block-adaptively decomposed output softmax layer (corresponding to the right part of Figure 3) and the intermediate layers with hierarchical parameter sharing (corresponding to the middle part of Figure 3).
  • for the middle layers, as shown in the middle part of Figure 3, CpRec follows NextItNet and uses dilated convolution layers, where every two layers are wrapped by a residual connection to form a residual block.
  • CpRec obtains an exponentially growing receptive field by doubling the dilation factor of each layer, for example {1,2,4,8}.
  • the structure can be stacked multiple times, for example {1,2,4,8,…,1,2,4,8}. Then the proposed hierarchical parameter sharing strategies can be applied to these middle layers to improve their parameter efficiency.
  • for the output softmax layer, a tree-structured block embedding matrix is used to represent the block of each cluster. As mentioned earlier, CpRec achieves significant acceleration through this structure in both the training and inference stages. Similar to NextItNet, given each input sequence {x_1,…,x_t}, CpRec estimates the probability distribution of the output sequence {x_2,…,x_{t+1}}, where x_{t+1} is the expected next recommendation item.
  • all recommended items can also be sorted in ascending order of frequency in the sequence recommendation system.
  • as another example, a multi-level tree structure can be built in the output layer.
  • based on deep learning models and the characteristics of the sequence recommendation field, the present invention proposes CpRec, a new model compression framework for deep sequential recommendation, which is a flexible and general neural network compression framework for learning compressed sequence recommendation models
  • the flexible and universal neural network compression framework comprehensively considers model compression from three aspects: the input embedding layer, the output softmax layer and the intermediate layers, which improves the training and inference efficiency of sequence recommendation models, alleviates model overfitting, and reduces the storage occupied by the model.
  • CpRec achieves faster training/inference speed and lower memory consumption, and generates recommendations that users are interested in with better recommendation accuracy.
  • the present invention may be a system, a method and/or a computer program product.
  • the computer program product may include a computer-readable storage medium loaded with computer-readable program instructions for enabling a processor to implement various aspects of the present invention.
  • the computer-readable storage medium may be a tangible device that can hold and store instructions used by the instruction execution device.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • A non-exhaustive list of more specific examples of computer-readable storage media includes: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile discs (DVD), memory sticks, floppy disks, mechanical encoding devices such as punch cards or in-groove raised structures with instructions stored thereon, and any suitable combination of the foregoing
  • RAM random access memory
  • ROM read-only memory
  • EPROM erasable programmable read-only memory
  • flash memory flash memory
  • SRAM static random access memory
  • CD-ROM compact disk read-only memory
  • DVD digital versatile disk
  • memory stick
  • floppy disk
  • mechanical encoding device, such as a punch card or an in-groove raised structure with instructions stored thereon
  • the computer-readable storage medium used here is not to be interpreted as a transient signal itself, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (for example, light pulses through fiber-optic cables), or electrical signals transmitted through wires.
  • the computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to various computing/processing devices, or downloaded to an external computer or external storage device via a network, such as the Internet, a local area network, a wide area network, and/or a wireless network.
  • the network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
  • the network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network, and forwards the computer-readable program instructions for storage in the computer-readable storage medium in each computing/processing device .
  • the computer program instructions used to perform the operations of the present invention may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages.
  • the programming languages include object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
  • computer-readable program instructions can be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • LAN local area network
  • WAN wide area network
  • an electronic circuit, such as a programmable logic circuit, a field-programmable gate array (FPGA) or a programmable logic array (PLA), can be personalized by using the state information of the computer-readable program instructions.
  • the electronic circuit can execute the computer-readable program instructions so as to implement various aspects of the present invention.
  • these computer-readable program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create a device that implements the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions can also be stored in a computer-readable storage medium; these instructions cause computers, programmable data processing apparatuses and/or other devices to work in a specific manner, so that the computer-readable medium storing the instructions comprises an article of manufacture that includes instructions for implementing various aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
  • each block in the flowchart or block diagram may represent a module, program segment, or part of an instruction, which contains one or more executable instructions for realizing the specified logical functions.
  • in some alternative implementations, the functions noted in the blocks may also occur in a different order from the order marked in the drawings. For example, two consecutive blocks can actually be executed substantially in parallel, or they can sometimes be executed in the reverse order, depending on the functions involved.
  • each block in the block diagram and/or flowchart, and the combination of blocks in the block diagram and/or flowchart, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions. It is well known to those skilled in the art that implementation by hardware, implementation by software, and implementation by a combination of software and hardware are all equivalent.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Fuzzy Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The present invention provides a general network compression framework and compression method based on a sequential recommendation system. The general network compression framework includes: an input embedding layer based on block-adaptive decomposition, which divides the set of recommended items into multiple clusters according to item frequency and partitions the input embedding matrix into corresponding blocks, where the blocks of different clusters are assigned different dimensions; intermediate layers with hierarchical parameter sharing, connected to the input embedding layer, formed by stacking multiple residual blocks, and sharing parameters through a hierarchical parameter-sharing mechanism; and an output layer based on block-adaptive decomposition, which uses the same block-embedding cluster configuration as the input embedding layer, represents the blocks of each cluster with a tree structure, obtains the probability distribution of the output sequence, and then predicts the expected recommendation item. The present invention effectively solves the problem of the huge number of parameters of sequential recommendation models, improves their training and inference efficiency, and alleviates model overfitting.

Description

A general network compression framework and compression method based on a sequential recommendation system
TECHNICAL FIELD
The present invention relates to the technical field of sequential recommendation, and more specifically, to a general network compression framework and compression method based on a sequential recommendation system.
BACKGROUND
Sequential (also called session-based) recommendation systems have become a research hotspot in the recommendation field, because user interactions in real life usually occur in the form of time series. For example, after purchasing a phone on Amazon, a user is likely to buy a phone case, earphones and a screen protector within the same session. Another example comes from the popular short-video sharing application TikTok: a user can watch hundreds of videos within an hour, and these videos naturally form a video playback sequence. In this setting, sequential recommendation models based on recurrent neural networks (RNN) or convolutional neural networks (CNN, usually using dilated convolutions) achieve the best recommendation performance, because these deep learning models are more powerful at capturing the sequential dependencies in user-item interaction sequences.
Generally speaking, a modern sequential recommendation model based on deep neural networks (DNN) consists of three main modules: an input embedding layer for representing the interaction sequence, an output softmax layer for generating the probability distribution of the next item, and one or more hidden layers (recurrent or convolutional) sandwiched between them. In practice, to increase model capacity, the usual approach is a larger model size with more parameters. Increasing the size of a sequential recommendation model, i.e., using larger embedding dimensions or a deeper network architecture, can improve its prediction accuracy. Although large networks often bring a clear accuracy gain, they can also become a major obstacle to model deployment and real-time prediction. Especially for memory-limited devices such as GPUs/TPUs or end-user devices, large sequential models with hundreds of millions or even billions of parameters can easily hit the memory limit of the available hardware. Another drawback is that using larger matrices and deeper networks slows down training and inference. Although model parallelization can be applied to larger networks, the communication overhead remains proportional to the number of parameters in the model. Furthermore, existing research shows that, beyond a certain point, further increasing the model size may lead to overfitting or unexpected performance degradation. Therefore, model compression is essential for recommendation models that can respond in real time and generalize better.
In fact, the model compression problem in the recommendation field is more challenging than in other fields such as computer vision (CV) and natural language processing (NLP). For example, in CV, ResNet-101 for ImageNet has only 44.5 million parameters, and the largest NLP model, BERT-Large (24 layers, 16 attention heads), has about 340 million trainable parameters. By contrast, industrial recommendation systems such as YouTube and Amazon contain hundreds of millions of recommendation items. Simply assuming 100 million items and an embedding dimension of 1024 yields about 200 billion trainable parameters for the input embedding and output softmax matrices, more than 4,000 and 400 times larger than ResNet-101 and BERT-Large, respectively (see the arithmetic sketch below). On the other hand, in small and medium-sized recommendation systems, for example future in-vehicle recommendation systems, the parameters from the middle layers cannot be ignored, and the memory consumption may be dominated by the middle layers together with the embedding matrices. In practice, if user behavior sequences are long, more middle layers may need to be stacked to obtain better accuracy.
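A rough sketch of the arithmetic behind these figures (assuming exactly 10^8 items, an embedding dimension of 1024, and one input and one output matrix of that size):

K × d = 10^8 × 1024 ≈ 1.0 × 10^11 parameters per matrix, so the two matrices together hold ≈ 2 × 10^11 (about 200 billion) parameters;
2 × 10^11 / 4.45 × 10^7 ≈ 4,500 times the size of ResNet-101, and 2 × 10^11 / 3.4 × 10^8 ≈ 590 times the size of BERT-Large.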
As more and more attention is paid to model compression methods, related research results have been published continuously. For example, one line of work proposes using standard low-rank decomposition to factorize the input embedding layer and the output softmax layer into two smaller matrices each, and applies cross-layer parameter sharing in the middle layers, thereby compressing the model and improving parameter efficiency. Another line of work proposes a knowledge-distillation-based model for the recommendation field, transferring knowledge from a large, pre-trained teacher model to a usually smaller student model to achieve model compression. Because neither approach explicitly considers the characteristics of the sequential recommendation field, both have significant limitations, such as a clear performance loss during model compression and an unsatisfactory compression effect.
In industrial recommendation systems such as YouTube and Amazon there are hundreds of millions of recommendation items, and representing the intricate and complex relationships among them makes the number of parameters of the input embedding layer and the output softmax layer enormous. On the other hand, if user-item interaction sequences are long, more middle layers may need to be stacked to obtain better model performance, making the number of middle-layer parameters enormous as well.
At present, model compression techniques have not been well studied in sequential recommendation systems: the prior art tends to apply very small embedding dimensions for research purposes, and so far no work has used deep learning models with more than 20 layers for recommendation tasks.
In summary, current sequential recommendation models have three obvious deficiencies:
1) Large recommendation models usually bring clear performance gains, but they can also become a major obstacle to model deployment and real-time prediction. Especially when handling recommendation systems with large-scale item sets, the number of parameters of the input embedding layer and the output softmax layer grows explosively; increasing the batch size and the embedding size during training also multiplies the required storage space; and estimating the probability of the next recommendation item in the output layer is extremely slow during training and testing because of the huge number of items. Although model parallelization can be applied to accelerate large deep models, the communication overhead remains proportional to the number of parameters in the model;
2) In practice, when handling long interaction sequences, recommendation models usually need a deep architecture (i.e., deeper stacked layers) to capture long-range sequential dependency information, which also greatly increases the number of middle-layer parameters;
3) In some cases, further increasing the model size may cause overfitting or model performance degradation.
SUMMARY OF THE INVENTION
An object of the present invention is to solve the problem of model compression in the sequential recommendation field and to provide a general compression framework and compression method based on a sequential recommendation system.
According to a first aspect of the present invention, a general network compression framework based on a sequential recommendation system is provided, comprising:
An input embedding layer based on block-adaptive decomposition: dividing the set of recommended items into multiple clusters according to the frequency of the recommended items and partitioning the input embedding matrix into corresponding blocks, where the blocks of different clusters are assigned different dimensions;
Intermediate layers with hierarchical parameter sharing: connected to the input embedding layer, formed by stacking multiple residual blocks, and sharing parameters through a hierarchical parameter-sharing mechanism;
An output layer based on block-adaptive decomposition: using the same block-embedding cluster configuration as the input embedding layer, representing the blocks of each cluster with a tree structure, obtaining the probability distribution of the output sequence, and then predicting the expected recommendation item.
In one embodiment, dividing the set of recommended items into multiple clusters according to the frequency of the recommended items and partitioning the input embedding matrix into corresponding blocks includes:
ranking all recommended items by frequency, S = {x_1, x_2, …, x_K}, where x_1 and x_K are the recommended items with the highest and the lowest frequency, respectively;
dividing the recommended item set S into n clusters, expressed as S = S_1 ∪ S_2 ∪ … ∪ S_{n-1} ∪ S_n, where S_α ∩ S_β = ∅ for α ≠ β, the number of recommended items in each cluster is k_1, k_2, …, k_n, Σ_{i=1}^{n} k_i = K, and K is the total number of recommended items;
partitioning the input embedding matrix E ∈ R^{K×d} into n blocks E^1, E^2, …, E^n with E^j ∈ R^{k_j×d}, where d is the embedding size.
In one embodiment, for the input embedding layer, the framework further includes:
decomposing the block E^j into two low-rank matrices A^j ∈ R^{k_j×d_j} and B^j ∈ R^{d_j×d}, where d_j is the factorization dimension of the j-th block;
for a given recommended item label ID x, expressing its embedding vector v_x ∈ R^d as v_x = (A^j)_g · B^j, where (A^j)_g denotes the embedding vector of the g-th row of the j-th block and g = x − Σ_{i=1}^{j−1} k_i is the index of x within its cluster.
In one embodiment, the hierarchical parameter-sharing mechanism includes:
cross-block parameter sharing, meaning that all higher layers reuse the parameters of the first residual block;
adjacent-layer parameter sharing, meaning that the two separate layers in each residual block share the same parameter set;
adjacent-block parameter sharing, meaning that parameters are shared between every two adjacent residual blocks.
In one embodiment, representing the blocks of each cluster with a tree structure includes:
constructing a two-level tree in which each tree node represents a cluster, the embedding matrix of the first cluster is stored in the root node of the tree, and the embedding matrices of the other clusters are stored in the leaf nodes of the second level of the tree;
for the first cluster, representing each recommendation item as a distinct class, and for the other clusters, assigning two nodes to each recommendation item, including a root node that uses its cluster position as the parent class of the recommendation item and a leaf node that represents its specific position within the cluster.
In one embodiment, the output layer further performs:
expanding the first block matrix to P^1 ∈ R^{d×(k_1+n−1)}, where n−1 represents the number of parent classes to which the leaf nodes belong, and the label set of the first cluster is expanded to S_1 = {1, 2, …, k_1+n−1}, where k_1+1 to k_1+n−1 correspond to the parent-class labels of the 2nd to the n-th clusters; the other block matrices of the output layer are expressed as the product of two low-rank matrices of sizes d×d_j and d_j×k_j.
In one embodiment, the framework is configured to include multiple intermediate layers, where every two intermediate layers are connected by a residual connection to form a residual block, and an exponentially growing receptive field is obtained by doubling the dilation factor of each layer.
According to a second aspect of the present invention, a compression method using the general network compression framework based on a sequential recommendation system provided by the present invention is provided, comprising:
dividing the set of recommended items into multiple clusters according to the frequency of the recommended items and partitioning the input embedding matrix into corresponding blocks, where the blocks of different clusters are assigned different dimensions;
the intermediate layers being formed by stacking multiple residual blocks and sharing parameters through a hierarchical parameter-sharing mechanism;
the output layer using the same block-embedding cluster configuration as the input embedding layer, representing the blocks of each cluster with a tree structure, obtaining the probability distribution of the output sequence, and then predicting the expected recommendation item.
Compared with the prior art, the advantages of the present invention are as follows: based on the characteristics of the sequential recommendation field and using deep learning models, a new joint compression framework for sequential recommendation is proposed, which considers model compression comprehensively from three aspects, namely the input embedding layer, the output softmax layer and the intermediate layers; it effectively solves the problem of the huge number of model parameters, improves the training and inference efficiency of sequential recommendation models, and alleviates model overfitting.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the present invention and, together with the description, serve to explain the principles of the present invention.
Fig. 1 is a schematic diagram of the block embedding decomposition method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of cross-block and adjacent layer/block parameter sharing according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the general network compression framework according to an embodiment of the present invention;
In the drawings, the labels "block" and "layer" denote a residual block and a network layer, respectively.
DETAILED DESCRIPTION
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that, unless otherwise specified, the relative arrangement of components and steps, the numerical expressions and the numerical values set forth in these embodiments do not limit the scope of the present invention.
The following description of at least one exemplary embodiment is merely illustrative and in no way limits the present invention or its application or use.
Techniques, methods and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but, where appropriate, such techniques, methods and devices should be regarded as part of the specification.
In all examples shown and discussed herein, any specific value should be interpreted as merely exemplary rather than limiting; other examples of the exemplary embodiments may therefore have different values.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; once an item is defined in one drawing, it need not be further discussed in subsequent drawings.
In short, in the embodiments of the present invention, two general model compression mechanisms are proposed to reduce the memory consumption of sequential recommendation systems, namely a block-adaptive decomposition method and a hierarchical parameter-sharing method. Specifically, to reduce the parameters of the input embedding and output softmax matrices, a block-adaptive decomposition method is proposed to obtain block embedding matrices that approximate the original embedding matrices (in the following description, the input embedding matrix and the output softmax matrix are also collectively referred to as embedding matrices); cross-block, adjacent-layer and adjacent-block parameter sharing are introduced to reduce the parameters of the middle layers. Since these two model compression mechanisms are orthogonal, in the embodiments of the present invention they can naturally be combined into a joint model compression framework to achieve a higher compression rate.
Herein, the proposed joint model compression framework is named CpRec, i.e., the general network compression framework based on a sequential recommendation system. Combining the characteristics of the sequential recommendation field, this framework compresses a sequential recommendation model from three aspects: the input layer, the output layer and the intermediate deep architecture. While preserving the recommendation accuracy of the model, it improves the working efficiency of the model, alleviates model overfitting and reduces the storage required by the model. In the following, block-adaptive decomposition (covering the input embedding layer and the output softmax layer) and the middle layers with hierarchical parameter sharing are introduced in turn, and the CpRec provided by the present invention is described based on the NextItNet architecture.
I. Block-adaptive decomposition
In a sequential recommendation system, the frequency distribution of recommended items follows a long-tailed distribution: only a few items may contain rich information owing to their high frequency, while other items may contain only limited information. For example, some "head" (popular) recommendation items have a large number of user interactions, whereas "tail" recommendation items have only a few. In view of this, assigning a fixed embedding dimension to all recommended items is suboptimal and unnecessary and may lead to poor performance. Intuitively, items with higher frequency may contain more information than rare items and should therefore be allocated more capacity during training; in other words, the embedding dimension of more frequent (popular) items should be larger than that of unpopular items.
For example, all recommended items are first ranked by frequency in the sequential recommendation system, S = {x_1, x_2, …, x_K}, where x_1 and x_K are the recommended items with the highest and the lowest frequency, respectively. The recommended item set S is divided into n clusters, for example expressed as S = S_1 ∪ S_2 ∪ … ∪ S_{n−1} ∪ S_n, where S_α ∩ S_β = ∅ for α ≠ β, the number of recommended items in each cluster is k_1, k_2, …, k_n, Σ_{i=1}^{n} k_i = K, and K is the total number of recommended items. In this way, the input embedding matrix E ∈ R^{K×d} (as shown in Figure 1(a)) can be partitioned into n blocks E^1, E^2, …, E^n with E^j ∈ R^{k_j×d} (as shown in Figure 2(b)), where d is the embedding size. With a similar strategy, the output softmax matrix P ∈ R^{d×K} is partitioned into n blocks P^1, P^2, …, P^n. Next, the block-adaptive decomposition of the input embedding matrix and of the output softmax matrix will be described in turn.
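The following is a minimal illustrative sketch of the clustering step just described (NumPy; the helper name and the way the cluster sizes k_1, …, k_n are chosen are assumptions made for this example, not part of the claimed framework):

```python
import numpy as np

def split_items_by_frequency(item_counts, cluster_sizes):
    """Split a frequency-sorted item vocabulary into n clusters S_1, ..., S_n.

    item_counts:   interaction count per item ID (length K).
    cluster_sizes: [k_1, ..., k_n] with sum(cluster_sizes) == K.
    Returns a list of n arrays of item IDs, ordered from most to least frequent.
    """
    assert sum(cluster_sizes) == len(item_counts)
    order = np.argsort(-np.asarray(item_counts))      # x_1 (most frequent) ... x_K (least frequent)
    clusters, start = [], 0
    for k in cluster_sizes:
        clusters.append(order[start:start + k])
        start += k
    return clusters

# Example: 1,000 items split into three clusters holding 100 / 300 / 600 items.
counts = np.random.default_rng(0).poisson(5.0, size=1000)
clusters = split_items_by_frequency(counts, [100, 300, 600])
```

Items whose frequency-ranked IDs fall into cluster j are then represented by the j-th block E^j.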
1) Input embedding layer based on block-adaptive decomposition
In the input layer, the block E^j is decomposed into two low-rank matrices A^j ∈ R^{k_j×d_j} and B^j ∈ R^{d_j×d}, where d_j is the factorization dimension (also called the rank) of the j-th block. Since high-frequency recommendation items should have more expressive power, the corresponding d_j decreases as the cluster index increases. Accordingly, the embedding representation of each recommendation item differs from the embedding representation obtained by the original look-up operation. Given a recommended item label ID x, its embedding vector v_x ∈ R^d is expressed as v_x = (A^j)_g · B^j, where (A^j)_g denotes the embedding vector of the g-th row of the j-th block and g = x − Σ_{i=1}^{j−1} k_i is the index of x within its cluster. Through this factorization, the number of parameters of the input embedding layer can be reduced from O(K×d) to O(Σ_{j=1}^{n} (k_j×d_j + d_j×d)); when d_i ≪ d_1, the number of parameters of the input embedding layer is significantly reduced. See the decomposition process illustrated in Figure 2(c).
Unlike ordinary input embedding layers in the prior art, the input embedding layer based on block-adaptive decomposition in this embodiment divides all recommended items into multiple clusters according to their frequency, and the embedding matrix of each cluster (called a block) is factorized into two low-rank matrices, where the rank value is also determined by the frequency of the recommended items in the cluster, i.e., a larger rank is assigned to a block containing more frequent items. Through block-adaptive decomposition, different dimensions can be assigned to the blocks of different clusters, which greatly reduces the number of parameters of the input embedding layer.
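As an illustration of the factorized, block-adaptive look-up described above, the following is a minimal PyTorch sketch (the class name, cluster sizes and per-block ranks are assumptions chosen for the example and are not the patent's reference implementation):

```python
import torch
import torch.nn as nn

class BlockAdaptiveEmbedding(nn.Module):
    """Cluster j stores a factorized block: A_j (k_j x d_j) followed by B_j (d_j x d)."""
    def __init__(self, cluster_sizes, ranks, d):
        super().__init__()
        assert len(cluster_sizes) == len(ranks)
        self.cluster_sizes, self.d = cluster_sizes, d
        self.offsets = [sum(cluster_sizes[:j]) for j in range(len(cluster_sizes))]
        self.A = nn.ModuleList([nn.Embedding(k, dj) for k, dj in zip(cluster_sizes, ranks)])
        self.B = nn.ModuleList([nn.Linear(dj, d, bias=False) for dj in ranks])

    def forward(self, item_ids):
        """item_ids: LongTensor of frequency-ranked item IDs, any shape."""
        out = torch.zeros(*item_ids.shape, self.d, device=item_ids.device)
        for j, (a, b) in enumerate(zip(self.A, self.B)):
            lo, hi = self.offsets[j], self.offsets[j] + self.cluster_sizes[j]
            mask = (item_ids >= lo) & (item_ids < hi)
            if mask.any():
                out[mask] = b(a(item_ids[mask] - lo))  # v_x = (A_j)_g B_j, g = local row index
        return out

# Frequent items get rank 64, rare items rank 16; output dimension d = 64.
emb = BlockAdaptiveEmbedding(cluster_sizes=[100, 300, 600], ranks=[64, 32, 16], d=64)
vectors = emb(torch.randint(0, 1000, (8, 20)))         # a batch of 8 sequences of length 20
```

In this configuration the factorized blocks hold roughly 32,800 weights, compared with 64,000 for a single 1000 × 64 embedding table.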
2) Output softmax layer based on block-adaptive decomposition
In the output softmax layer, with reference to class-based softmax (Le H S, Oparin I, Allauzen A, et al. Structured output layer neural network language model[C]//2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2011: 5524-5527.), in one embodiment these blocks are organized by a two-level tree in which each tree node represents a cluster. Figure 2(d) is an example of the block embedding of the output softmax layer. The embedding matrix of the first cluster (the first block) is stored in the root node of the tree, and the other blocks are stored in the leaf nodes of the second level of the tree. For the first cluster, each recommendation item is represented as a distinct class; for the other clusters, each recommendation item is assigned two nodes: a root node that uses its cluster position as the parent class of the recommendation item, and a leaf node that represents its specific position within the cluster. In this way, recommended items in the same cluster share the same parent class. More specifically, the output softmax layer uses a block-embedding cluster configuration similar to that of the input embedding layer. One major difference is that the first block matrix of the output layer is expanded to P^1 ∈ R^{d×(k_1+n−1)}, where n−1 represents the number of parent classes to which the leaf nodes belong. The label set of the first cluster is expanded to S_1 = {1, 2, …, k_1+n−1}, where k_1+1 to k_1+n−1 correspond to the parent-class labels of the 2nd to the n-th clusters. The other block matrices of the output softmax layer are factorized into two low-rank matrices of sizes d×d_j and d_j×k_j. Compared with the original softmax layer, the number of parameters of the output softmax layer is reduced from O(K×d) to O(d×(k_1+n−1) + Σ_{j=2}^{n} (d×d_j + d_j×k_j)).
In this way, all recommended items are divided into multiple clusters according to their frequency, and the embedding matrix of each cluster (called a block) is composed of two low-rank matrices, where the rank is also determined by the frequency of the recommended items in the cluster: a higher rank is assigned to blocks with more frequent items. With the block-adaptive decomposition method, different dimensions can be assigned to the blocks of each cluster, thereby reducing the model size.
The construction of the objective function during training and the generation of recommendations during inference are described in detail below.
During training, given the context vector h ∈ R^d (i.e., the final hidden vector of the sequential recommendation model), to predict the next item the user may be interested in, the search space is first determined according to the label of the next recommendation (say x). If x belongs to the first cluster, only the logits within that cluster are computed. If x belongs to another cluster, logits are computed both in the cluster to which its parent class belongs (i.e., the first cluster) and in the current cluster; these logits are given by formula (2), computed from h and the corresponding block matrices of the output layer. Here it is stipulated that each recommendation item x belonging to a leaf node has a parent-class label c(x) in the first cluster. Accordingly, the training process includes two steps. In the first step, the logits of the first cluster are computed, which takes O(k_1+n−1) time. In the second step, if the recommended item label x belongs to one of the clusters on the leaf nodes, the logits of that cluster are computed, which takes O(k_j) time. In this way, compared with using the original softmax, the training time of the present invention using block embedding is reduced from O(K) to between O(k_1+n−1) and O(k_1+k_j+n−1).
The logits are normalized by the softmax function, and the loss function f is defined between the resulting predicted distribution and the true label vector y.
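As an illustration of the two-step computation described above, the following is a minimal PyTorch sketch of the training-time objective over the two-level tree (the variable names, cluster sizes and the use of a plain cross-entropy loss are assumptions made for this example, not the patent's reference implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d, k1, n = 64, 100, 3                          # hidden size, head-cluster size, number of clusters
tail_sizes, tail_ranks = [300, 600], [32, 16]

# Root block: k1 item classes plus (n - 1) parent classes for the tail clusters.
P1 = nn.Linear(d, k1 + n - 1, bias=False)
# Tail blocks, each factorized into d x d_j followed by d_j x k_j.
P_tail = nn.ModuleList([nn.Sequential(nn.Linear(d, dj, bias=False), nn.Linear(dj, kj, bias=False))
                        for kj, dj in zip(tail_sizes, tail_ranks)])

def training_loss(h, x):
    """h: (d,) context vector; x: frequency-ranked item ID of the next item."""
    root_logits = P1(h)                                    # step 1: O(k1 + n - 1)
    if x < k1:                                             # label lies in the first cluster
        return F.cross_entropy(root_logits.unsqueeze(0), torch.tensor([x]))
    j, offset = 0, k1                                      # locate the tail cluster holding x
    while x >= offset + tail_sizes[j]:
        offset += tail_sizes[j]
        j += 1
    parent = k1 + j                                        # parent class c(x) inside the root block
    leaf_logits = P_tail[j](h)                             # step 2: O(k_j)
    return (F.cross_entropy(root_logits.unsqueeze(0), torch.tensor([parent]))
            + F.cross_entropy(leaf_logits.unsqueeze(0), torch.tensor([x - offset])))

loss = training_loss(torch.randn(d), x=850)
```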
Unlike in the training stage, during inference it is unknown which cluster a recommendation item belongs to. However, the probability distribution over the recommended items in all clusters can be computed from the conditional distributions, expressed as p(x) = p(x | h, S_1) if x belongs to the first cluster, and p(x) = p(c(x) | h, S_1) · p(x | c(x), h) otherwise, where p(x | h, S_1), p(x | c(x), h) and p(c(x) | h, S_1) can all be computed by formula (2). Finally, the top-N (first N) recommended items are obtained based on p(x). In fact, the softmax probabilities usually need not be computed for all recommended items in the inference stage; an early-stop search can be performed to speed up generation. Specifically, if the top-N probability scores all lie in the first cluster, the probability scores of recommended items in other clusters, i.e., p(x | c(x), h), need not be computed, because p(x | c(x), h) · p(c(x) | h, S_1) (where p(c(x) | h, S_1) < 1) is always smaller than the top-N scores of the first cluster.
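Continuing the illustrative names defined in the training sketch above, the early-stop search during inference can be sketched as follows (again an example under assumptions, not the reference implementation):

```python
def recommend_top_n(h, N=10):
    """Return the N most probable frequency-ranked item IDs for context vector h."""
    root_prob = F.softmax(P1(h), dim=-1)                   # k1 item scores + (n-1) parent scores
    top_prob, top_idx = root_prob.topk(N + n - 1)
    items = [(p.item(), i.item()) for p, i in zip(top_prob, top_idx) if i < k1]
    # Early stop: every tail item's score p(x|c(x),h)*p(c(x)|h) is below its parent score
    # p(c(x)|h), so if the N-th best first-cluster score beats every parent score we are done.
    if items[N - 1][0] >= root_prob[k1:].max().item():
        return [i for _, i in items[:N]]
    for j in range(n - 1):                                 # otherwise expand the tail clusters
        tail_prob = F.softmax(P_tail[j](h), dim=-1) * root_prob[k1 + j]
        offset = k1 + sum(tail_sizes[:j])
        items += [(p.item(), offset + g) for g, p in enumerate(tail_prob)]
    items.sort(key=lambda t: -t[0])
    return [i for _, i in items[:N]]

top10 = recommend_top_n(torch.randn(d))
```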
For the block-adaptively decomposed output softmax layer of the present invention, the output softmax layer uses a block-embedding cluster structure similar to that of the input embedding layer, including the number of blocks and the size of each block, and a probability approximation method based on a tree representation is designed in the output softmax layer to replace the original softmax, which significantly reduces the number of parameters, the training time and the inference time of the output softmax layer.
In summary, the present invention proposes a block-adaptive decomposition method for the input embedding and output softmax layers to obtain block embedding matrices that approximate the original embedding matrices of the input and output layers in the sequential recommendation system, and designs, in the output softmax layer, a probability approximation method based on a tree representation to replace the original softmax. This approach greatly reduces the number of embedding-matrix parameters in the input/output layers and effectively improves the training and inference speed of the sequential recommendation model, thereby overcoming the drawback of existing recommendation systems in which an excessive number of recommendation items makes the parameters of the input embedding layer and output softmax layer huge and training and inference slow.
II. Middle layers with hierarchical parameter sharing
In many practical recommendation systems, user interaction sequences can be very long, for example in short-video and news recommendation. To model long-range interaction sequences, a common approach is to build a deeper network architecture, but the parameter size of the middle layers may then dominate the overall memory consumption, especially for small-scale applications on mobile or end-user devices. Therefore, the compression method proposed in the embodiments of the present invention mainly targets sequential recommendation models that have multiple middle layers with every two middle layers connected by a residual connection (see the ResNet structure: He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 770-778.), such as NextItNet.
To reduce the parameter consumption of the middle layers, ALBERT (Lan Z, Chen M, Goodman S, et al. Albert: A lite bert for self-supervised learning of language representations[J]. arXiv preprint arXiv:1909.11942, 2019.) proposed cross-layer parameter sharing, as shown in Figure 2(a). While removing a large number of redundant parameters, it also restricts the expressiveness of the neural network model to a certain extent; in fact, with this cross-layer sharing scheme, model performance drops significantly on recommendation tasks. Therefore, the embodiment of the present invention proposes cross-block parameter sharing, as shown in Figure 2(b), in which all higher layers reuse the parameters of the first residual block (i.e., the two bottom layers).
To make full use of the stacked layers of a deep model while improving parameter efficiency, the embodiment of the present invention further proposes two other hierarchical parameter-sharing methods: adjacent-layer parameter sharing (as shown in Figure 2(c)) and adjacent-block parameter sharing (as shown in Figure 2(d)). Specifically, adjacent-layer parameter sharing means that the two separate layers in each residual block share the same parameter set, and adjacent-block parameter sharing means that parameters are shared between every two adjacent residual blocks. This parameter-sharing strategy has two main advantages: as a regularization method, it stabilizes the training process and improves the generalization ability of the model; and it significantly reduces the number of parameters without degrading performance the way cross-layer parameter sharing does. In particular, with adjacent-block parameter sharing, the recommendation accuracy is consistently better than the baseline model.
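The three sharing schemes can be illustrated with the following sketch (PyTorch; make_block and the .layer1/.layer2 attribute names are placeholders assumed for this example):

```python
import torch.nn as nn

def build_shared_blocks(make_block, num_blocks, scheme):
    """make_block() -> nn.Module exposing its two convolution layers as .layer1 and .layer2."""
    if scheme == "cross_block":                  # all higher blocks reuse the first block's parameters
        first = make_block()
        return nn.ModuleList([first] * num_blocks)
    blocks = nn.ModuleList([make_block() for _ in range(num_blocks)])
    if scheme == "adjacent_layer":               # the two layers inside each block share one parameter set
        for blk in blocks:
            blk.layer2 = blk.layer1
    elif scheme == "adjacent_block":             # every pair of neighbouring blocks shares parameters
        for i in range(1, num_blocks, 2):
            blocks[i] = blocks[i - 1]
    return blocks
```

With adjacent-block sharing, for example, an 8-block stack stores only 4 distinct sets of block parameters.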
In summary, the present invention proposes three different hierarchical parameter-sharing schemes (i.e., cross-block parameter sharing, adjacent-layer parameter sharing and adjacent-block parameter sharing) to reduce redundant parameters in the middle layers. As the number of model layers grows, the number of parameters is effectively limited, which overcomes the drawback of the huge number of parameters caused by the deep architectures needed in the sequential recommendation field to extract long-range dependency information from user-item interaction sequences.
III. Architecture of the general network compression framework based on a sequential recommendation system
The present invention instantiates CpRec using the NextItNet architecture; see the overall neural network model architecture shown in Figure 3, which is divided into three modules: the block-adaptively decomposed input embedding layer (corresponding to the left part of Figure 3), the block-adaptively decomposed output softmax layer (corresponding to the right part of Figure 3) and the middle layers with hierarchical parameter sharing (corresponding to the middle part of Figure 3).
For the input embedding layer, given a user-item interaction sequence {x_1, x_2, …, x_{t+1}}, the sequential recommendation model retrieves the embedding vectors of the first t items {x_1, x_2, …, x_t} through a look-up table based on the block embedding matrices. These embedding vectors can then be stacked into a new matrix (as on the left of Figure 3, where t = 5), which serves as the input of the middle layers.
For the middle layers, as shown in the middle part of Figure 3, CpRec follows NextItNet and uses dilated convolution layers, where every two layers are wrapped by a residual connection to form a residual block. CpRec obtains an exponentially growing receptive field by doubling the dilation factor of each layer, for example {1, 2, 4, 8}. In addition, to further enhance the expressiveness and accuracy of the model, this structure can be stacked multiple times, for example {1, 2, 4, 8, …, 1, 2, 4, 8}. The proposed hierarchical parameter-sharing strategies can then be applied to these middle layers to improve their parameter efficiency.
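A minimal sketch of such a dilated-convolution residual stack follows (in the spirit of NextItNet; the exact layer composition of CpRec may differ, so the normalization-free block below is an illustrative assumption rather than the patent's reference code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DilatedResidualBlock(nn.Module):
    """Two causal dilated 1-D convolutions wrapped by a residual connection (one residual block)."""
    def __init__(self, channels, kernel_size=3, dilation=1):
        super().__init__()
        self.layer1 = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)
        self.layer2 = nn.Conv1d(channels, channels, kernel_size, dilation=2 * dilation)

    def _causal(self, conv, x):
        pad = (conv.kernel_size[0] - 1) * conv.dilation[0]
        return conv(F.pad(x, (pad, 0)))           # left-pad so no future positions leak in

    def forward(self, x):                          # x: (batch, channels, seq_len)
        y = torch.relu(self._causal(self.layer1, x))
        y = torch.relu(self._causal(self.layer2, y))
        return x + y                               # residual connection

# Layer dilations 1,2,4,8 repeated twice: {1,2,4,8,...,1,2,4,8}.
middle = nn.Sequential(*[DilatedResidualBlock(64, dilation=r) for r in [1, 4] * 2])
out = middle(torch.randn(8, 64, 20))               # 8 sequences, 64 channels, length 20
```

The hierarchical parameter-sharing strategies of Section II can then be applied across these blocks.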
For the output softmax layer, a tree-structured block embedding matrix is used to represent the block of each cluster. As mentioned above, CpRec achieves significant acceleration through this structure in both the training and inference stages. Similar to NextItNet, given each input sequence {x_1, …, x_t}, CpRec estimates the probability distribution of the output sequence {x_2, …, x_{t+1}}, where x_{t+1} is the expected next recommendation item.
It should be noted that, without departing from the spirit of the present invention, those skilled in the art may make appropriate variations to the above embodiments; for example, all recommended items may instead be sorted in ascending order of frequency, or, as another example, a multi-level tree structure may be built in the output layer.
In summary, based on the characteristics of the sequential recommendation field and on deep learning models, the present invention proposes CpRec, a new model compression framework for deep sequential recommendation, which is a flexible and general neural network compression framework for learning compressed sequential recommendation models. The framework considers model compression comprehensively from three aspects, namely the input embedding layer, the output softmax layer and the middle layers, improving the training and inference efficiency of sequential recommendation models, alleviating model overfitting and reducing the storage occupied by the model. CpRec achieves faster training/inference speed and lower memory consumption, and generates recommendations that users are interested in with better recommendation accuracy.
To verify the effectiveness and advancement of the proposed general network compression framework based on a sequential recommendation system, extensive experiments and ablation analyses were conducted with the joint compression framework CpRec on two widely used recommendation datasets, TikTok and MovieLens, comparing recommendation performance, parameter counts and training/inference time. The experimental results show that the present invention consistently outperforms existing baseline models such as NextItNet and RNN-based models, and can be widely applied in the sequential recommendation field.
The present invention may be a system, a method and/or a computer program product. The computer program product may include a computer-readable storage medium loaded with computer-readable program instructions for causing a processor to implement various aspects of the present invention.
The computer-readable storage medium may be a tangible device that can hold and store instructions used by an instruction execution device. The computer-readable storage medium may be, for example, but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer-readable storage medium includes: a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or an in-groove raised structure with instructions stored thereon, and any suitable combination of the foregoing. The computer-readable storage medium used herein is not to be interpreted as a transient signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (for example, a light pulse through a fiber-optic cable), or an electrical signal transmitted through a wire.
The computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to various computing/processing devices, or downloaded to an external computer or external storage device via a network such as the Internet, a local area network, a wide area network and/or a wireless network. The network may include copper transmission cables, optical-fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions used to perform the operations of the present invention may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry such as a programmable logic circuit, a field-programmable gate array (FPGA) or a programmable logic array (PLA) can be personalized by utilizing state information of the computer-readable program instructions, and the electronic circuitry can execute the computer-readable program instructions so as to implement various aspects of the present invention.
Various aspects of the present invention are described herein with reference to flowcharts and/or block diagrams of methods, apparatuses (systems) and computer program products according to embodiments of the present invention. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium; these instructions cause a computer, a programmable data processing apparatus and/or other devices to work in a specific manner, so that the computer-readable medium storing the instructions comprises an article of manufacture that includes instructions for implementing various aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus or other devices, so that a series of operational steps are performed on the computer, other programmable apparatus or other devices to produce a computer-implemented process, such that the instructions executed on the computer, other programmable apparatus or other devices implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions and operations of systems, methods and computer program products according to multiple embodiments of the present invention. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment or a part of instructions that contains one or more executable instructions for implementing the specified logical functions. In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the drawings; for example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or acts, or by a combination of dedicated hardware and computer instructions. It is well known to those skilled in the art that implementation by hardware, implementation by software, and implementation by a combination of software and hardware are all equivalent.
The embodiments of the present invention have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terms used herein are chosen to best explain the principles of the embodiments, their practical applications or technical improvements over the technologies in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the present invention is defined by the appended claims.

Claims (10)

  1. A general network compression framework based on a sequential recommendation system, comprising:
    an input embedding layer based on block-adaptive decomposition: dividing the set of recommended items into multiple clusters according to the frequency of the recommended items and partitioning the input embedding matrix into corresponding blocks, wherein the blocks of different clusters are assigned different dimensions;
    intermediate layers with hierarchical parameter sharing: connected to the input embedding layer, formed by stacking multiple residual blocks, and sharing parameters through a hierarchical parameter-sharing mechanism;
    an output layer based on block-adaptive decomposition: using the same block-embedding cluster configuration as the input embedding layer, representing the blocks of each cluster with a tree structure, obtaining the probability distribution of the output sequence, and then predicting the expected recommendation item.
  2. The general network compression framework based on a sequential recommendation system according to claim 1, wherein dividing the set of recommended items into multiple clusters according to the frequency of the recommended items and partitioning the input embedding matrix into corresponding blocks comprises:
    ranking all recommended items by frequency, S = {x_1, x_2, …, x_K}, wherein x_1 and x_K are the recommended items with the highest and the lowest frequency, respectively;
    dividing the recommended item set S into n clusters, expressed as S = S_1 ∪ S_2 ∪ … ∪ S_{n-1} ∪ S_n, wherein S_α ∩ S_β = ∅ for α ≠ β, the number of recommended items in each cluster is k_1, k_2, …, k_n, Σ_{i=1}^{n} k_i = K, and K is the total number of recommended items;
    partitioning the input embedding matrix E ∈ R^{K×d} into n blocks E^1, E^2, …, E^n with E^j ∈ R^{k_j×d}, wherein d is the embedding size.
  3. The general network compression framework based on a sequential recommendation system according to claim 2, wherein, for the input embedding layer, the framework further comprises:
    decomposing the block E^j into two low-rank matrices A^j ∈ R^{k_j×d_j} and B^j ∈ R^{d_j×d}, wherein d_j is the factorization dimension of the j-th block;
    for a given recommended item label ID x, expressing its embedding vector v_x ∈ R^d as v_x = (A^j)_g · B^j, wherein (A^j)_g denotes the embedding vector of the g-th row of the j-th block and g = x − Σ_{i=1}^{j−1} k_i is the index of x within its cluster.
  4. The general network compression framework based on a sequential recommendation system according to claim 1, wherein the hierarchical parameter-sharing mechanism comprises:
    cross-block parameter sharing, meaning that all higher layers reuse the parameters of the first residual block;
    adjacent-layer parameter sharing, meaning that the two separate layers in each residual block share the same parameter set;
    adjacent-block parameter sharing, meaning that parameters are shared between every two adjacent residual blocks.
  5. The general network compression framework based on a sequential recommendation system according to claim 3, wherein representing the blocks of each cluster with a tree structure comprises:
    constructing a two-level tree in which each tree node represents a cluster, the embedding matrix of the first cluster is stored in the root node of the tree, and the embedding matrices of the other clusters are stored in the leaf nodes of the second level of the tree;
    for the first cluster, representing each recommendation item as a distinct class, and for the other clusters, assigning two nodes to each recommendation item, including a root node that uses its cluster position as the parent class of the recommendation item and a leaf node that represents its specific position within the cluster.
  6. The general network compression framework based on a sequential recommendation system according to claim 5, wherein the output layer further performs:
    expanding the first block matrix to P^1 ∈ R^{d×(k_1+n−1)}, wherein n−1 represents the number of parent classes to which the leaf nodes belong, and the label set of the first cluster is expanded to S_1 = {1, 2, …, k_1+n−1}, wherein k_1+1 to k_1+n−1 correspond to the parent-class labels of the 2nd to the n-th clusters; the other block matrices of the output layer are expressed as the product of two low-rank matrices of sizes d×d_j and d_j×k_j.
  7. The general network compression framework based on a sequential recommendation system according to claim 1, wherein the framework is configured to comprise multiple intermediate layers, every two intermediate layers being connected by a residual connection to form a residual block, and an exponentially growing receptive field is obtained by doubling the dilation factor of each layer.
  8. A compression method using the general network compression framework based on a sequential recommendation system according to any one of claims 1 to 7, comprising:
    dividing the set of recommended items into multiple clusters according to the frequency of the recommended items and partitioning the input embedding matrix into corresponding blocks, wherein the blocks of different clusters are assigned different dimensions;
    the intermediate layers being formed by stacking multiple residual blocks and sharing parameters through a hierarchical parameter-sharing mechanism;
    the output layer using the same block-embedding cluster configuration as the input embedding layer, representing the blocks of each cluster with a tree structure, obtaining the probability distribution of the output sequence, and then predicting the expected recommendation item.
  9. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the compression method according to claim 8.
  10. A computer device, comprising a memory and a processor, the memory storing a computer program executable on the processor, wherein the processor, when executing the program, implements the steps of the compression method according to claim 8.
PCT/CN2020/075220 2020-02-14 2020-02-14 General network compression framework and compression method based on sequential recommendation system WO2021159448A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/075220 WO2021159448A1 (zh) 2020-02-14 2020-02-14 General network compression framework and compression method based on sequential recommendation system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/075220 WO2021159448A1 (zh) 2020-02-14 2020-02-14 General network compression framework and compression method based on sequential recommendation system

Publications (1)

Publication Number Publication Date
WO2021159448A1 true WO2021159448A1 (zh) 2021-08-19

Family

ID=77292015

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/075220 WO2021159448A1 (zh) 2020-02-14 2020-02-14 General network compression framework and compression method based on sequential recommendation system

Country Status (1)

Country Link
WO (1) WO2021159448A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114879945A (zh) * 2022-04-27 2022-08-09 武汉大学 Diversified API sequence recommendation method and apparatus for long-tail distribution characteristics
CN115277452A (zh) * 2022-07-01 2022-11-01 中铁第四勘察设计院集团有限公司 ResNet adaptive accelerated computing method based on edge-end collaboration and application thereof
CN116362848A (zh) * 2023-06-03 2023-06-30 成都豪杰特科技有限公司 Artificial intelligence-based e-commerce recommendation method, system, device and medium
CN114879945B (zh) * 2022-04-27 2024-07-16 武汉大学 Diversified API sequence recommendation method and apparatus for long-tail distribution characteristics

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180129971A1 (en) * 2016-11-10 2018-05-10 Adobe Systems Incorporated Learning user preferences using sequential user behavior data to predict user behavior and provide recommendations
CN110765353A (zh) * 2019-10-16 2020-02-07 腾讯科技(深圳)有限公司 Item recommendation model processing method and apparatus, computer device and storage medium

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180129971A1 (en) * 2016-11-10 2018-05-10 Adobe Systems Incorporated Learning user preferences using sequential user behavior data to predict user behavior and provide recommendations
CN110765353A (zh) * 2019-10-16 2020-02-07 腾讯科技(深圳)有限公司 Item recommendation model processing method and apparatus, computer device and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
FAJIE YUAN ; ALEXANDROS KARATZOGLOU ; IOANNIS ARAPAKIS ; JOEMON M. JOSE ; XIANGNAN HE: "A Simple Convolutional Generative Network for Next Item Recommendation", WEB SEARCH AND DATA MINING, ACM, 2 PENN PLAZA, SUITE 701NEW YORKNY10121-0701USA, 30 January 2019 (2019-01-30) - 15 February 2019 (2019-02-15), 2 Penn Plaza, Suite 701New YorkNY10121-0701USA, pages 582 - 590, XP058424888, ISBN: 978-1-4503-5940-5, DOI: 10.1145/3289600.3290975 *
ZHENZHONG LAN; MINGDA CHEN; SEBASTIAN GOODMAN; KEVIN GIMPEL; PIYUSH SHARMA; RADU SORICUT: "ALBERT: A Lite BERT for Self-supervised Learning of Language Representations", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 26 September 2019 (2019-09-26), 201 Olin Library Cornell University Ithaca, NY 14853, XP081483506 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114879945A (zh) * 2022-04-27 2022-08-09 武汉大学 Diversified API sequence recommendation method and apparatus for long-tail distribution characteristics
CN114879945B (zh) * 2022-04-27 2024-07-16 武汉大学 Diversified API sequence recommendation method and apparatus for long-tail distribution characteristics
CN115277452A (zh) * 2022-07-01 2022-11-01 中铁第四勘察设计院集团有限公司 ResNet adaptive accelerated computing method based on edge-end collaboration and application thereof
CN115277452B (zh) * 2022-07-01 2023-11-28 中铁第四勘察设计院集团有限公司 ResNet adaptive accelerated computing method based on edge-end collaboration and application thereof
CN116362848A (zh) * 2023-06-03 2023-06-30 成都豪杰特科技有限公司 Artificial intelligence-based e-commerce recommendation method, system, device and medium
CN116362848B (zh) * 2023-06-03 2023-10-27 广州爱特安为科技股份有限公司 Artificial intelligence-based e-commerce recommendation method, system, device and medium

Similar Documents

Publication Publication Date Title
CN112633010B (zh) Aspect-level sentiment analysis method and system based on multi-head attention and graph convolutional networks
CN112115352B (zh) Session recommendation method and system based on user interest
WO2016037350A1 (en) Learning student dnn via output distribution
Ghorbani et al. Data shapley valuation for efficient batch active learning
Liu et al. A survey on computationally efficient neural architecture search
CN111368995B (zh) General network compression framework and compression method based on a sequential recommendation system
CN112287170B (zh) Short-video classification method and device based on multi-modal joint learning
WO2021159448A1 (zh) General network compression framework and compression method based on a sequential recommendation system
WO2022252455A1 (en) Methods and systems for training graph neural network using supervised contrastive learning
CN111460132A (zh) Generative meeting summarization method based on graph convolutional neural networks
WO2023077819A1 (zh) Data processing system and method, apparatus, device, storage medium, computer program and computer program product
CN114926770B (zh) Video action recognition method, apparatus, device and computer-readable storage medium
CN111522925A (zh) Dialogue state generation method and apparatus
WO2023084348A1 (en) Emotion recognition in multimedia videos using multi-modal fusion-based deep neural network
CN115953645A (zh) Model training method and apparatus, electronic device and storage medium
CN110297894B (zh) Intelligent dialogue generation method based on an auxiliary network
CN115588122A (zh) News classification method based on multi-modal feature fusion
CN113343100B (zh) Knowledge-graph-based smart city resource recommendation method and system
Lyu et al. A survey of model compression strategies for object detection
Gao et al. Generalized pyramid co-attention with learnable aggregation net for video question answering
CN113360683A (zh) Method for training a cross-modal retrieval model, and cross-modal retrieval method and apparatus
WO2023174189A1 (zh) Graph network model node classification method, apparatus, device and storage medium
CN117390267A (zh) Knowledge-graph-based personalized multi-task enhanced recommendation model
Mishra et al. Deep machine learning and neural networks: an overview
CN116663523A (zh) Semantic text similarity calculation method with a multi-angle enhanced network

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20919221

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 110123)

122 Ep: pct application non-entry in european phase

Ref document number: 20919221

Country of ref document: EP

Kind code of ref document: A1