
Paper Weekly | Latest Research Progress in Recommender Systems

ML_RSer, Machine Learning and Recommendation Algorithms, 2022-12-14



This issue presents 10 recommender-system papers released last week (0912-0918).

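The papers in this collection mainly cover sequential recommendation [1], inductive recommendation with knowledge graph convolutions [2], understanding overfitting in click-through rate (CTR) prediction models [3], incremental learning for large-scale CTR prediction [4], learning optimal embedding tables [5], sparse attention networks for long sequences [6], multi-scale user behavior networks for entire-space multi-task learning [7], causal recommender systems [8], hierarchical intention embedding networks for CTR prediction [9], and a survey of graph-learning-based recommender systems [10].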

The titles and abstracts are listed below; if a paper interests you, follow the link to read the original.

  • 1. Beyond Learning from Next Item: Sequential Recommendation via Personalized Interest Sustainability, CIKM2022
  • 2. Simple and Powerful Architecture for Inductive Recommendation Using Knowledge Graph Convolutions
  • 3. Towards Understanding the Overfitting Phenomenon of Deep Click-Through Rate Prediction Models, CIKM2022
  • 4. An Incremental Learning framework for Large-scale CTR Prediction, RecSys2022
  • 5. OptEmbed: Learning Optimal Embedding Table for Click-through Rate Prediction, CIKM2022
  • 6. Sparse Attentive Memory Network for Click-through Rate Prediction with Long Sequences, CIKM2022
  • 7. Multi-Scale User Behavior Network for Entire Space Multi-Task Learning, CIKM2022
  • 8. Addressing Confounding Feature Issue for Causal Recommendation, TOIS2022
  • 9. HIEN: Hierarchical Intention Embedding Network for Click-Through Rate Prediction, SIGIR2022
  • 10. A Survey of Recommender Systems Based on Graph Learning, Computer Science 2022

1. Beyond Learning from Next Item: Sequential Recommendation via Personalized Interest Sustainability, CIKM2022

Dongmin Hyun, Chanyoung Park, Junsu Cho, Hwanjo Yu

https://arxiv.org/abs/2209.06644

Sequential recommender systems have made effective suggestions by capturing users' interest drift. Existing sequential models fall into two groups: user-centric and item-centric models. The user-centric models capture personalized interest drift based on each user's sequential consumption history, but do not explicitly consider whether users' interest in items sustains beyond the training time, i.e., interest sustainability. On the other hand, the item-centric models consider whether users' general interest sustains after the training time, but they are not personalized. In this work, we propose a recommender system that takes advantage of the models in both categories. Our proposed model captures personalized interest sustainability, indicating whether each user's interest in items will sustain beyond the training time or not. We first formulate a task that requires predicting which items each user will consume in the recent period of the training time based on users' consumption history. We then propose simple yet effective schemes to augment users' sparse consumption history. Extensive experiments show that the proposed model outperforms 10 baseline models on 11 real-world datasets. The code is available at: https://github.com/dmhyun/PERIS.
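To make the task formulation above more concrete, here is a minimal sketch of one way to split each user's history into model inputs (everything before a cutoff) and targets (items consumed in the recent period of the training time). The function name `build_sustainability_task` and the `window` parameter are hypothetical; this is not code from the PERIS repository, and the paper's augmentation schemes are not shown.

```python
from collections import defaultdict


def build_sustainability_task(interactions, train_end, window):
    """Split each user's history at the cutoff train_end - window.

    interactions: iterable of (user, item, timestamp) tuples.
    Returns (history, targets):
      history[user] -> items consumed before the cutoff (model input, in time order)
      targets[user] -> items consumed in [cutoff, train_end] (positive labels)
    """
    cutoff = train_end - window
    history = defaultdict(list)
    targets = defaultdict(set)
    for user, item, ts in sorted(interactions, key=lambda x: x[2]):
        if ts < cutoff:
            history[user].append(item)
        elif ts <= train_end:
            targets[user].add(item)
    return dict(history), dict(targets)


logs = [("u1", "a", 1), ("u1", "b", 5), ("u1", "a", 9), ("u2", "c", 2), ("u2", "d", 3)]
history, targets = build_sustainability_task(logs, train_end=10, window=3)
# u1 returns to item "a" inside the recent window, so "a" is a positive target for u1.
```

Users with no interactions in the recent window (u2 here) simply get no targets; how to sample negatives for them is left open.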

2. Simple and Powerful Architecture for Inductive Recommendation Using Knowledge Graph Convolutions

Theis E. Jendal, Matteo Lissandrini, Peter Dolog, Katja Hose

https://arxiv.org/abs/2209.04185

Using graph models with relational information in recommender systems has shown promising results. Yet, most methods are transductive, i.e., they are based on dimensionality reduction architectures. Hence, they require heavy retraining every time new items or users are added. Conversely, inductive methods promise to solve these issues. Nonetheless, all inductive methods rely only on interactions, making recommendations for users with few interactions sub-optimal and even impossible for new items. Therefore, we focus on inductive methods able to also exploit knowledge graphs (KGs). In this work, we propose SimpleRec, a strong baseline that uses a graph neural network and a KG to provide better recommendations than related inductive methods for new users and items. We show that it is unnecessary to create complex model architectures for user representations, but it is enough to allow users to be represented by the few ratings they provide and the indirect connections among them without any user metadata. As a result, we re-evaluate state-of-the-art methods, identify better evaluation protocols, highlight unwarranted conclusions from previous proposals, and showcase a novel, stronger baseline for this task.
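The inductive idea of representing a user only by the ratings they provide can be illustrated with a deliberately simplified sketch: the user vector is a rating-weighted mean of item embeddings, so a user who arrives after training can be scored without retraining. This is an assumption-laden toy, not SimpleRec itself, which uses a graph neural network over a knowledge graph; `inductive_user_vector` and `score` are hypothetical names.

```python
def inductive_user_vector(ratings, item_emb):
    """Represent a user only through the items they rated (no user-ID embedding).

    ratings: dict item_id -> rating; item_emb: dict item_id -> list[float].
    Returns the rating-weighted mean of the rated items' embeddings.
    """
    if not ratings:
        raise ValueError("a user needs at least one rating")
    total = sum(ratings.values())
    dim = len(next(iter(item_emb.values())))
    vec = [0.0] * dim
    for item, r in ratings.items():
        for k, x in enumerate(item_emb[item]):
            vec[k] += r * x / total
    return vec


def score(user_vec, item_vec):
    """Dot-product relevance of an item for a user."""
    return sum(u * v for u, v in zip(user_vec, item_vec))


# A user who joined after training is scored without retraining any weights.
item_emb = {"i1": [1.0, 0.0], "i2": [0.0, 1.0], "i3": [1.0, 1.0]}
new_user = inductive_user_vector({"i1": 3.0, "i2": 1.0}, item_emb)
print(new_user)  # [0.75, 0.25]
```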

3. Towards Understanding the Overfitting Phenomenon of Deep Click-Through Rate Prediction Models, CIKM2022

Zhao-Yu Zhang, Xiang-Rong Sheng, Yujing Zhang, Biye Jiang, Shuguang Han, Hongbo Deng, Bo Zheng

https://arxiv.org/abs/2209.06053

Deep learning techniques have been applied widely in industrial recommendation systems. However, far less attention has been paid to the overfitting problem of models in recommendation systems, even though overfitting is recognized as a critical issue for deep neural networks. In the context of Click-Through Rate (CTR) prediction, we observe an interesting one-epoch overfitting problem: the model performance exhibits a dramatic degradation at the beginning of the second epoch. Such a phenomenon has been witnessed widely in real-world applications of CTR models. As a result, the best performance is usually achieved by training with only one epoch. To understand the underlying factors behind the one-epoch phenomenon, we conduct extensive experiments on the production data set collected from the display advertising system of Alibaba. The results show that the model structure, the optimization algorithm with a fast convergence rate, and the feature sparsity are closely related to the one-epoch phenomenon. We also provide a likely hypothesis for explaining such a phenomenon and conduct a set of proof-of-concept experiments. We hope this work can shed light on future research on training more epochs for better performance.

4. An Incremental Learning framework for Large-scale CTR Prediction, RecSys2022

Petros Katsileros, Nikiforos Mandilaras

https://arxiv.org/abs/2209.00458

In this work we introduce an incremental learning framework for Click-Through-Rate (CTR) prediction and demonstrate its effectiveness for Taboola's massive-scale recommendation service. Our approach enables rapid capture of emerging trends through warm-starting from previously deployed models and fine-tuning on "fresh" data only. Past knowledge is maintained via a teacher-student paradigm, in which distillation from the teacher mitigates the catastrophic forgetting phenomenon. Our incremental learning framework enables significantly faster training and deployment cycles (12x speedup). We demonstrate a consistent Revenue Per Mille (RPM) lift over multiple traffic segments and a significant CTR increase on newly introduced items.
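A teacher-student setup like the one described is commonly written as a weighted sum of a hard-label loss on fresh data and a distillation loss against the previously deployed model's predictions. The sketch below assumes binary cross-entropy for both terms and a hypothetical mixing weight `alpha`; the abstract does not specify Taboola's exact loss.

```python
import math


def bce(p, y, eps=1e-7):
    """Binary cross-entropy of predicted probability p against a (possibly soft) target y."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def incremental_loss(student_p, labels, teacher_p, alpha=0.5):
    """Batch loss for fine-tuning a warm-started student on fresh data.

    hard: fit the new click labels (captures emerging trends).
    soft: stay close to the teacher's predictions (mitigates forgetting).
    alpha: hypothetical weight of the distillation term.
    """
    n = len(labels)
    hard = sum(bce(p, y) for p, y in zip(student_p, labels)) / n
    soft = sum(bce(p, t) for p, t in zip(student_p, teacher_p)) / n
    return (1.0 - alpha) * hard + alpha * soft


loss = incremental_loss([0.8, 0.3], labels=[1, 0], teacher_p=[0.7, 0.2], alpha=0.3)
```

Setting `alpha=0` reduces this to plain fine-tuning on fresh data, which is the case the distillation term is meant to guard against.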

5. OptEmbed: Learning Optimal Embedding Table for Click-through Rate Prediction, CIKM2022

Fuyuan Lyu, Xing Tang, Hong Zhu, Huifeng Guo, Yingxue Zhang, Ruiming Tang, Xue Liu

https://arxiv.org/abs/2208.04482

Learning the embedding table plays a fundamental role in Click-through rate (CTR) prediction, in terms of both model performance and memory usage. The embedding table is a two-dimensional tensor, with its axes indicating the number of feature values and the embedding dimension, respectively. To learn an efficient and effective embedding table, recent works either assign various embedding dimensions to feature fields and reduce the number of embeddings respectively, or mask the embedding table parameters. However, none of these existing works can obtain an optimal embedding table. On the one hand, various embedding dimensions still require a large amount of memory due to the vast number of features in the dataset. On the other hand, decreasing the number of embeddings usually suffers from performance degradation, which is intolerable in CTR prediction. Finally, pruning embedding parameters leads to a sparse embedding table, which is hard to deploy. To this end, we propose an optimal embedding table learning framework, OptEmbed, which provides a practical and general method to find an optimal embedding table for various base CTR models. Specifically, we propose pruning the redundant embeddings according to the corresponding features' importance via learnable pruning thresholds. Furthermore, we consider each assignment of embedding dimensions as one single candidate architecture. To efficiently search the optimal embedding dimensions, we design a uniform embedding dimension sampling scheme to equally train all candidate architectures, meaning architecture-related parameters and learnable thresholds are trained simultaneously in one supernet. We then propose an evolution search method based on the supernet to find the optimal embedding dimensions for each field. Experiments on public datasets show that OptEmbed can learn a compact embedding table which can further improve the model performance.
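Two ingredients mentioned above, pruning embedding rows by a threshold on feature importance and uniformly sampling a per-field embedding dimension, can be sketched in plain Python. Assumptions: importance is measured by the L1 norm of a row, the threshold is a fixed number rather than a learned parameter, and a dimension candidate keeps the leading coordinates of each row. The evolutionary search is omitted, and all function names are hypothetical.

```python
import random


def prune_rows(table, threshold):
    """Zero out embedding rows whose L1 norm is at or below the field threshold.

    table: list of rows (list[float]) for one feature field.
    Returns (pruned_table, kept_row_ids). Here the threshold is a fixed number;
    in OptEmbed it is learned.
    """
    pruned, kept = [], []
    for idx, row in enumerate(table):
        if sum(abs(x) for x in row) > threshold:
            pruned.append(list(row))
            kept.append(idx)
        else:
            pruned.append([0.0] * len(row))
    return pruned, kept


def mask_dims(row, dim):
    """Keep only the first `dim` coordinates of an embedding row."""
    return [x if k < dim else 0.0 for k, x in enumerate(row)]


def sample_dims(num_fields, max_dim, rng):
    """Uniformly sample one candidate architecture: an embedding size per field."""
    return [rng.randint(1, max_dim) for _ in range(num_fields)]


table = [[0.5, -0.5, 0.1], [0.01, 0.0, -0.02], [0.3, 0.3, 0.3]]
pruned, kept = prune_rows(table, threshold=0.1)
print(kept)  # [0, 2]
dims = sample_dims(num_fields=3, max_dim=3, rng=random.Random(0))
# Apply the dimension sampled for this field (field 0) to all of its rows.
masked = [mask_dims(row, dims[0]) for row in pruned]
```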

6. Sparse Attentive Memory Network for Click-through Rate Prediction with Long Sequences, CIKM2022

Qianying Lin, Wen-Ji Zhou, Yanshi Wang, Qing Da, Qing-Guo Chen, Bing Wang

https://arxiv.org/abs/2208.04022

Sequential recommendation predicts users' next behaviors from their historical interactions. Recommending with longer sequences improves recommendation accuracy and increases the degree of personalization. As sequences get longer, existing works have not yet addressed the following two main challenges. Firstly, modeling long-range intra-sequence dependency becomes difficult as sequence length increases. Secondly, it requires memory efficiency and fast computation. In this paper, we propose a Sparse Attentive Memory (SAM) network for long sequential user behavior modeling. SAM supports efficient training and real-time inference for user behavior sequences with lengths on the scale of thousands. In SAM, we model the target item as the query and the long sequence as the knowledge database, where the former continuously elicits relevant information from the latter. SAM simultaneously models target-sequence dependencies and long-range intra-sequence dependencies with O(L) complexity and O(1) number of sequential updates, which otherwise can only be achieved by the self-attention mechanism with O(L^2) complexity. Extensive empirical results demonstrate that our proposed solution is effective not only in long user behavior modeling but also in short sequence modeling. Implemented on sequences of length 1000, SAM is successfully deployed on one of the largest international E-commerce platforms. Its inference time is within 30ms, with a substantial 7.30% click-through rate improvement in the online A/B test. To the best of our knowledge, it is the first end-to-end long user sequence modeling framework that models intra-sequence and target-sequence dependencies with the aforementioned degree of efficiency and is successfully deployed on a large-scale real-time industrial recommender system.
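The linear cost of target-to-sequence modeling comes from scoring each behavior once against the target item, instead of comparing every pair of behaviors as self-attention does. The sketch below is plain softmax target attention, a simpler technique than SAM's memory network, included only to illustrate why query-versus-sequence scoring is O(L); the function name is hypothetical.

```python
import math


def target_attention(query, sequence):
    """Pool a behavior sequence into one vector, using the target item as the query.

    Each of the L behaviors gets exactly one dot-product score against the query,
    so the cost is O(L * d); self-attention would score all L * L behavior pairs.
    """
    scores = [sum(q * s for q, s in zip(query, item)) for item in sequence]
    top = max(scores)  # subtract the max before exp for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [
        sum(w * item[k] for w, item in zip(weights, sequence)) / total
        for k in range(len(query))
    ]


# 1,000 toy 2-d behavior embeddings, matching the sequence length mentioned above.
behaviors = [[math.sin(i), math.cos(i)] for i in range(1000)]
user_interest = target_attention([1.0, 0.0], behaviors)
```

What this sketch does not capture is the intra-sequence dependency modeling that SAM adds at the same linear cost.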

7. Multi-Scale User Behavior Network for Entire Space Multi-Task Learning, CIKM2022

Jiarui Jin, Xianyu Chen, Weinan Zhang, Yuanbo Chen, Zaifan Jiang, Zekun Zhu, Zhewen Su, Yong Yu

https://arxiv.org/abs/2208.01889

Modelling the user's multiple behaviors is an essential part of modern e-commerce, whose widely adopted application is to jointly optimize click-through rate (CTR) and conversion rate (CVR) predictions. Most existing methods overlook the effect of two key characteristics of the user's behaviors: for each item list, (i) contextual dependence refers to the fact that the user's behaviors on any item are not purely determined by the item itself but are also influenced by the user's previous behaviors (e.g., clicks, purchases) on other items in the same sequence; (ii) multiple time scales means that users are likely to click frequently but purchase periodically. To this end, we develop a new multi-scale user behavior network named Hierarchical rEcurrent Ranking On the Entire Space (HEROES) which incorporates the contextual information to estimate the user's multiple behaviors in a multi-scale fashion. Concretely, we introduce a hierarchical framework, where the lower layer models the user's engagement behaviors while the upper layer estimates the user's satisfaction behaviors. The proposed architecture can automatically learn a suitable time scale for each layer to capture the user's dynamic behavioral patterns. Besides the architecture, we also introduce the Hawkes process to form a novel recurrent unit which can not only encode the items' features in the context but also formulate the excitation or discouragement from the user's previous behaviors. We further show that HEROES can be extended to build unbiased ranking systems through combinations with the survival analysis technique. Extensive experiments over three large-scale industrial datasets demonstrate the superiority of our model compared with the state-of-the-art methods.
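HEROES builds its recurrent unit on the Hawkes process. For reference, the textbook conditional intensity of a univariate Hawkes process with an exponential kernel is lambda(t) = mu + sum over past events t_i < t of alpha * exp(-beta * (t - t_i)). The sketch below evaluates that formula directly; how HEROES parameterizes mu, alpha, and beta inside its recurrent unit is not shown, and modeling discouragement with a negative alpha clipped at zero is an illustrative assumption.

```python
import math


def hawkes_intensity(t, event_times, mu, alpha, beta):
    """Conditional intensity of a univariate Hawkes process (exponential kernel).

    lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i))
    mu: base rate; alpha: jump added by each past event; beta: decay rate.
    The result is clipped at 0 so a negative alpha (discouragement) stays valid.
    """
    excitation = sum(alpha * math.exp(-beta * (t - ti)) for ti in event_times if ti < t)
    return max(0.0, mu + excitation)


clicks = [0.0, 0.5, 0.9]
now = hawkes_intensity(1.0, clicks, mu=0.2, alpha=0.5, beta=2.0)
later = hawkes_intensity(5.0, clicks, mu=0.2, alpha=0.5, beta=2.0)
# Right after a burst of clicks the intensity is high; it decays back toward mu.
```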

8. Addressing Confounding Feature Issue for Causal Recommendation, TOIS2022

Xiangnan He, Yang Zhang, Fuli Feng, Chonggang Song, Lingling Yi, Guohui Ling, Yongdong Zhang

https://dl.acm.org/doi/10.1145/3559757

In recommender systems, some features directly affect whether an interaction will happen, so the observed interactions do not necessarily indicate user preference. For instance, short videos are objectively easier to finish even when the user does not like them. We term such a feature a confounding feature; video length is a confounding feature in video recommendation. If we fit a model on such interaction data, as most data-driven recommender systems do, the model will be biased toward recommending short videos and deviate from users' actual requirements. This work formulates and addresses the problem from the causal perspective. Assuming there are some factors affecting both the confounding feature and other item features, e.g., the video creator, we find the confounding feature opens a backdoor path behind user-item matching and introduces spurious correlation. To remove the effect of the backdoor path, we propose a framework named Deconfounding Causal Recommendation (DCR), which performs intervened inference with do-calculus. Nevertheless, evaluating do-calculus requires summing over the predictions for all possible values of the confounding feature, significantly increasing the time cost. To address the efficiency challenge, we further propose a mixture-of-experts (MoE) model architecture, modeling each value of the confounding feature with a separate expert module. In this way, we retain the model expressiveness with few additional costs. We demonstrate DCR on the backbone model of neural factorization machine (NFM), showing that DCR leads to more accurate prediction of user preference with small inference time cost. We release our code at: https://github.com/zyang1580/DCR.
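The intervened inference described above is a backdoor adjustment: instead of conditioning on an item's observed confounding-feature value c, the prediction averages over every value, P(y | do(u, i)) = sum over c of P(c) * P(y | u, i, c). With one expert per value of c, as in the MoE design, this becomes a prior-weighted sum of expert outputs. In the sketch below, the expert functions, the two video-length buckets, and the use of the marginal prior P(c) are toy assumptions; DCR's exact formulation may weight the sum differently.

```python
def deconfounded_score(user, item, experts, prior):
    """Backdoor adjustment: sum over c of P(c) * f_c(user, item).

    experts: dict mapping each confounding-feature value c to a scoring
             function f_c(user, item), standing in for one expert module.
    prior:   dict mapping c to P(c), e.g. its frequency in the training data.
    """
    return sum(prior[c] * expert(user, item) for c, expert in experts.items())


def make_expert(length_effect):
    """Hypothetical expert: user-item match shifted by a length-bucket effect."""
    def expert(user_vec, item_vec):
        match = sum(u * v for u, v in zip(user_vec, item_vec))
        return min(1.0, max(0.0, match + length_effect))
    return expert


# Toy setup: short videos get finished more often regardless of taste.
experts = {"short": make_expert(0.4), "long": make_expert(-0.2)}
prior = {"short": 0.5, "long": 0.5}
user, video = [0.5, 0.0], [0.6, 0.8]

biased = experts["short"](user, video)  # conditions on the video's observed length
adjusted = deconfounded_score(user, video, experts, prior)  # averages over lengths
```

Because every expert can be evaluated in the same forward pass, the sum over c adds little inference cost, which is the efficiency argument the abstract makes for the MoE design.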

9. HIEN: Hierarchical Intention Embedding Network for Click-Through Rate Prediction, SIGIR2022

Zuowu Zheng, Changwang Zhang, Xiaofeng Gao, Guihai Chen

https://arxiv.org/abs/2206.00510

Click-through rate (CTR) prediction plays an important role in online advertising and recommendation systems, which aims at estimating the probability of a user clicking on a specific item. Feature interaction modeling and user interest modeling methods are two popular domains in CTR prediction, and they have been studied extensively in recent years. However, these methods still suffer from two limitations. First, traditional methods regard item attributes as ID features, while neglecting structure information and relation dependencies among attributes. Second, when mining user interests from user-item interactions, current models ignore user intents and item intents for different attributes, which lacks interpretability. Based on this observation, in this paper, we propose a novel approach Hierarchical Intention Embedding Network (HIEN), which considers dependencies of attributes based on bottom-up tree aggregation in the constructed attribute graph. HIEN also captures user intents for different item attributes as well as item intents based on our proposed hierarchical attention mechanism. Extensive experiments on both public and production datasets show that the proposed model significantly outperforms the state-of-the-art methods. In addition, HIEN can be applied as an input module to state-of-the-art CTR prediction methods, bringing further performance lift for these existing models that might already be intensively used in real systems.
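Bottom-up tree aggregation can be pictured as computing each node's vector from its own embedding and its children's already-aggregated vectors, leaves first. The sketch uses a simple mean as the aggregator and a toy attribute tree; both are assumptions, and HIEN's actual aggregator and attribute-graph construction may differ.

```python
def aggregate_tree(node, children, emb):
    """Bottom-up aggregation over an attribute tree.

    A node's refined vector is the mean of its own embedding and the refined
    vectors of its children; recursion guarantees leaves are processed first.
    children: dict node -> list of child nodes; emb: dict node -> list[float].
    """
    vecs = [emb[node]] + [aggregate_tree(c, children, emb) for c in children.get(node, [])]
    dim = len(emb[node])
    return [sum(v[k] for v in vecs) / len(vecs) for k in range(dim)]


# Toy attribute tree (hypothetical): "brand" has its own child "series".
children = {"category": ["brand", "style"], "brand": ["series"]}
emb = {
    "category": [0.0, 3.0],
    "brand": [2.0, 0.0],
    "style": [4.0, 0.0],
    "series": [4.0, 0.0],
}
root_vec = aggregate_tree("category", children, emb)  # about [2.33, 1.0]
```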

10. A Survey of Recommender Systems Based on Graph Learning, Computer Science 2022

程章桃, 钟婷, 张晟铭, 周帆

https://www.jsjkx.com/CN/10.11896/jsjkx.210900072

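Collaborative filtering is a method widely used in recommender systems: it exploits similarity relations between users (or between items) to filter and extract user-item interaction information and thereby make recommendations to users. In recent years, graph neural networks have gradually become an emerging paradigm in recommendation because of their excellent representation-learning performance and good scalability. This paper systematically reviews and summarizes recent research in recommendation from the perspective of graph learning. First, recommendation scenarios are divided into two categories according to data type: interaction-based recommender systems (which treat user-item interaction data as the key data source) and side-information-enhanced recommender systems (which incorporate social information and knowledge graph information associated with users and items). Second, starting from random walks, graph representation learning, and graph neural networks, it reviews and summarizes the methods, key techniques, main difficulties, and important advances in different recommendation scenarios. Finally, it summarizes the challenges that graph learning methods face in recommendation and the main directions for future research.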
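As a baseline for the similarity-based collaborative filtering described at the start of the abstract, here is a minimal item-based CF sketch: item-item cosine similarity over the rating matrix (missing ratings treated as 0), then scoring unseen items by similarity-weighted ratings of the items a user has already rated. The toy data and function names are illustrative assumptions, not taken from the survey.

```python
import math


def item_cosine(ratings, i, j):
    """Cosine similarity of items i and j over all users' ratings (missing = 0)."""
    dot = norm_i = norm_j = 0.0
    for user_ratings in ratings.values():
        ri = user_ratings.get(i, 0.0)
        rj = user_ratings.get(j, 0.0)
        dot += ri * rj
        norm_i += ri * ri
        norm_j += rj * rj
    if norm_i == 0.0 or norm_j == 0.0:
        return 0.0
    return dot / math.sqrt(norm_i * norm_j)


def recommend(ratings, user, k=1):
    """Rank items the user has not rated by similarity-weighted ratings of rated items."""
    all_items = {item for user_ratings in ratings.values() for item in user_ratings}
    seen = ratings[user]
    scores = {
        cand: sum(item_cosine(ratings, cand, s) * r for s, r in seen.items())
        for cand in all_items - set(seen)
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


ratings = {
    "u1": {"a": 5.0},
    "u2": {"a": 5.0, "b": 5.0},
    "u3": {"c": 5.0},
}
print(recommend(ratings, "u1"))  # ['b']
```

Item "b" wins because it was co-rated with "a" by u2, while "c" shares no users with "a"; graph-based methods generalize exactly this kind of co-occurrence signal to multi-hop paths.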

