Dynamic Gaussian Embedding of Authors

Abstract

Authors publish documents in a dynamic manner. Their topic of interest and writing style might shift over time. Tasks such as author classification, author identification or link prediction are difficult to solve in such complex data settings. We propose a new representation learning model, DGEA (for Dynamic Gaussian Embedding of Authors), that is more suited to solve these tasks by capturing this temporal evolution. We formulate a general embedding framework: author representation at time t is a Gaussian distribution that leverages pre-trained document vectors, and that depends on the publications observed until t. The representations should retain some form of multi-topic information and temporal smoothness. We propose two models that fit into this framework. The first one, K-DGEA, uses a first order Markov model optimized with an Expectation Maximization Algorithm with Kalman Equations. The second, R-DGEA, makes use of a Recurrent Neural Network to model the time dependence. We evaluate our method on several quantitative tasks: author identification, classification, and co-authorship prediction, on two datasets written in English. In addition, our model is language agnostic since it only requires pre-trained document embeddings. It outperforms existing baselines by up to 18% on an author classification task on a news articles dataset.

Publication
Proceedings of The ACM Web Conference 2022, Virtual Event, Lyon, France, April 25 - 29, 2022, WWW 22, 2109–2119