Encoder-decoder

1 Introduction

In graph-based deep learning, low-dimensional vector representations of nodes in a graph are known as node embeddings. The goal of node embeddings is to project graph nodes from their original feature space into a lower-dimensional space to reduce computational complexity while retaining the structural and relational information of nodes in the graph. A key idea in graph-based deep learning, or more precisely in graph representation learning, is to learn these node embeddings.

1.1 Encoder-decoder framework

In graph representation learning, we leverage the encoder-decoder framework to build a comprehensive representation of the graph structure. This is achieved in two steps. First, an encoder model maps each node in the graph to a compact, low-dimensional vector known as an embedding. These embeddings are then passed as input to a decoder model that aims to reconstruct the local neighborhood information of each node in the original graph. By doing so, we obtain a rich and structured graph representation conducive to further analysis. The encoder-decoder framework is depicted in Figure 1.

Figure 1: Encoder-Decoder framework
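
As a concrete illustration of this framework, here is a minimal sketch that pairs a shallow encoder (a trainable per-node embedding lookup) with an inner-product decoder. The choice of PyTorch, the class name ShallowEncoderDecoder, and parameters such as embedding_dim are illustrative assumptions, not part of the original description.

```python
import torch
import torch.nn as nn

class ShallowEncoderDecoder(nn.Module):
    """Shallow encoder-decoder: every node owns a trainable embedding vector,
    and the decoder scores a node pair by the inner product of their embeddings."""

    def __init__(self, num_nodes: int, embedding_dim: int):
        super().__init__()
        # The encoder is just a lookup into this embedding table: ENC(v) = Z[v].
        self.embeddings = nn.Embedding(num_nodes, embedding_dim)

    def encode(self, nodes: torch.Tensor) -> torch.Tensor:
        # Map node indices to their low-dimensional vectors z_v.
        return self.embeddings(nodes)

    def decode(self, z_u: torch.Tensor, z_v: torch.Tensor) -> torch.Tensor:
        # DEC(z_u, z_v) = z_u^T z_v, one similarity score per pair.
        return (z_u * z_v).sum(dim=-1)

# Example: score the pair (0, 3) in a 100-node graph with 16-dimensional embeddings.
model = ShallowEncoderDecoder(num_nodes=100, embedding_dim=16)
z_u = model.encode(torch.tensor([0]))
z_v = model.encode(torch.tensor([3]))
print(model.decode(z_u, z_v))
```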

1.2 Optimization: Reconstructing the graph

To achieve the objective stated in equation 4, the conventional method is to minimize an empirical reconstruction loss \(L\) over a set of training node pairs \(D\):
\(L = \sum_{(u,v) \in D} l(DEC(z_u, z_v), S[u,v])\)
where \(l: \mathbb{R} \times \mathbb{R} \to \mathbb{R}\) quantifies the difference between the estimated similarity value \(DEC(z_u, z_v)\) and the true similarity value \(S[u,v]\). The loss function \(l\) might be a mean-squared error or a cross-entropy, depending on the decoder and the similarity function \(S\). The ultimate aim is to train the decoder to accurately reconstruct pairwise node relationships in the training set \(D\), typically using stochastic gradient descent or other suitable optimization techniques.
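
A minimal sketch of this optimization, assuming a shallow embedding-lookup encoder, the inner-product decoder \(z_u^T z_v\), a squared-error loss \(l\), and the adjacency matrix as the similarity matrix \(S\); the toy graph, learning rate, and number of epochs are made up for illustration.

```python
import torch
import torch.nn as nn

# Toy similarity matrix S: here simply the adjacency matrix of a 4-node graph.
S = torch.tensor([[0., 1., 1., 0.],
                  [1., 0., 1., 0.],
                  [1., 1., 0., 1.],
                  [0., 0., 1., 0.]])

# Training set D: all ordered node pairs (u, v) with u != v.
D = [(u, v) for u in range(4) for v in range(4) if u != v]

embeddings = nn.Embedding(num_embeddings=4, embedding_dim=2)  # shallow encoder
optimizer = torch.optim.SGD(embeddings.parameters(), lr=0.1)

for epoch in range(200):
    optimizer.zero_grad()
    u = torch.tensor([p[0] for p in D])
    v = torch.tensor([p[1] for p in D])
    z_u, z_v = embeddings(u), embeddings(v)
    dec = (z_u * z_v).sum(dim=-1)           # DEC(z_u, z_v) = z_u^T z_v
    # L = sum over (u, v) in D of (DEC(z_u, z_v) - S[u, v])^2
    loss = ((dec - S[u, v]) ** 2).sum()
    loss.backward()
    optimizer.step()

print(f"final reconstruction loss: {loss.item():.4f}")
```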

A table of shallow embedding-based encoder-decoder approaches is provided below in Table 1.

| Method | Decoder \(DEC(z_u, z_v)\) | Loss function \(l\) |
|---|---|---|
| Laplacian Eigenmaps | \(\|z_u - z_v\|_2^2\) | \(DEC(z_u, z_v) \cdot S[u,v]\) |
| Graph Factorization | \(z_u^T z_v\) | \(\|DEC(z_u, z_v) - S[u,v]\|_2^2\) |
| GraRep | \(z_u^T z_v\) | \(\|DEC(z_u, z_v) - S[u,v]\|_2^2\) |
| DeepWalk | \(e^{z_u^T z_v} / \sum_{k \in V} e^{z_u^T z_k}\) | \(-S[u,v] \log(DEC(z_u, z_v))\) |
| node2vec | \(e^{z_u^T z_v} / \sum_{k \in V} e^{z_u^T z_k}\) | \(-S[u,v] \log(DEC(z_u, z_v))\) |

Table 1: Shallow embedding-based encoder-decoder approaches, their decoders, and loss functions.
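
To make the last two rows of Table 1 concrete, the sketch below evaluates the softmax-style decoder \(e^{z_u^T z_v} / \sum_{k \in V} e^{z_u^T z_k}\) and the corresponding cross-entropy loss for a single node pair. The random embedding matrix and the similarity value S_uv are made-up placeholders.

```python
import torch

# Hypothetical 5-node graph with 3-dimensional embeddings (values are placeholders).
Z = torch.randn(5, 3)
u, v = 0, 2
S_uv = 1.0  # e.g. how often (u, v) co-occur in random walks

# DEC(z_u, z_v) = exp(z_u^T z_v) / sum_{k in V} exp(z_u^T z_k): softmax over all nodes.
scores = Z @ Z[u]                      # z_u^T z_k for every node k
dec = torch.softmax(scores, dim=0)[v]

# Loss from Table 1: -S[u, v] * log(DEC(z_u, z_v)).
loss = -S_uv * torch.log(dec)
print(loss.item())
```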

1.3 Matrix-Factorization approaches to node embeddings

The encoder-decoder concept can be seen through the lens of matrix factorization. The aim is to utilize matrix factorization techniques to obtain a low-dimensional representation capable of capturing a user-defined notion of similarity between nodes. This objective is closely connected to the task of reconstructing elements within a graph’s adjacency matrix. In this study, we will delve into two significant approaches inspired by the matrix factorization technique.
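
As an illustrative sketch of this view (not a method prescribed above), one can factorize a toy similarity matrix \(S\), here taken to be the adjacency matrix, with a truncated SVD so that the resulting low-dimensional factors approximately reconstruct the entries of \(S\).

```python
import numpy as np

# Adjacency matrix of a small toy graph, used as the similarity matrix S.
S = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

d = 2  # embedding dimension
U, sigma, Vt = np.linalg.svd(S)

# Keep the top-d singular directions.
Z_enc = U[:, :d] * sigma[:d]   # one d-dimensional vector per node
Z_ctx = Vt[:d, :].T            # "context" factors used for reconstruction

# Best rank-d approximation of S (reconstructs elements of the adjacency matrix).
S_hat = Z_enc @ Z_ctx.T
print(np.round(S_hat, 2))
```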

1.4 Random walk-based node embeddings

Taking motivation from inner-product methods, random-walk approaches adapt the traditional inner-product decoder to incorporate stochastic, probabilistic measures of neighborhood overlap. Node similarity, or neighborhood overlap, is therefore not binary (similar or dissimilar) but probabilistic. The aim is to optimize node embeddings such that nodes that frequently co-occur in short random walks over the graph have similar embeddings.

Random-walk generation: The first step is to perform random traversals of the graph and generate random walks. The steps are described below, followed by a short code sketch:

  1. Let \(G(V, E)\) be a connected graph, and let \(v_0 \in V\) be the start node of a random walk.
  2. Assume that at the \(t\)-th step of the random walk, we are at node \(v_t\).
  3. Then, the next node in the random walk is chosen according to the probability:
    \(p(v_{t+1} \mid v_t) = \begin{cases} \frac{1}{d(v_t)}, & \text{if } v_{t+1} \in N(v_t) \\ 0, & \text{otherwise} \end{cases}\)
    where \(d(v_t)\) is the degree of node \(v_t\) and \(N(v_t)\) denotes the set of neighbour nodes of \(v_t\). In other words, during a random walk, the next node is chosen randomly from the current node's neighbours, and each neighbour has an equal probability of being selected.
  4. To generate random walks that capture information about the entire graph, each node is used as the start node for \(k\) random walks. This results in \(N \cdot k\) random walks in total, where \(N\) is the number of nodes in the graph. Moreover, every random walk is carried out for a predetermined length \(l\).
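
A minimal sketch of the walk-generation procedure above, assuming the graph is given as an adjacency list; the function names random_walk and generate_walks and the toy graph are illustrative.

```python
import random

def random_walk(adj: dict, start: int, length: int) -> list:
    """Uniform random walk of a fixed length starting from `start`.
    At every step the next node is drawn uniformly from the current node's
    neighbours, i.e. with probability 1 / d(v_t)."""
    walk = [start]
    current = start
    for _ in range(length - 1):
        current = random.choice(adj[current])
        walk.append(current)
    return walk

def generate_walks(adj: dict, k: int, length: int) -> list:
    """Start k walks of a fixed length from every node, giving N * k walks in total."""
    walks = []
    for node in adj:
        for _ in range(k):
            walks.append(random_walk(adj, node, length))
    return walks

# Toy connected graph as an adjacency list.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walks = generate_walks(adj, k=2, length=5)
print(walks[0])
```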

Two approaches based on random walk technique are described below: