# Introduction to ImitationFlows

Dynamic systems describe how a point in a geometric space evolves over time. They can model the motion of an apple falling from a tree, the periodic swings of a pendulum, or the motion of a robot playing table tennis.

An important property of dynamic systems is stability. Stability theory studies how the trajectories generated by a dynamic system behave under perturbations. Given a globally asymptotically stable dynamic system, trajectories starting from different initial states all evolve towards the same target.

Stability guarantees are of particular interest in robotics. Most of the motions robots perform can be thought of as trajectories sampled from a stable dynamic system. Tasks like peg-in-a-hole, pouring, or opening a door can be modeled as stable nonlinear dynamic systems w.r.t. a certain target pose. Other types of robot policies, like walking or juggling, can be modeled as stable limit cycles. If we can ensure that our robot policies are stable, we can avoid undesirable situations such as robots exceeding their joint limits or converging to meaningless regions of the state space. Moreover, in policy learning, imposing stability restricts the family of possible solutions to the set of stable policies and could make exploration safer.

Which policy architectures make it possible to learn stable nonlinear dynamic systems therefore became an important question in my research, and following it led to the architecture I present in this first blog post.

In the following, I will show that combining the invertible neural networks of Normalizing Flows with stable linear dynamics allows us to represent highly nonlinear stable dynamics that can be used as robot policies.

## Linear Dynamics & Stability

Linear dynamic systems assume a linear relation between the state and its time derivative:

\begin{align} \dot{z} = - A z \label{eq:linear} \end{align}

Given their simplicity, the stability of these dynamics is easy to study. As long as the real parts of all the eigenvalues of the matrix $A$ are positive, $\mathbf{R}_e (\lambda_A) > 0$, the dynamics are globally asymptotically stable.
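As a minimal sketch of this criterion, the snippet below checks the eigenvalues of a matrix `A` and rolls out $\dot{z} = -Az$ with a forward Euler step. The matrix, the step size and the horizon are hypothetical values chosen for illustration; they do not come from our work.

```python
import numpy as np

# Hypothetical 2x2 matrix, chosen only for illustration (eigenvalues 1 +/- 2j).
A = np.array([[1.0, -2.0],
              [2.0, 1.0]])

def is_stable(A):
    """True if every eigenvalue of A has a positive real part,
    which makes z' = -A z globally asymptotically stable."""
    return bool(np.all(np.linalg.eigvals(A).real > 0.0))

def rollout(A, z0, dt=0.01, steps=2000):
    """Integrate z' = -A z with forward Euler and return all visited states."""
    traj = [np.asarray(z0, dtype=float)]
    for _ in range(steps):
        z = traj[-1]
        traj.append(z + dt * (-A @ z))
    return np.stack(traj)

traj = rollout(A, z0=[3.0, -1.0])
final_norm = np.linalg.norm(traj[-1])  # distance to the origin after the rollout
```

With these values the state spirals into the origin. Flipping the sign of `A` makes `is_stable` return `False`, and the same rollout would diverge.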

Stability can be understood intuitively if we write the dynamics as gradient descent on an energy function $V(\cdot)$

$\dot{z} = - \frac{\partial V(z)}{\partial z}.$

When the energy function is quadratic, $V(z) = \frac{1}{2} z^{\intercal} A z$ with $A$ symmetric, we recover the linear dynamics in Eq. \eqref{eq:linear}. In this energy-based representation, stability is guaranteed as long as $V(z)$ is a strictly convex function with its minimum at the target.

Since our dynamics follow gradient descent on a convex energy function, trajectories from any starting state evolve towards the same target state: the minimum of the energy.

Closing the circle, a quadratic function $V(z) = \frac{1}{2} z^{\intercal} A z$ is convex as long as $A$ is positive definite, which in turn implies $\mathbf{R}_e (\lambda_A) > 0$.
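The link between positive definiteness and decreasing energy can be checked numerically. The sketch below assumes a hypothetical symmetric positive definite `A` and follows $\dot{z} = -\partial V / \partial z$ with small Euler steps, recording the energy at every step. All numerical values are illustrative.

```python
import numpy as np

# Hypothetical symmetric positive definite matrix (illustration only).
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])

def energy(z):
    """Quadratic energy V(z) = 0.5 * z^T A z."""
    return 0.5 * z @ A @ z

def grad_energy(z):
    """Gradient of V; it equals A z because A is symmetric."""
    return A @ z

# A symmetric matrix is positive definite iff all its eigenvalues are positive.
is_pd = bool(np.all(np.linalg.eigvalsh(A) > 0.0))

# Follow z' = -dV/dz with small Euler steps and record the energy.
z = np.array([2.0, -3.0])
energies = [energy(z)]
for _ in range(500):
    z = z - 0.05 * grad_energy(z)
    energies.append(energy(z))
energies = np.array(energies)
```

The recorded energies decrease at every step and approach zero, the minimum of $V$.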

We are interested in learning nonlinear stable dynamics, but so far we can only represent linear stable dynamics. In the rest of the post, I will show how, by introducing a diffeomorphic mapping between two spaces, we can represent nonlinear dynamics while remaining globally stable.

## Diffeomorphism

A diffeomorphism, $f(\cdot)$, is an invertible function that maps one manifold smoothly onto another, here $\mathbf{Z}$ and $\mathbf{Y}$. Given $y \in \mathbb{R}^{d}$ and $z \in \mathbb{R}^{d}$, a differentiable function $f(\cdot)$ is a diffeomorphism if it is bijective

$y = f(z) \hspace{.5cm},\hspace{.5cm} z = f^{-1}(y)$

and the inverse function $f^{-1}(\cdot)$ is also differentiable.

A diffeomorphism can be thought of as a deformation of an elastic space. The transformation is bijective, so each point in one space corresponds to exactly one point in the other.
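To make this concrete, here is a small hand-written diffeomorphism of $\mathbb{R}^2$, in the style of the coupling layers used in some invertible neural networks. The specific scale and shift functions (`tanh`, `sin`) are assumptions picked for illustration, not the ones used in our work.

```python
import numpy as np

def f(z):
    """Hypothetical diffeomorphism R^2 -> R^2 (coupling-layer style).
    The first coordinate is copied; the second is scaled and shifted
    by smooth functions of the first."""
    z1, z2 = z
    return np.array([z1, z2 * np.exp(np.tanh(z1)) + np.sin(z1)])

def f_inv(y):
    """Exact inverse of f: undo the shift, then undo the scale."""
    y1, y2 = y
    return np.array([y1, (y2 - np.sin(y1)) * np.exp(-np.tanh(y1))])

z = np.array([0.7, -1.3])
y = f(z)            # deform the latent point
z_back = f_inv(y)   # map it back; should recover z
```

Because the scale `exp(tanh(z1))` is always positive and both maps are smooth, `f` is bijective with a differentiable inverse.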

Given an energy function $V: \mathbb{R}^{d} \rightarrow \mathbb{R}$ on the manifold $\mathbf{Z}$ and a diffeomorphism $f: \mathbf{Z} \rightarrow \mathbf{Y}$, the energy function on the manifold $\mathbf{Y}$ can be written in terms of $V$ and $f$

$U(y) = V(f^{-1}(y))$

An intuitive way to see how the energy function on $\mathbf{Y}$ relates to the energy function on $\mathbf{Z}$ is to imagine $\mathbf{Z}$ as a deformable space in which the energy function lives. The diffeomorphism $f$ deforms $\mathbf{Z}$ such that $U(y) = V(z)$.

A diffeomorphism neither creates nor removes critical points: if an energy function has a single minimum and no other critical points in one space, the same holds in the other, even though the deformed function need not be convex in the strict sense. Thus, the dynamics in the deformed space inherit the stability properties of the latent dynamics.

This intuition is the key to the stability guarantees. Since the diffeomorphism acts as a deformation of the space, if the energy function in $\mathbf{Z}$ is convex, the energy function in $\mathbf{Y}$ keeps a single minimum, and so the dynamics in $\mathbf{Y}$ remain globally asymptotically stable. In our work, we formalized these stability guarantees in terms of Lyapunov stability.
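As an informal sketch of the Lyapunov argument (the formal statement is in the paper and is not reproduced here), take $U(y) = V(f^{-1}(y))$ as the candidate function and assume the latent dynamics $\dot{z} = -\frac{\partial V(z)}{\partial z}$. Because $U(y(t)) = V(z(t))$ along any trajectory with $y = f(z)$,

\begin{align} \frac{d U(y)}{dt} = \frac{d V(z)}{dt} = \left(\frac{\partial V(z)}{\partial z}\right)^{\intercal} \dot{z} = - \left\| \frac{\partial V(z)}{\partial z} \right\|^{2} \leq 0 \end{align}

with equality only where the gradient of $V$ vanishes. For a strictly convex $V$ with minimum $z^{*}$ this happens only at $z^{*}$, so $U$ decreases along every trajectory in $\mathbf{Y}$ until it reaches $y^{*} = f(z^{*})$.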

Moreover, we can express the dynamic system in $\mathbf{Y}$ in terms of the one in $\mathbf{Z}$, where $J(z)$ denotes the Jacobian of $f$

$\dot{y} = \frac{d y}{dt} = \frac{d f(z)}{dt} = \frac{d f(z)}{d z} \frac{d z}{dt} = J(z) \dot{z} = -J(z) \frac{\partial V(z)}{\partial z}$
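The chain rule above can be turned into a short simulation. The sketch below assumes linear latent dynamics $\dot{z} = -Az$ and a hand-written coupling-style diffeomorphism; the matrix `A`, the map `f`, the step size and the initial state are all hypothetical choices for illustration. It integrates $\dot{y} = J(z)\dot{z}$ directly in $\mathbf{Y}$ and checks that the state reaches $f(0)$, the image of the latent equilibrium.

```python
import numpy as np

# Hypothetical latent dynamics matrix (eigenvalues 1 +/- 1j, so z' = -A z is stable).
A = np.array([[1.0, -1.0],
              [1.0, 1.0]])

def f(z):
    """Hypothetical diffeomorphism from latent space Z to observation space Y."""
    z1, z2 = z
    return np.array([z1, z2 * np.exp(np.tanh(z1)) + np.sin(z1) + 1.0])

def f_inv(y):
    """Exact inverse of f."""
    y1, y2 = y
    return np.array([y1, (y2 - np.sin(y1) - 1.0) * np.exp(-np.tanh(y1))])

def jacobian(z):
    """Analytic Jacobian J(z) = df/dz of the map above."""
    z1, z2 = z
    scale = np.exp(np.tanh(z1))
    dscale = scale * (1.0 - np.tanh(z1) ** 2)  # d(scale)/dz1
    return np.array([[1.0, 0.0],
                     [z2 * dscale + np.cos(z1), scale]])

def y_dot(y):
    """Nonlinear dynamics in Y: y' = J(z) z' with z = f^{-1}(y) and z' = -A z."""
    z = f_inv(y)
    return jacobian(z) @ (-A @ z)

# Forward Euler rollout in Y from an arbitrary initial state.
y = np.array([2.0, -1.0])
for _ in range(3000):
    y = y + 0.01 * y_dot(y)

target = f(np.zeros(2))             # image of the latent equilibrium z = 0
dist = np.linalg.norm(y - target)   # remaining distance to the target
```

The trajectory in $\mathbf{Y}$ is a nonlinear curve, yet it ends at the same target that the linear latent dynamics reach in $\mathbf{Z}$.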

### Normalizing Flows as a parameterized diffeomorphism

Normalizing Flows are a family of generative models composed of a latent distribution $p(z)$ that is easy to sample from, usually a normal distribution, and a learnable diffeomorphism $f(\cdot)$ that maps the latent space $\mathbf{Z}$ to the observation space $\mathbf{Y}$

\begin{align} y = f(z)\hspace{.5cm},\hspace{.5cm} z \sim p(z) \end{align}

By the change of variables rule, the density in $\mathbf{Y}$ can be computed in terms of the density in $\mathbf{Z}$

\begin{align} p(y) = p(z) \left|\textrm{det} \frac{\partial f}{\partial z} \right|^{-1} \label{eq:density_nf} \end{align}

The main line of research in Normalizing Flows concerns architectures for the learnable diffeomorphism, grouped under the name of Invertible Neural Networks. In the context of Normalizing Flows, these networks have been used to learn distributions over static data; nonetheless, the model extends easily to distributions over trajectories. If we replace the latent distribution with a stochastic transition function $p(z_{k} | z_{k-1})$, we can represent the transition function in $\mathbf{Y}$,

\begin{align} p(y_{k}|y_{k-1}) = p(z_{k}|z_{k-1}) \left|\textrm{det} \frac{\partial f}{\partial z_{k}} \right|^{-1}, \label{eq:dynamic_flow} \end{align}

in terms of the latent stochastic dynamics $p(z_{k}|z_{k-1})$ and a parameterized diffeomorphism $f$.
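The change of variables rule in Eq. \eqref{eq:density_nf} can be sanity-checked in a case where the answer is known in closed form. The sketch below uses a hypothetical invertible affine map $y = Lz + b$ as a stand-in for a learned flow: with $z \sim \mathcal{N}(0, I)$, the density of $y$ must be $\mathcal{N}(b, LL^{\intercal})$. The values of `L`, `b` and the query point are illustrative.

```python
import numpy as np

# Hypothetical invertible affine map y = L z + b (stand-in for a learned flow).
L = np.array([[1.5, 0.0],
              [0.4, 0.8]])
b = np.array([0.5, -1.0])

def log_standard_normal(z):
    """Log-density of N(0, I) at z."""
    d = z.shape[0]
    return -0.5 * (z @ z) - 0.5 * d * np.log(2.0 * np.pi)

def flow_log_prob(y):
    """Change of variables: log p(y) = log p(z) - log|det df/dz|, z = f^{-1}(y)."""
    z = np.linalg.solve(L, y - b)
    _, logabsdet = np.linalg.slogdet(L)
    return log_standard_normal(z) - logabsdet

def gaussian_log_prob(y, mean, cov):
    """Closed-form log-density of N(mean, cov) at y."""
    d = y.shape[0]
    diff = y - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (diff @ np.linalg.solve(cov, diff) + logdet + d * np.log(2.0 * np.pi))

y = np.array([0.2, 0.3])
lp_flow = flow_log_prob(y)                       # via the flow formula
lp_exact = gaussian_log_prob(y, b, L @ L.T)      # via the known Gaussian
```

Both routes give the same log-density. With a nonlinear $f$ there is no closed form to compare against, but the flow formula still applies unchanged.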

Moreover, if the stochastic dynamics in $\mathbf{Z}$ are chosen to be linear and the initial distribution is Gaussian, $p(z_0) = \mathcal{N}(\mu_0, \Sigma_0)$, we can propagate the state distribution $k$ steps forward as a normal distribution

$p(z_k) = \mathcal{N}(A^{k} \mu_{0}, \sum_{i=0}^{k} A^{i} \Sigma_{k-i} A^{i \intercal})$

and perform exact inference of $p(y_{k})$, even if it is not a normal distribution.
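The Gaussian propagation can be sketched in a few lines. The snippet assumes a discrete-time latent model $z_k = A z_{k-1} + \epsilon_k$ with $\epsilon_k \sim \mathcal{N}(0, Q)$, where `A` is a discrete-time transition matrix and the constant noise covariance `Q` is a simplifying assumption of this example. It compares a step-by-step recursion with the unrolled sum.

```python
import numpy as np

# Hypothetical discrete-time latent model: z_k = A z_{k-1} + eps_k, eps_k ~ N(0, Q).
A = np.array([[0.9, 0.1],
              [-0.1, 0.9]])
Q = 0.01 * np.eye(2)
mu0 = np.array([1.0, -2.0])
Sigma0 = 0.5 * np.eye(2)

def propagate_recursive(k):
    """Push the mean and covariance through the linear model k times."""
    mu, Sigma = mu0, Sigma0
    for _ in range(k):
        mu = A @ mu
        Sigma = A @ Sigma @ A.T + Q
    return mu, Sigma

def propagate_closed_form(k):
    """Unrolled form: the initial covariance carried k steps,
    plus the noise injected at each step carried the remaining steps."""
    Ak = np.linalg.matrix_power(A, k)
    mu = Ak @ mu0
    Sigma = Ak @ Sigma0 @ Ak.T
    for i in range(k):
        Ai = np.linalg.matrix_power(A, i)
        Sigma = Sigma + Ai @ Q @ Ai.T
    return mu, Sigma

mu_rec, Sigma_rec = propagate_recursive(5)
mu_cf, Sigma_cf = propagate_closed_form(5)
```

Both functions return the same mean and covariance. The density of $y_k$ then follows by applying the change of variables rule to this Gaussian.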

In our work, we used the model in Eq. \eqref{eq:dynamic_flow} for Imitation Learning. Given a set of trajectory demonstrations $\mathcal{D}_{\tau}=( \tau_0 , \tau_1, \dots , \tau_N )$, where each trajectory $\tau_i$ has $T_i$ steps, the Imitation Learning problem can be formulated as a maximum likelihood estimation (MLE) problem

\begin{align} \theta^{*} = \arg \max_{\theta} p(\mathcal{D}_{\tau} ; \theta) \end{align}

where,

$p( \mathcal{D}_{\tau} ; \theta ) = \prod_{i=0}^{N} p(\tau_i ; \theta ) = \prod_{i=0}^{N} p(y_{0}^{i};\theta) \prod_{t=0}^{T_i - 1} p(y_{t+1}^{i}|y_{t}^{i} ; \theta).$

Choosing the transition probability model in Eq. \eqref{eq:dynamic_flow} together with stable linear stochastic latent dynamics, we were able to learn the dynamics underlying a set of trajectory demonstrations while remaining stable.
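To show what the likelihood objective looks like in code, the sketch below evaluates the log-likelihood of a trajectory for fixed parameters. It is not our training code: the diffeomorphism is a hand-written coupling-style map rather than a learned network, and the discrete-time transition matrix `A`, the noise covariance `Q` and the initial distribution are hypothetical. In practice, $\theta$ would parameterize $f$ and this quantity would be maximized with a gradient-based optimizer.

```python
import numpy as np

A = np.array([[0.9, 0.1],
              [-0.1, 0.9]])              # hypothetical stable latent transition
Q = 0.01 * np.eye(2)                     # hypothetical transition noise covariance
mu0, Sigma0 = np.zeros(2), np.eye(2)     # hypothetical initial latent distribution

def f(z):
    """Hypothetical fixed diffeomorphism (a learned flow would go here)."""
    z1, z2 = z
    return np.array([z1, z2 * np.exp(np.tanh(z1)) + np.sin(z1)])

def f_inv(y):
    """Exact inverse of f."""
    y1, y2 = y
    return np.array([y1, (y2 - np.sin(y1)) * np.exp(-np.tanh(y1))])

def log_abs_det_jac(z):
    """The Jacobian of f is lower triangular with diagonal (1, exp(tanh(z1)))."""
    return np.tanh(z[0])

def gaussian_log_prob(x, mean, cov):
    """Log-density of N(mean, cov) at x."""
    d = x.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (diff @ np.linalg.solve(cov, diff) + logdet + d * np.log(2.0 * np.pi))

def trajectory_log_lik(ys):
    """log p(y_0) + sum_t log p(y_{t+1} | y_t), each term via change of variables."""
    zs = [f_inv(y) for y in ys]
    ll = gaussian_log_prob(zs[0], mu0, Sigma0) - log_abs_det_jac(zs[0])
    for z_prev, z_next in zip(zs[:-1], zs[1:]):
        ll += gaussian_log_prob(z_next, A @ z_prev, Q) - log_abs_det_jac(z_next)
    return ll

def dataset_log_lik(trajectories):
    """Demonstrations are treated as independent, so log-likelihoods add up."""
    return sum(trajectory_log_lik(ys) for ys in trajectories)

# A noise-free latent rollout mapped to Y serves as a toy demonstration.
z = np.array([1.0, -0.5])
demo = []
for _ in range(6):
    demo.append(f(z))
    z = A @ z
demo = np.stack(demo)
ll_demo = trajectory_log_lik(demo)
```

A trajectory that follows the latent mean dynamics scores higher than the same trajectory with one state pushed off course, which is the signal that maximum likelihood training exploits.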

#### Citation

If you want to work with ImitationFlows, you can cite us:

@article{urain2020imitationflows,
title={ImitationFlows: Learning Deep Stable Stochastic Dynamic Systems by Normalizing Flows},
author={Urain, Julen and Ginesi, Michele and Tateo, Davide and Peters, Jan},
journal={IEEE/RSJ International Conference on Intelligent Robots and Systems},
year={2020}
}