Example

Requirements

It is recommended to create a virtual environment and install all of the following packages inside it.

torch==1.4.0
numpy==1.18.1
matplotlib==2.2.4
mixturelib==0.2.*
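
For example, using Python's built-in venv module from a shell (the environment name is illustrative):

python3 -m venv mixturelib-env
source mixturelib-env/bin/activate
pip install torch==1.4.0 numpy==1.18.1 matplotlib==2.2.4 "mixturelib==0.2.*"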

Import the required packages.

import torch
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection used below

from mixturelib.mixture import MixtureEM
from mixturelib.local_models import EachModelLinear
from mixturelib.hyper_models import HyperExpertNN, HyperModelDirichlet

Preparing the dataset

Generate the dataset. It contains points drawn from two different planes: the first half of the samples lies near the plane y = 10 * x_1, the second half near y = 30 * x_2, and both halves are corrupted by Gaussian noise.

np.random.seed(42)

N = 200                # number of samples
noise_component = 0.8  # scale factor for the uninformative coordinate
noise_target = 5       # standard deviation of the target noise

X = np.random.randn(N, 2)
X[:N//2, 1] *= noise_component  # first half: the plane depends only on x_1
X[N//2:, 0] *= noise_component  # second half: the plane depends only on x_2

real_first_w = np.array([[10.], [0.]])   # first plane:  y = 10 * x_1
real_second_w = np.array([[0.], [30.]])  # second plane: y = 30 * x_2

y = np.vstack([X[:N//2]@real_first_w, X[N//2:]@real_second_w])\
    + noise_target*np.random.randn(N, 1)

Visualize the dataset. Blue corresponds to one local model and red to the other.

_images/real_surface.png
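
The figure above can be reproduced with a snippet along the following lines (an illustrative sketch; the assignment of blue to the first half of the samples and red to the second half is an assumption):

fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(111, projection='3d')

# First half of the samples in blue, second half in red
ax.scatter(X[:N//2, 0], X[:N//2, 1], y[:N//2, 0], color='blue')
ax.scatter(X[N//2:, 0], X[N//2:, 1], y[N//2:, 0], color='red')

ax.set_xlabel('$x_1$')
ax.set_ylabel('$x_2$')
ax.set_zlabel('$y$')
plt.show()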

Convert the dataset to torch tensors.

torch.random.manual_seed(42)
X_tr = torch.FloatTensor(X)
Y_tr = torch.FloatTensor(y)

Mixture of Models

Consider an example of a mixture of models. In this case the contribution of each model does not depend on the particular sample from the dataset.

Initialize the local models. We use the linear model mixturelib.local_models.EachModelLinear as the local model.

torch.random.manual_seed(42)
first_model = EachModelLinear(input_dim=2)
second_model = EachModelLinear(input_dim=2)

list_of_models = [first_model, second_model]

Initialize the hyper model mixturelib.hyper_models.HyperModelDirichlet. In this case the contribution of each model does not depend on the particular sample from the dataset. The contributions are assumed to follow a Dirichlet distribution, and the hyper model parameter is the parameter of this Dirichlet distribution.

HpMd = HyperModelDirichlet(output_dim=2)
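
For intuition, a draw from a symmetric Dirichlet distribution yields non-negative model weights that sum to one. This snippet is purely illustrative and not part of the mixturelib API:

rng = np.random.RandomState(42)
weights = rng.dirichlet(np.ones(2))  # two non-negative weights summing to 1
print(weights, weights.sum())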

Initialize the mixture model. The mixture model is a simple function that weighs the local models' answers; the weights are generated by the hyper model HpMd.

mixture = MixtureEM(HyperParameters={'beta': 1.},
                    HyperModel=HpMd,
                    ListOfModels=list_of_models,
                    model_type='sample')

Train the mixture model on the given dataset.

mixture.fit(X_tr, Y_tr)

Inspect the local model parameters after the training procedure. In our task, each model is a simple plane in 3D space.

predicted_first_w = mixture.ListOfModels[0].W.numpy()
predicted_second_w = mixture.ListOfModels[1].W.numpy()
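
To compare the recovered parameters with the ground truth, print them side by side (the exact values depend on the run, and after EM training the order of the local models need not match the order of the real planes):

print('first plane:  real', real_first_w.ravel(),
      'predicted', predicted_first_w.ravel())
print('second plane: real', real_second_w.ravel(),
      'predicted', predicted_second_w.ravel())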

Visualize the real and predicted planes on one chart.

fig = plt.figure(figsize=(8, 8))

ax = fig.add_subplot(111, projection='3d')

grid_2d = np.array(np.meshgrid(range(-5, 5), range(-5, 5)))
xx, yy = grid_2d

# Predicted planes as translucent surfaces
first_z = (predicted_first_w.reshape([-1, 1, 1])*grid_2d).sum(axis=0)
second_z = (predicted_second_w.reshape([-1, 1, 1])*grid_2d).sum(axis=0)
ax.plot_surface(xx, yy, first_z, alpha=0.25, color='red', label='predicted')
ax.plot_surface(xx, yy, second_z, alpha=0.25, color='blue', label='predicted')

# Real planes as wireframes
first_z = (real_first_w.reshape([-1, 1, 1])*grid_2d).sum(axis=0)
second_z = (real_second_w.reshape([-1, 1, 1])*grid_2d).sum(axis=0)
ax.plot_wireframe(xx, yy, first_z, linewidth=1, color='red')
ax.plot_wireframe(xx, yy, second_z, linewidth=1, color='blue')

ax.view_init(20, 170)

ax.set_xlabel('$x_1$', fontsize=15, fontweight="bold")
ax.set_ylabel('$x_2$', fontsize=15, fontweight="bold")
ax.set_zlabel('$y$', fontsize=15, fontweight="bold")

plt.show()

The surfaces with a grid are the real planes; the surfaces without a grid are the predicted planes.

_images/real_predicter_surface_mix_mod.png

Mixture of Experts

Now consider an example of a mixture of experts on the same dataset. In this case the contribution of each model depends on the particular sample from the dataset.

Initialize the local models. We use the linear model mixturelib.local_models.EachModelLinear as the local model.

torch.random.manual_seed(42)
first_model = EachModelLinear(input_dim=2)
second_model = EachModelLinear(input_dim=2)

list_of_models = [first_model, second_model]

Initialize the hyper model, the gate function mixturelib.hyper_models.HyperExpertNN. In this case the contribution of each model depends on the particular sample from the dataset. The gate function is a simple neural network with a softmax on the last layer.

HpMd = HyperExpertNN(input_dim=2, hidden_dim=5,
                     output_dim=2, epochs=100)
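
Conceptually, such a gate function maps each sample to a probability vector over the local models. The following is an illustrative sketch of this idea in plain PyTorch; it is a hypothetical class for exposition, not the actual implementation of HyperExpertNN:

import torch.nn as nn

class ToyGate(nn.Module):  # hypothetical, for illustration only
    def __init__(self, input_dim=2, hidden_dim=5, output_dim=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, output_dim),
            nn.Softmax(dim=-1),  # per-sample weights over the local models
        )

    def forward(self, x):
        return self.net(x)  # shape (n_samples, n_models); each row sums to 1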

Initialize the mixture model. The mixture model is a simple function that weighs the local models' answers; the weights are generated by the hyper model HpMd.

mixture = MixtureEM(HyperParameters={'beta': 1.},
                    HyperModel=HpMd,
                    ListOfModels=list_of_models,
                    model_type='sample')

Train the mixture model on the given dataset.

mixture.fit(X_tr, Y_tr)

Inspect the local model parameters after the training procedure. In our task, each model is a simple plane in 3D space.

predicted_first_w = mixture.ListOfModels[0].W.numpy()
predicted_second_w = mixture.ListOfModels[1].W.numpy()

Visualize the real and predicted planes on one chart.

fig = plt.figure(figsize=(8, 8))

ax = fig.add_subplot(111, projection='3d')

grid_2d = np.array(np.meshgrid(range(-5, 5), range(-5, 5)))
xx, yy = grid_2d

# Predicted planes as translucent surfaces
first_z = (predicted_first_w.reshape([-1, 1, 1])*grid_2d).sum(axis=0)
second_z = (predicted_second_w.reshape([-1, 1, 1])*grid_2d).sum(axis=0)
ax.plot_surface(xx, yy, first_z, alpha=0.25, color='red', label='predicted')
ax.plot_surface(xx, yy, second_z, alpha=0.25, color='blue', label='predicted')

# Real planes as wireframes
first_z = (real_first_w.reshape([-1, 1, 1])*grid_2d).sum(axis=0)
second_z = (real_second_w.reshape([-1, 1, 1])*grid_2d).sum(axis=0)
ax.plot_wireframe(xx, yy, first_z, linewidth=1, color='red')
ax.plot_wireframe(xx, yy, second_z, linewidth=1, color='blue')

ax.view_init(20, 170)

ax.set_xlabel('$x_1$', fontsize=15, fontweight="bold")
ax.set_ylabel('$x_2$', fontsize=15, fontweight="bold")
ax.set_zlabel('$y$', fontsize=15, fontweight="bold")

plt.show()

The surfaces with a grid are the real planes; the surfaces without a grid are the predicted planes.

_images/real_predicter_surface_mix_exp.png