bayesml.metatree package#

_images/metatree_example1.png _images/metatree_example2.png

Module contents#

The stochastic data generative model is as follows:

  • \(\boldsymbol{x}=[x_1, \ldots, x_p, x_{p+1}, \ldots , x_{p+q}]\) : an explanatory variable. The first \(p\) variables are continuous. The other \(q\) variables are categorical.

  • \(\mathcal{Y}\) : a space of an objective variable

  • \(y \in \mathcal{Y}\) : an objective variable

  • \(D_\mathrm{max} \in \mathbb{N}\) : the maximum depth of trees

  • \(T\) : a tree whose depth is smaller than or equal to \(D_\mathrm{max}\)

  • \(\mathcal{T}\) : a set of \(T\)

  • \(s\) : a node of a tree

  • \(\mathcal{S}\) : a set of \(s\)

  • \(\mathcal{I}(T)\) : a set of inner nodes of \(T\)

  • \(\mathcal{L}(T)\) : a set of leaf nodes of \(T\)

  • \(\boldsymbol{k}=(k_s)_{s \in \mathcal{S}}\) : feature assignment vector where \(k_s \in \{1, 2,\ldots,p+q\}\). If \(k_s \leq p\), the node \(s\) has a threshold.

  • \(\boldsymbol{\theta}=(\theta_s)_{s \in \mathcal{S}}\) : a set of parameter

  • \(s(\boldsymbol{x}) \in \mathcal{L}(T)\) : a leaf node of \(T\) corresponding to \(\boldsymbol{x}\), which is determined according to \(\boldsymbol{k}\) and the thresholds.

\[p(y | \boldsymbol{x}, \boldsymbol{\theta}, T, \boldsymbol{k})=p(y | \theta_{s(\boldsymbol{x})})\]

The prior distribution is as follows:

  • \(g_{0,s} \in [0,1]\) : a hyperparameter assigned to \(s \in \mathcal{S}\)

  • \(M_{T, \boldsymbol{k}}\) : a meta-tree for \((T, \boldsymbol{k})\)

  • \(\mathcal{T}_{M_{T, \boldsymbol{k}}}\) : a set of \(T\) represented by a meta-tree \(M_{T, \boldsymbol{k}}\)

  • \(B \in \mathbb{N}\) : the number of meta-trees

  • \(\mathcal{M}=\{(T_1, \boldsymbol{k}_1), (T_2, \boldsymbol{k}_2), \ldots, (T_B, \boldsymbol{k}_B) \}\) for \(B\) meta-trees \(M_{T_1, \boldsymbol{k}_1}, M_{T_2, \boldsymbol{k}_2}, \dots, M_{T_B, \boldsymbol{k}_B}\). (These meta-trees must be given beforehand by some method, e.g., constructed from bootstrap samples similar to the random forest.)

For \(T' \in M_{T, \boldsymbol{k}}\),

\[p(T')=\prod_{s \in \mathcal{I}(T')} g_{0,s} \prod_{s' \in \mathcal{L}(T')} (1-g_{0,s'}),\]

where \(g_{0,s}=0\) for a leaf node \(s\) of a meta-tree \(M_{T, \boldsymbol{k}}\).

For \(\boldsymbol{k}_b \in \{\boldsymbol{k}_1, \boldsymbol{k}_2, \ldots, \boldsymbol{k}_B \}\),

\[p(\boldsymbol{k}_b) = \frac{1}{B}.\]

The posterior distribution is as follows:

  • \(n \in \mathbb{N}\) : a sample size

  • \(\boldsymbol{x}^n = \{ \boldsymbol{x}_1, \boldsymbol{x}_2, \ldots, \boldsymbol{x}_n \}\)

  • \(y^n = \{ y_1, y_2, \ldots, y_n \}\)

  • \(g_{n,s} \in [0,1]\) : a hyperparameter

For \(T' \in M_{T, \boldsymbol{k}}\),

\[p(T' | \boldsymbol{x}^n, y^n, \boldsymbol{k})=\prod_{s \in \mathcal{I}(T')} g_{n,s} \prod_{s' \in \mathcal{L}(T')} (1-g_{n,s'}),\]

where the updating rules of the hyperparameter are as follows.

\[\begin{split}g_{i,s} = \begin{cases} g_{0,s} & (i = 0),\\ \frac{g_{i-1,s} \tilde{q}_{s_{\mathrm{child}}}(y_i | \boldsymbol{x}_i, \boldsymbol{x}^{i-1}, y^{i-1}, M_{T, \boldsymbol{k}})}{\tilde{q}_s(y_i | \boldsymbol{x}_i, \boldsymbol{x}^{i-1}, y^{i-1}, M_{T, \boldsymbol{k}})} &(\mathrm{otherwise}), \end{cases}\end{split}\]

where \(s_{\mathrm{child}}\) is the child node of \(s\) on the path corresponding to \(\boldsymbol{x}_{i}\) in \(M_{T, \boldsymbol{k}}\) and

\[\begin{split}&\tilde{q}_s(y_{i} | \boldsymbol{x}_{i}, \boldsymbol{x}^{i-1}, y^{i-1}, M_{T, \boldsymbol{k}}) \\ &= \begin{cases} q_s(y_{i} | \boldsymbol{x}_{i}, \boldsymbol{x}^{i-1}, y^{i-1}, \boldsymbol{k}),& (s \ {\rm is \ the \ leaf \ node \ of} \ M_{T, \boldsymbol{k}}),\\ (1-g_{i-1,s}) q_s(y_{i} | \boldsymbol{x}_{i}, \boldsymbol{x}^{i-1}, y^{i-1}, \boldsymbol{k}) \\ \qquad + g_{i-1,s} \tilde{q}_{s_{\mathrm{child}}}(y_{i} | \boldsymbol{x}_{i}, \boldsymbol{x}^{i-1}, y^{i-1}, M_{T, \boldsymbol{k}}),& ({\rm otherwise}), \end{cases}\end{split}\]
\[q_s(y_{i} | \boldsymbol{x}_{i}, \boldsymbol{x}^{i-1}, y^{i-1}, \boldsymbol{k})=\int p(y_i | \boldsymbol{x}_i, \boldsymbol{\theta}, T, \boldsymbol{k}) p(\boldsymbol{\theta} | \boldsymbol{x}^{i-1}, y^{i-1}, T, \boldsymbol{k}) \mathrm{d} \boldsymbol{\theta}, \quad s \in \mathcal{L}(T)\]

For \(\boldsymbol{k}_b \in \{\boldsymbol{k}_1, \boldsymbol{k}_2, \ldots, \boldsymbol{k}_B \}\),

\[p(\boldsymbol{k}_b | \boldsymbol{x}^n, y^n)\propto \prod_{i=1}^n \tilde{q}_{s_{\lambda}}(y_{i}|\boldsymbol{x}_{i},\boldsymbol{x}^{i-1}, y^{i-1}, M_{T_b, \boldsymbol{k}_b}),\]

where \(s_{\lambda}\) is the root node of \(M_{T_b, \boldsymbol{k}_b}\).

The predictive distribution is as follows:

\[p(y_{n+1}| \boldsymbol{x}_{n+1}, \boldsymbol{x}^n, y^n) = \sum_{b = 1}^B p(\boldsymbol{k}_b | \boldsymbol{x}^n, y^n) \tilde{q}_{s_{\lambda}}(y_{n+1}|\boldsymbol{x}_{n+1},\boldsymbol{x}^n, y^n, M_{T_b, \boldsymbol{k}_b})\]

The expectation of the predictive distribution can be calculated as follows.

\[\mathbb{E}_{p(y_{n+1}| \boldsymbol{x}_{n+1}, \boldsymbol{x}^n, y^n)} [Y_{n+1}| \boldsymbol{x}_{n+1}, \boldsymbol{x}^n, y^n] = \sum_{b = 1}^B p(\boldsymbol{k}_b | \boldsymbol{x}^n, y^n) \mathbb{E}_{\tilde{q}_{s_{\lambda}}(y_{n+1}|\boldsymbol{x}_{n+1},\boldsymbol{x}^n, y^n, M_{T_b, \boldsymbol{k}_b})} [Y_{n+1}| \boldsymbol{x}_{n+1}, \boldsymbol{x}^n, y^n, \boldsymbol{k}_b],\]

where the expectation for \(\tilde{q}\) is recursively given as follows.

\[\begin{split}&\mathbb{E}_{\tilde{q}_s(y_{n+1} | \boldsymbol{x}_{n+1}, \boldsymbol{x}^n, y^n, M_{T_b, \boldsymbol{k}_b})} [Y_{n+1} | \boldsymbol{x}_{n+1}, \boldsymbol{x}^n, y^n, \boldsymbol{k}_b] \\ &= \begin{cases} \mathbb{E}_{q_s(y_{n+1} | \boldsymbol{x}_{n+1}, \boldsymbol{x}^n, y^n, \boldsymbol{k}_b)} [Y_{n+1} | \boldsymbol{x}_{n+1}, \boldsymbol{x}^n, y^n, \boldsymbol{k}_b],& (s \ {\rm is \ the \ leaf \ node \ of} \ M_{T_b, \boldsymbol{k}_b}),\\ (1-g_{n,s}) \mathbb{E}_{q_s(y_{n+1} | \boldsymbol{x}_{n+1}, \boldsymbol{x}^n, y^n, \boldsymbol{k}_b)} [Y_{n+1} | \boldsymbol{x}_{n+1}, \boldsymbol{x}^n, y^n, \boldsymbol{k}_b] \\ \qquad + g_{n,s} \mathbb{E}_{\tilde{q}_{s_{\mathrm{child}}}(y_{n+1} | \boldsymbol{x}_{n+1}, \boldsymbol{x}^n, y^n, M_{T_b, \boldsymbol{k}_b})} [Y_{n+1} | \boldsymbol{x}_{n+1}, \boldsymbol{x}^n, y^n, \boldsymbol{k}_b] ,& ({\rm otherwise}). \end{cases}\end{split}\]

References

  • Dobashi, N.; Saito, S.; Nakahara, Y.; Matsushima, T. Meta-Tree Random Forest: Probabilistic Data-Generative Model and Bayes Optimal Prediction. Entropy 2021, 23, 768. https://doi.org/10.3390/e23060768

  • Nakahara, Y.; Saito, S.; Kamatsuka, A.; Matsushima, T. Probability Distribution on Full Rooted Trees. Entropy 2022, 24, 328. https://doi.org/10.3390/e24030328

class bayesml.metatree.GenModel(c_dim_continuous, c_dim_categorical, c_max_depth=2, c_num_children_vec=2, c_num_assignment_vec=None, c_ranges=None, SubModel=<module 'bayesml.bernoulli' from 'C:\\Users\\nakahara\\Documents\\GitHub\\BayesML\\bayesml\\bernoulli\\__init__.py'>, sub_constants={}, root=None, h_k_weight_vec=None, h_g=0.5, sub_h_params={}, h_metatree_list=[], h_metatree_prob_vec=None, seed=None)#

Bases: Generative

The stochastice data generative model and the prior distribution

Parameters:
c_dim_continuousint

A non-negative integer

c_dim_categoricalint

A non-negative integer

c_num_children_vecnumpy.ndarray, optional

A vector of positive integers whose length is c_dim_continuous+c_dim_categorical, by default [2,2,…,2]. The first c_dim_continuous elements represent the numbers of children of continuous features at inner nodes. The other c_dim_categorial elements represent those of categorical features. If a single integer is input, it will be broadcasted.

c_max_depthint, optional

A positive integer, by default 2

c_num_assignment_vecnumpy.ndarray, optional

A vector of positive integers whose length is c_dim_continuous+c_dim_categorical. The first c_dim_continuous elements represent the maximum assignment numbers of continuous features on a path. The other c_dim_categorial elements represent those of categorical features. If it has a negative element (e.g., -1), the corresponding feature will be assigned any number of times. By default [c_max_depth,…,c_max_depth,1,…,1].

c_rangesnumpy.ndarray, optional

A numpy.ndarray whose size is (c_dim_continuous,2). A threshold for the k-th continuous feature will be generated between c_ranges[k,0] and c_ranges[k,1]. By default, [[-3,3],[-3,3],…,[-3,3]].

SubModelclass, optional

bernoulli, categorical, poisson, normal, exponential, or linearregression, by default bernoulli

sub_constantsdict, optional

constants for self.SubModel.GenModel, by default {}

rootmetatree._Node, optional

A root node of a meta-tree, by default a tree consists of only one node.

h_k_weight_vecnumpy.ndarray, optional

A vector of positive real numbers whose length is c_dim_continuous+c_dim_categorical, by default [1,…,1].

h_gfloat, optional

A real number in \([0, 1]\), by default 0.5

sub_h_paramsdict, optional

h_params for self.SubModel.GenModel, by default {}

h_metatree_listlist of metatree._Node, optional

Root nodes of meta-trees, by default []

h_metatree_prob_vecnumpy.ndarray, optional

A vector of real numbers in \([0, 1]\) that represents prior distribution of h_metatree_list, by default uniform distribution Sum of its elements must be 1.0.

seed{None, int}, optional

A seed to initialize numpy.random.default_rng(), by default None

Attributes:
c_dim_features: int

c_dim_continuous + c_dim_categorical

Methods

gen_params([feature_fix, threshold_fix, ...])

Generate the parameter from the prior distribution.

gen_sample([sample_size, x_continuous, ...])

Generate a sample from the stochastic data generative model.

get_constants()

Get constants of GenModel.

get_h_params()

Get the hyperparameters of the prior distribution.

get_params()

Get the parameter of the sthocastic data generative model.

load_h_params(filename)

Load the hyperparameters to h_params.

load_params(filename)

Load the parameters saved by save_params.

save_h_params(filename)

Save the hyperparameters using python pickle module.

save_params(filename)

Save the parameters using python pickle module.

save_sample(filename, sample_size[, x])

Save the generated sample as NumPy .npz format.

set_h_params([h_k_weight_vec, h_g, ...])

Set the hyperparameters of the prior distribution.

set_params([root])

Set the parameter of the sthocastic data generative model.

visualize_model([filename, format, ...])

Visualize the stochastic data generative model and generated samples.

get_constants()#

Get constants of GenModel.

Returns:
constantsdict of {str: int, numpy.ndarray}
  • "c_dim_continuous" : the value of self.c_dim_continuous

  • "c_dim_categorical" : the value of self.c_dim_categorical

  • "c_num_children_vec" : the value of self.c_num_children_vec

  • "c_max_depth" : the value of self.c_max_depth

  • "c_num_assignment_vec" : the value of self.c_num_assignment_vec

  • "c_ranges" : the value of self.c_ranges

set_h_params(h_k_weight_vec=None, h_g=None, sub_h_params=None, h_metatree_list=None, h_metatree_prob_vec=None)#

Set the hyperparameters of the prior distribution.

Parameters:
h_k_weight_vecnumpy.ndarray, optional

A vector of positive real numbers whose length is c_dim_continuous+c_dim_categorical, by default None.

h_gfloat, optional

A real number in \([0, 1]\), by default None

sub_h_paramsdict, optional

h_params for self.SubModel.GenModel, by default None

h_metatree_listlist of metatree._Node, optional

Root nodes of meta-trees, by default None

h_metatree_prob_vecnumpy.ndarray, optional

A vector of real numbers in \([0, 1]\) that represents prior distribution of h_metatree_list, by default None. Sum of its elements must be 1.0.

get_h_params()#

Get the hyperparameters of the prior distribution.

Returns:
h_paramsdict of {str: float, list, dict, numpy.ndarray}
  • "h_k_weight_vec" : the value of self.h_k_weight_vec

  • "h_g" : the value of self.h_g

  • "sub_h_params" : the value of self.sub_h_params

  • "h_metatree_list" : the value of self.h_metatree_list

  • "h_metatree_prob_vec" : the value of self.h_metatree_prob_vec

gen_params(feature_fix=False, threshold_fix=False, tree_fix=False, threshold_type='even')#

Generate the parameter from the prior distribution.

The generated vaule is set at self.root.

Parameters:
feature_fixbool, optional

If True, feature assignment indices will be fixed, by default False.

threshold_fixbool, optional

If True, thresholds for continuous features will be fixed, by default False. If feature_fix is False, threshold_fix must be False.

tree_fixbool, optional

If True, tree shape will be fixed, by default False. If feature_fix is False, tree_fix must be False.

threshold_type{‘even’, ‘random’}, optional

A type of threshold generating procedure, by default 'even' If 'even', self.c_ranges will be recursively divided by equal intervals. if 'random', self.c_ranges will be recursively divided by at random intervals.

set_params(root=None)#

Set the parameter of the sthocastic data generative model.

Parameters:
rootmetatree._Node, optional

A root node of a meta-tree, by default None.

get_params()#

Get the parameter of the sthocastic data generative model.

Returns:
paramsdict of {str:metatree._Node}
  • "root" : The value of self.root.

gen_sample(sample_size=None, x_continuous=None, x_categorical=None)#

Generate a sample from the stochastic data generative model.

Parameters:
sample_sizeint, optional

A positive integer, by default None

x_continuousnumpy ndarray, optional

2 dimensional float array whose size is (sample_size,c_dim_continuous), by default None.

x_categoricalnumpy ndarray, optional

2 dimensional int array whose size is (sample_size,c_dim_categorical), by default None. Each element x_categorical[i,j] must satisfy 0 <= x_categorical[i,j] < self.c_num_children_vec[self.c_dim_continuous+i].

Returns:
x_continuousnumpy ndarray

2 dimensional float array whose size is (sample_size,c_dim_continuous).

x_categoricalnumpy ndarray, optional

2 dimensional int array whose size is (sample_size,c_dim_categorical). Each element x_categorical[i,j] must satisfies 0 <= x_categorical[i,j] < self.c_num_children_vec[self.c_dim_continuous+i].

ynumpy ndarray

1 dimensional array whose size is sample_size.

save_sample(filename, sample_size, x=None)#

Save the generated sample as NumPy .npz format.

It is saved as a NpzFile with keyword: “x”.

Parameters:
filenamestr

The filename to which the sample is saved. .npz will be appended if it isn’t there.

sample_sizeint, optional

A positive integer, by default None

x_continuousnumpy ndarray, optional

2 dimensional float array whose size is (sample_size,c_dim_continuous), by default None.

x_categoricalnumpy ndarray, optional

2 dimensional int array whose size is (sample_size,c_dim_categorical), by default None. Each element x_categorical[i,j] must satisfy 0 <= x_categorical[i,j] < self.c_num_children_vec[self.c_dim_continuous+i].

visualize_model(filename=None, format=None, sample_size=100, x_continuous=None, x_categorical=None)#

Visualize the stochastic data generative model and generated samples.

Note that values of categorical features will be shown with jitters.

Parameters:
filenamestr, optional

Filename for saving the figure, by default None

formatstr, optional

Rendering output format ("pdf", "png", …).

sample_sizeint, optional

A positive integer, by default 100

x_continuousnumpy ndarray, optional

2 dimensional float array whose size is (sample_size,c_dim_continuous), by default None.

x_categoricalnumpy ndarray, optional

2 dimensional int array whose size is (sample_size,c_dim_categorical), by default None. Each element x_categorical[i,j] must satisfy 0 <= x_categorical[i,j] < self.c_num_children_vec[self.c_dim_continuous+i].

See also

graphviz.Digraph

Examples

>>> from bayesml import metatree
>>> model = metatree.GenModel(
>>>     c_dim_continuous=1,
>>>     c_dim_categorical=1)
>>> model.gen_params(threshold_type='random')
>>> model.visualize_model()
_images/metatree_example1.png _images/metatree_example2.png
class bayesml.metatree.LearnModel(c_dim_continuous, c_dim_categorical, c_num_children_vec=2, c_max_depth=2, c_num_assignment_vec=None, c_ranges=None, SubModel=<module 'bayesml.bernoulli' from 'C:\\Users\\nakahara\\Documents\\GitHub\\BayesML\\bayesml\\bernoulli\\__init__.py'>, sub_constants={}, h0_k_weight_vec=None, h0_g=0.5, sub_h0_params={}, h0_metatree_list=[], h0_metatree_prob_vec=None)#

Bases: Posterior, PredictiveMixin

The posterior distribution and the predictive distribution.

Parameters:
c_dim_continuousint

A non-negative integer

c_dim_categoricalint

A non-negative integer

c_num_children_vecnumpy.ndarray, optional

A vector of positive integers whose length is c_dim_continuous+c_dim_categorical, by default [2,2,…,2]. The first c_dim_continuous elements represent the numbers of children of continuous features at inner nodes. The other c_dim_categorial elements represent those of categorical features. If a single integer is input, it will be broadcasted.

c_max_depthint, optional

A positive integer, by default 2

c_num_assignment_vecnumpy.ndarray, optional

A vector of positive integers whose length is c_dim_continuous+c_dim_categorical. The first c_dim_continuous elements represent the maximum assignment numbers of continuous features on a path. The other c_dim_categorial elements represent those of categorical features. If it has a negative element (e.g., -1), the corresponding feature will be assigned any number of times. By default [c_max_depth,…,c_max_depth,1,…,1].

c_rangesnumpy.ndarray, optional

A numpy.ndarray whose size is (c_dim_continuous,2). A threshold for the k-th continuous feature will be generated between c_ranges[k,0] and c_ranges[k,1]. By default, [[-3,3],[-3,3],…,[-3,3]].

SubModelclass, optional

bernoulli, categorical, poisson, normal, exponential, or linearregression, by default bernoulli

sub_constantsdict, optional

constants for self.SubModel.LearnModel, by default {}

h0_k_weight_vecnumpy.ndarray, optional

A vector of positive real numbers whose length is c_dim_continuous+c_dim_categorical, by default [1,…,1].

h0_gfloat, optional

A real number in \([0, 1]\), by default 0.5

sub_h0_paramsdict, optional

h0_params for self.SubModel.LearnModel, by default {}

h0_metatree_listlist of metatree._Node, optional

Root nodes of meta-trees, by default []

h0_metatree_prob_vecnumpy.ndarray, optional

A vector of real numbers in \([0, 1]\) that represents prior distribution of h0_metatree_list, by default uniform distribution Sum of its elements must be 1.0.

Attributes:
c_dim_features: int

c_dim_continuous + c_dim_categorical

hn_k_weight_vecnumpy.ndarray

A vector of positive real numbers whose length is c_dim_continuous+c_dim_categorical

hn_gfloat

A real number in \([0, 1]\)

sub_hn_paramsdict

hn_params for self.SubModel.LearnModel

hn_metatree_listlist of metatree._Node

Root nodes of meta-trees

hn_metatree_prob_vecnumpy.ndarray

A vector of real numbers in \([0, 1]\) that represents prior distribution of h0_metatree_list. Sum of its elements is 1.0.

Methods

calc_pred_dist([x_continuous, x_categorical])

Calculate the parameters of the predictive distribution.

estimate_params([loss, visualize, filename, ...])

Estimate the parameter under the given criterion.

get_constants()

Get constants of LearnModel.

get_h0_params()

Get the hyperparameters of the prior distribution.

get_hn_params()

Get the hyperparameters of the posterior distribution.

get_p_params()

Get the parameters of the predictive distribution.

load_h0_params(filename)

Load the hyperparameters to h0_params.

load_hn_params(filename)

Load the hyperparameters to hn_params.

make_prediction([loss])

Predict a new data point under the given criterion.

overwrite_h0_params()

Overwrite the initial values of the hyperparameters of the posterior distribution by the learned values.

pred_and_update([x_continuous, ...])

Predict a new data point and update the posterior sequentially.

reset_hn_params()

Reset the hyperparameters of the posterior distribution to their initial values.

save_h0_params(filename)

Save the hyperparameters using python pickle module.

save_hn_params(filename)

Save the hyperparameters using python pickle module.

set_h0_params([h0_k_weight_vec, h0_g, ...])

Set the hyperparameters of the prior distribution.

set_hn_params([hn_k_weight_vec, hn_g, ...])

Set the hyperparameters of the posterior distribution.

update_posterior([x_continuous, ...])

Update the hyperparameters of the posterior distribution using traning data.

visualize_posterior([filename, format, ...])

Visualize the posterior distribution for the parameter.

get_constants()#

Get constants of LearnModel.

Returns:
constantsdict of {str: int, numpy.ndarray}
  • "c_dim_continuous" : the value of self.c_dim_continuous

  • "c_dim_categorical" : the value of self.c_dim_categorical

  • "c_num_children_vec" : the value of self.c_num_children_vec

  • "c_max_depth" : the value of self.c_max_depth

  • "c_num_assignment_vec" : the value of self.c_num_assignment_vec

  • "c_ranges" : the value of self.c_ranges

set_h0_params(h0_k_weight_vec=None, h0_g=None, sub_h0_params=None, h0_metatree_list=None, h0_metatree_prob_vec=None)#

Set the hyperparameters of the prior distribution.

Parameters:
h0_k_weight_vecnumpy.ndarray, optional

A vector of positive real numbers whose length is c_dim_continuous+c_dim_categorical, by default None.

h0_gfloat, optional

A real number in \([0, 1]\), by default None

sub_h0_paramsdict, optional

h0_params for self.SubModel.LearnModel, by default None

h0_metatree_listlist of metatree._Node, optional

Root nodes of meta-trees, by default None

h0_metatree_prob_vecnumpy.ndarray, optional

A vector of real numbers in \([0, 1]\) that represents prior distribution of h0_metatree_list, by default None. Sum of its elements must be 1.0.

get_h0_params()#

Get the hyperparameters of the prior distribution.

Returns:
h0_paramsdict of {str: float, list, dict, numpy.ndarray}
  • "h0_k_weight_vec" : the value of self.h0_k_weight_vec

  • "h0_g" : the value of self.h0_g

  • "sub_h0_params" : the value of self.sub_h0_params

  • "h0_metatree_list" : the value of self.h0_metatree_list

  • "h0_metatree_prob_vec" : the value of self.h0_metatree_prob_vec

set_hn_params(hn_k_weight_vec=None, hn_g=None, sub_hn_params=None, hn_metatree_list=None, hn_metatree_prob_vec=None)#

Set the hyperparameters of the posterior distribution.

Parameters:
hn_k_weight_vecnumpy.ndarray, optional

A vector of positive real numbers whose length is c_dim_continuous+c_dim_categorical, by default None.

hn_gfloat, optional

A real number in \([0, 1]\), by default None

sub_hn_paramsdict, optional

hn_params for self.SubModel.LearnModel, by default None

hn_metatree_listlist of metatree._Node, optional

Root nodes of meta-trees, by default None

hn_metatree_prob_vecnumpy.ndarray, optional

A vector of real numbers in \([0, 1]\) that represents prior distribution of hn_metatree_list, by default None. Sum of its elements must be 1.0.

get_hn_params()#

Get the hyperparameters of the posterior distribution.

Returns:
hn_paramsdict of {str: float, list, dict, numpy.ndarray}
  • "hn_k_weight_vec" : the value of self.hn_k_weight_vec

  • "hn_g" : the value of self.hn_g

  • "sub_hn_params" : the value of self.sub_hn_params

  • "hn_metatree_list" : the value of self.hn_metatree_list

  • "hn_metatree_prob_vec" : the value of self.hn_metatree_prob_vec

update_posterior(x_continuous=None, x_categorical=None, y=None, alg_type='MTRF', **kwargs)#

Update the hyperparameters of the posterior distribution using traning data.

Parameters:
x_continuousnumpy ndarray, optional

2 dimensional float array whose size is (sample_size,c_dim_continuous), by default None.

x_categoricalnumpy ndarray, optional

2 dimensional int array whose size is (sample_size,c_dim_categorical), by default None. Each element x_categorical[i,j] must satisfy 0 <= x_categorical[i,j] < self.c_num_children_vec[self.c_dim_continuous+i].

ynumpy ndarray

values of objective variable whose dtype may be int or float

alg_type{‘MTRF’, ‘given_MT’}, optional

type of algorithm, by default ‘MTRF’

**kwargsdict, optional

optional parameters of algorithms, by default {}

estimate_params(loss='0-1', visualize=True, filename=None, format=None)#

Estimate the parameter under the given criterion.

The approximate MAP meta-tree \(M_{T,\boldsymbol{k}_b} = \mathrm{argmax} p(M_{T,\boldsymbol{k}_{b'}} | \boldsymbol{x}^n, y^n)\) will be returned.

Parameters:
lossstr, optional

Loss function underlying the Bayes risk function, by default "0-1". This function supports only "0-1".

visualizebool, optional

If True, the estimated metatree will be visualized, by default True. This visualization requires graphviz.

filenamestr, optional

Filename for saving the figure, by default None

formatstr, optional

Rendering output format ("pdf", "png", …).

Returns:
map_rootmetatree._Node

The root node of the estimated meta-tree that also contains the estimated parameters in each node.

Warning

Multiple metatrees can represent equivalent model classes. This function does not take such duplication into account.

See also

graphviz.Digraph
visualize_posterior(filename=None, format=None, num_metatrees=3, h_params=False)#

Visualize the posterior distribution for the parameter.

This method requires graphviz.

Parameters:
filenamestr, optional

Filename for saving the figure, by default None

formatstr, optional

Rendering output format ("pdf", "png", …).

num_metatreesint, optional

Number of metatrees to be visualized, by default 3.

h_paramsbool, optional

If True, hyperparameters at each node will be visualized. if False, estimated parameters at each node will be visulaized.

See also

graphviz.Digraph

Examples

>>> from bayesml import metatree
>>> gen_model = metatree.GenModel(
>>>     c_dim_continuous=1,
>>>     c_dim_categorical=1)
>>> gen_model.gen_params(threshold_type='random')
>>> x_continuous,x_categorical,y = gen_model.gen_sample(200)
>>> learn_model = metatree.LearnModel(
>>>     c_dim_continuous=1,
>>>     c_dim_categorical=1)
>>> learn_model.update_posterior(x_continuous,x_categorical,y)
>>> learn_model.visualize_posterior(num_metatrees=2)
_images/metatree_posterior2.png
get_p_params()#

Get the parameters of the predictive distribution.

This model does not have a simple parametric expression of the predictive distribution. Therefore, this function returns None.

Returns:
None
calc_pred_dist(x_continuous=None, x_categorical=None)#

Calculate the parameters of the predictive distribution.

Parameters:
x_continuousnumpy ndarray, optional

A float vector whose length is self.c_dim_continuous, by default None.

x_categoricalnumpy ndarray, optional

A int vector whose length is self.c_dim_categorical, by default None. Each element x_categorical[i] must satisfy 0 <= x_categorical[i] < self.c_num_children_vec[self.c_dim_continuous+i].

make_prediction(loss='squared')#

Predict a new data point under the given criterion.

Parameters:
lossstr, optional

Loss function underlying the Bayes risk function, by default “squared”. This function supports “squared”, “0-1”, and “KL”.

Returns:
predicted_value{float, numpy.ndarray}

The predicted value under the given loss function.

pred_and_update(x_continuous=None, x_categorical=None, y=None, loss='squared')#

Predict a new data point and update the posterior sequentially.

Parameters:
x_continuousnumpy ndarray, optional

A float vector whose length is self.c_dim_continuous, by default None.

x_categoricalnumpy ndarray, optional

A int vector whose length is self.c_dim_categorical, by default None. Each element x_categorical[i] must satisfy 0 <= x_categorical[i] < self.c_num_children_vec[self.c_dim_continuous+i].

ynumpy ndarray

values of objective variable whose dtype may be int or float

lossstr, optional

Loss function underlying the Bayes risk function, by default “squared”. This function supports “squared”, “0-1”, and “KL”.

Returns:
predicted_value{float, numpy.ndarray}

The predicted value under the given loss function.