\documentclass[../Main.tex]{subfiles}
\graphicspath{{\subfix{Assets/img/}}}

\begin{document}

The computational approach I take is based on
\cite{Maliar2019}'s Bellman Residual Minimization, in which the
policy and value functions are approximated using a neural network.
In summary, the Bellman equation is rewritten in the form:
\begin{align}
Q = V(S_t,D_t) - F(S_t,D_t,X_t(S_t,D_t)) - \beta V(S_{t+1},D_{t+1})
\end{align}
with a policy maximization condition of the form:
\begin{align}
M = \left[ F(S_t,D_t,X_t(S_t,D_t)) + \beta V(S_{t+1},D_{t+1})\right]
\end{align}

In the deterministic case, a loss function can be constructed in
either of the following equivalent forms:
\begin{align}
\phi_1 = Q^2 - vM \\
\phi_2 = \left(M - Q - \frac{v}{2}\right)^2 - v \cdot \left(Q + \frac{v}{4}\right)
\end{align}
where $v$ is an external weighting parameter that can be cross-validated.
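As an illustration, the two loss forms can be transcribed directly. The following is a minimal Python/NumPy sketch with placeholder residual values; the batch setup and numbers are illustrative only, not the actual implementation:

```python
import numpy as np

def phi_1(Q, M, v):
    """First loss form: phi_1 = Q^2 - v*M."""
    return Q**2 - v * M

def phi_2(Q, M, v):
    """Second loss form: phi_2 = (M - Q - v/2)^2 - v*(Q + v/4)."""
    return (M - Q - v / 2) ** 2 - v * (Q + v / 4)

# Placeholder residuals over a batch of simulated states (illustrative only).
rng = np.random.default_rng(0)
Q = rng.normal(size=5)   # Bellman residuals on the batch
M = rng.normal(size=5)   # policy objective values on the batch
v = 0.5                  # external weighting parameter

loss_1 = np.mean(phi_1(Q, M, v))
loss_2 = np.mean(phi_2(Q, M, v))
```

Note that both forms vanish at $Q = M = 0$, which is one way to sanity-check a transcription like this.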

By choosing a neural network as the functional approximator, we can
exploit the fact that a neural network with a single hidden layer can approximate
functions arbitrarily well
under certain conditions \autocite{White1990}.
We can also
take advantage of the significant computational and practical improvements
currently revolutionizing machine learning.
Examples include the use of specialized hardware and the ability to transfer
learning between models, both of which can speed up functional approximation.

\subsection{Computational Plan}
The neural network library I've chosen is Flux.jl \cite{Innes2018},
implemented in and for the Julia language,
although the Bellman Residual Minimization algorithm would work equally well in
PyTorch or TensorFlow
\footnote{
The initial reason I investigated Flux/Julia is its source-to-source
automatic differentiation capabilities, which I intended to use to implement
a generic version of \cite{Maliar2019}'s Euler equation iteration method.
While I still believe this is possible and that Flux represents one of the
best tools available for that specific purpose,
I've so far been unsuccessful at implementing the algorithm.
}.
Below I note some of the design, training, and implementation decisions.

%Data Description
The data used to train the network is simulated, drawn from random distributions.
One advantage of this approach is that changing the distribution changes the emphasis
of the training.
Initially, training can focus on certain areas of the state space, but later
training can shift the focus to other areas as their importance is recognized.
In the case that we don't know which areas to investigate, it is possible to
optimize over a given dataset and then iterate stocks and debris forward
many periods.
If the stocks and debris don't line up well with the initial training dataset,
we can change the distribution to cover the stocks and debris from the iteration,
thus bootstrapping the distribution of the training set.
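A toy sketch of this bootstrapping idea follows; the law of motion, distribution family, and all numbers are placeholders, not the model's actual dynamics:

```python
import numpy as np

rng = np.random.default_rng(1)

def law_of_motion(stocks, debris):
    # Placeholder dynamics: a stand-in for the model's actual transition.
    return 0.9 * stocks + 0.1, debris + 0.05 * stocks

# 1. Draw an initial training set from a chosen distribution.
stocks = rng.uniform(0.0, 1.0, size=1000)
debris = rng.uniform(0.0, 1.0, size=1000)

# 2. Iterate the simulated stocks and debris forward many periods.
for _ in range(50):
    stocks, debris = law_of_motion(stocks, debris)

# 3. Refit the sampling distribution to cover the region the iteration
#    visited, bootstrapping the next round's training set.
lo, hi = stocks.min(), stocks.max()
new_stocks = rng.uniform(lo, hi, size=1000)
```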

\subsubsection{Constellation Operators}
%Operators
% Branched Policy Topology
% Individual Value functions
% Training Loop
Although there are multiple operators, the individual policy functions
show up jointly as the code is currently implemented.
For this reason, I've implemented each operator's policy function
as a ``branch'' within a single neural network.
These branches are configured such that they each receive the same
inputs (stocks and debris), but decisions in each branch are made without reference
to the other branches.
These results are then concatenated into the final policy vector.
When training a given operator, the appropriate branch is unfrozen so that operator can train.
Value functions are implemented as unique neural networks at the constellation operator level,
much like the operator's Bellman residual function.
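A minimal sketch of this branched topology, using NumPy with made-up layer sizes rather than the actual Flux.jl model — each branch sees the same input, and the branch outputs are concatenated:

```python
import numpy as np

rng = np.random.default_rng(2)

N_OPERATORS = 3   # number of constellation operators (illustrative)
N_INPUTS = 4      # stocks and debris entering every branch
N_HIDDEN = 8

# One independent set of weights per branch; all branches see the same input.
branches = [
    {"W1": rng.normal(size=(N_HIDDEN, N_INPUTS)),
     "W2": rng.normal(size=(1, N_HIDDEN))}
    for _ in range(N_OPERATORS)
]

def policy(state):
    """Each branch maps the shared state to its operator's decision,
    without reference to the other branches; results are concatenated."""
    outs = []
    for p in branches:
        h = np.tanh(p["W1"] @ state)   # branch hidden layer
        outs.append(p["W2"] @ h)       # branch output
    return np.concatenate(outs)        # final policy vector

x = rng.normal(size=N_INPUTS)
pol = policy(x)   # one decision per operator
```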

The training loop takes the following form.

For each epoch:
\begin{enumerate}
\item generate data
\item for each operator
\begin{enumerate}
\item Unfreeze branch
\item Train policy function on data
\item Freeze branch
\item Train Value function on data
\end{enumerate}
\item Check termination conditions
\end{enumerate}
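The loop above can be sketched in Python; the data generator, training steps, and termination check are hypothetical placeholders standing in for the actual Flux.jl code:

```python
# Hypothetical stand-ins for the actual data generator and training steps.
def generate_data(n=100):
    return list(range(n))

def train_on(params, data):
    params["updates"] += len(data)   # placeholder for a gradient step

operators = [
    {"policy": {"updates": 0, "frozen": True}, "value": {"updates": 0}}
    for _ in range(3)
]

def converged(epoch):
    return epoch >= 5   # placeholder termination condition

epoch = 0
while not converged(epoch):
    data = generate_data()                 # 1. generate data
    for op in operators:                   # 2. for each operator:
        op["policy"]["frozen"] = False     #    a. unfreeze branch
        train_on(op["policy"], data)       #    b. train policy on data
        op["policy"]["frozen"] = True      #    c. freeze branch
        train_on(op["value"], data)        #    d. train value on data
    epoch += 1                             # 3. check termination
```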

Overall, this allows each operator's policy and value functions to be approximated
on its own Bellman residuals, while maintaining a convenient interface.

\subsubsection{Planner}
%Planner
% policy topology
% Value function topology
% Training loop

The policy function for the Fleet Planner does not require any separate branches,
although it could have them if desired for comparison purposes.
The key point, though, is that no parameter freezing is done during training,
allowing the repercussions on other constellations to be taken into account.
Similarly, there is a single neural network used to estimate the value function.

The training loop takes the following form.

For each epoch:
\begin{enumerate}
\item generate data
\item Train policy function on data
\item Train Value function on data
\item Check termination conditions
\end{enumerate}
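This simpler loop can be sketched with hypothetical placeholder functions; the point is that the single policy network trains over its full parameter set on every step, with no freezing:

```python
# Hypothetical stand-ins for the data generator and gradient steps.
def generate_data(n=100):
    return list(range(n))

def train_on(params, data):
    params["updates"] += len(data)   # placeholder for a gradient step

# Single (unbranched) policy network and single value network.
planner = {"policy": {"updates": 0}, "value": {"updates": 0}}

def converged(epoch):
    return epoch >= 5   # placeholder termination condition

epoch = 0
while not converged(epoch):
    data = generate_data()             # generate data
    train_on(planner["policy"], data)  # train policy: all parameters update
    train_on(planner["value"], data)   # train value function
    epoch += 1                         # check termination
```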

\subsubsection{Heterogeneous Agents and Nash Equilibria}
One key question is how to handle the case of heterogeneous agents.
In the processes outlined above, the heterogeneous agents are simply
identified by their position in the state and action vectors, and
the NN then learns how to operate with each of them\footnote{
I believe it may be possible to create classifications of
different heterogeneous agent types that allow for simpler function transfers,
but the implementation will take some extensive code design work.
}.

When the laws of motion depend on other agents' decisions, the opportunity
for Nash and other game-theoretic equilibria arises.
One benefit of using neural networks is that they can find standard equilibrium concepts,
including mixed Nash equilibria if configured properly.
%concerns about nash computability

\subsection{Functional Forms}
The reference functional forms for the model are similar to those
given in \cite{RaoRondina2020}.
\begin{itemize}
\item The linear per-period benefit function:
\begin{align}
u^i(S_t, D_t, X_t) = \pi s^i_t - f \cdot x^i_t
\end{align}
\item Each constellation's satellite survival function:
\begin{align}
R^i(S_t, D_t) = e^{- d\cdot D_t - \sum^N_{j=1} h^j s^j_t}
\end{align}
\end{itemize}
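These two functional forms can be transcribed directly; the following Python sketch uses made-up, uncalibrated parameter values purely for illustration:

```python
import numpy as np

# Illustrative, uncalibrated parameters (placeholders, not estimates).
pi_rate = 1.0                   # per-satellite per-period revenue (pi)
f = 0.5                         # per-launch cost
d = 0.02                        # debris coefficient in the survival function
h = np.array([0.01, 0.01])      # per-constellation congestion coefficients h^j

def benefit(s_i, x_i):
    """Linear per-period benefit: u^i = pi * s^i_t - f * x^i_t."""
    return pi_rate * s_i - f * x_i

def survival(S_t, D_t):
    """Satellite survival rate: R^i = exp(-d*D_t - sum_j h^j * s^j_t)."""
    return np.exp(-d * D_t - h @ S_t)
```

With no debris and no satellites, survival is exactly one, which is a quick check that the transcription matches the formula.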

\subsubsection{Parameter Values}
%I'm just guessing.
Currently, I have not found a way to estimate the proper parameters to use,
and a discussion of how to calibrate those parameters is still needed.
So far, my goal is to choose parameters of approximately
the correct order of magnitude.


%\subsection{Existence concerns}
%check matrix inverses etc.
%
%I am currently working on a plan to guarantee existence of solutions.
%Some of what I want to do is check numerically crucial values and
%mathematically necessary conditions for existence and uniqueness.
%Unfortunately this is little more than just a plan right now.

\end{document}