\documentclass[../Main.tex]{subfiles}
\graphicspath{{\subfix{Assets/img/}}}
\begin{document}
The computational approach I take is based on \cite{Maliar2019}'s Bellman Residual Minimization, with the policy and value functions approximated by neural networks. In summary, the Bellman equation is rewritten in the form:
\begin{align}
Q = V(S_t,D_t) - F(S_t,D_t,X_t(S_t,D_t)) - \beta V(S_{t+1},D_{t+1})
\end{align}
with a policy maximization condition of the form:
\begin{align}
M = \left[ F(S_t,D_t,X_t(S_t,D_t)) + \beta V(S_{t+1},D_{t+1})\right]
\end{align}
In the deterministic case, a loss function can be constructed from either of the following equivalent forms:
\begin{align}
\phi_1 &= Q^2 - vM \\
\phi_2 &= \left(M - Q - \frac{v}{2}\right)^2 - v \cdot \left(Q + \frac{v}{4}\right)
\end{align}
where $v$ is an external weighting parameter which can be cross-validated.
By choosing a neural network as the functional approximator, we can exploit the fact that a neural network with a single hidden layer can approximate functions arbitrarily well under certain conditions \autocite{White1990}. We can also take advantage of the significant computational and practical improvements currently revolutionizing machine learning, such as specialized hardware and the ability to transfer learning between models, both of which can speed up functional approximation.

\subsection{Computational Plan}
The neural network library I've chosen is Flux.jl \cite{Innes2018}, a neural network library implemented in and for the Julia language, although the Bellman Residual Minimization algorithm would work equally well in PyTorch or TensorFlow\footnote{The initial reason I investigated Flux/Julia is its source-to-source automatic differentiation capabilities, which I intended to use to implement a generic version of \cite{Maliar2019}'s Euler equation iteration method.
While I still believe this is possible, and that Flux represents one of the best tools available for that specific purpose, I have so far been unsuccessful at implementing the algorithm.}. Below I note some of the design, training, and implementation decisions.

%Data Description
The data used to train the network is simulated, drawn from random distributions. One advantage of this approach is that changing the distribution changes the emphasis of the training: initially, training can focus on certain areas of the state space, and later training can shift to other areas as their importance is recognized. When we don't know which areas to investigate, it is possible to optimize over a given dataset and then iterate stocks and debris forward many periods. If the resulting debris and stocks don't line up well with the initial training dataset, we can change the distribution to cover the stocks and debris from the iteration, thus bootstrapping the distribution of the training set.

\subsubsection{Constellation Operators}
%Operators
% Branched Policy Topology
% Individual Value functions
% Training Loop
Although there are multiple operators, the individual policy functions appear jointly as the code is currently implemented. For this reason, I've implemented each operator's policy function as a ``branch'' within a single neural network. These branches are configured so that they each receive the same inputs (stocks and debris), but decisions in each branch are made without reference to the other branches. The branch outputs are then concatenated into the final policy vector. When training a given operator, the appropriate branch is unfrozen so that only that operator's parameters update. Value functions are implemented as unique neural networks at the constellation-operator level, much like the operator's Bellman residual function.
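Each operator's training step minimizes one of the residual losses $\phi_1$ or $\phi_2$ defined earlier. A minimal sketch of the two loss forms follows; it is written in Python for illustration rather than the Julia/Flux.jl used in the actual implementation, and the scalar interface is hypothetical (the real code evaluates these on batches of simulated states, with $Q$ and $M$ produced by the networks):

```python
# Sketch of the two Bellman-residual loss forms from the text:
#   phi_1 = Q^2 - v*M
#   phi_2 = (M - Q - v/2)^2 - v*(Q + v/4)
# Scalar inputs are illustrative only; in practice these are averaged
# over a batch of simulated (S_t, D_t) draws.

def phi_1(Q, M, v):
    """First form of the weighted residual loss."""
    return Q ** 2 - v * M

def phi_2(Q, M, v):
    """Second form; expanding the square shows phi_2 = (M - Q)^2 - v*M."""
    return (M - Q - v / 2) ** 2 - v * (Q + v / 4)
```

The weighting parameter $v$ balances driving the residual toward zero against increasing the maximized objective $M$, which is why the text suggests choosing it by cross-validation.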
The training loop takes the following form. For each epoch:
\begin{enumerate}
\item Generate data
\item For each operator:
\begin{enumerate}
\item Unfreeze the operator's branch
\item Train the policy function on the data
\item Freeze the branch
\item Train the value function on the data
\end{enumerate}
\item Check termination conditions
\end{enumerate}
Overall, this allows each operator's policy and value functions to be approximated on its own Bellman residuals, while maintaining a convenient interface.

\subsubsection{Planner}
%Planner
% policy topology
% Value function topology
% Training loop
The policy function for the Fleet Planner does not require separate branches, although they could be added for comparison purposes. The key point is that no parameter freezing is done during training, allowing the repercussions on other constellations to be taken into account. Similarly, a single neural network is used to estimate the value function. The training loop takes the following form. For each epoch:
\begin{enumerate}
\item Generate data
\item Train the policy function on the data
\item Train the value function on the data
\item Check termination conditions
\end{enumerate}

\subsubsection{Heterogeneous Agents and Nash Equilibria}
One key question is how to handle the case of heterogeneous agents. In the processes outlined above, heterogeneous agents are simply identified by their position in the state and action vectors, and the neural network learns how to operate with each of them\footnote{I believe it may be possible to create classifications of heterogeneous agent types that would allow simpler function transfers, but the implementation will take some extensive code design work.}. When the laws of motion depend on other agents' decisions, the opportunity for Nash and other game-theoretic equilibria arises. One benefit of using neural networks is that they can find standard equilibrium concepts, including mixed Nash equilibria, if configured properly.
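The per-operator training loop above, with its freeze/unfreeze cycle, can be sketched as follows. This is an illustrative Python outline, not the actual Flux.jl code; all names (\texttt{train\_operators}, \texttt{MockOperator}, \texttt{generate\_data}, \texttt{converged}) are hypothetical stand-ins:

```python
# Illustrative sketch of the per-operator training loop described above.
# All names are hypothetical stand-ins for the Flux.jl implementation.

def train_operators(operators, n_epochs, generate_data, converged):
    """For each epoch: simulate data, then train each operator's policy
    branch (unfrozen in isolation) and its value network on that data."""
    log = []
    for epoch in range(n_epochs):
        data = generate_data()       # draw simulated states from chosen distributions
        for op in operators:
            op.unfreeze_branch()     # allow gradients into this operator's branch only
            op.train_policy(data)    # minimize the operator's Bellman residual
            op.freeze_branch()       # lock the branch again
            op.train_value(data)     # fit the operator-level value function
            log.append((epoch, op.name))
        if converged(log):           # termination check
            break
    return log

class MockOperator:
    """Minimal stand-in that records which training steps ran, in order."""
    def __init__(self, name):
        self.name = name
        self.calls = []
    def unfreeze_branch(self):
        self.calls.append("unfreeze")
    def train_policy(self, data):
        self.calls.append("policy")
    def freeze_branch(self):
        self.calls.append("freeze")
    def train_value(self, data):
        self.calls.append("value")

ops = [MockOperator("A"), MockOperator("B")]
log = train_operators(ops, n_epochs=2, generate_data=lambda: [],
                      converged=lambda log: False)
```

The planner's loop is the same outer structure with a single agent and no freezing, which is why it collapses to the flat list of steps above.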
%concerns about nash computability
\subsection{Functional Forms}
The reference functional forms for the model are similar to those given in \cite{RaoRondina2020}.
\begin{itemize}
\item The linear per-period benefit function:
\begin{align}
u^i(S_t, D_t, X_t) = \pi s^i_t - f \cdot x^i_t
\end{align}
\item Each constellation's satellite survival function:
\begin{align}
R^i(S_t, D_t) = e^{- d\cdot D_t - \sum^N_{j=1} h^j s^j_t}
\end{align}
\end{itemize}

\subsubsection{Parameter Values}
%I'm just guessing.
Currently, I have not found a way to estimate the proper parameters, so a discussion of how to calibrate them is still needed. For now, my goal is to choose parameters with approximately the correct order of magnitude.

%\subsection{Existence concerns}
%check matrix inverses etc.
%
%I am currently working on a plan to guarantee existence of solutions.
%Some of what I want to do is check numerically crucial values and
%mathematically necessary conditions for existence and uniqueness.
%Unfortunately this is little more than just a plan right now.

\end{document}