Thoughts on modelling #11

Currently I am thinking of using an action space that permits $x \in 0:10$ and a Cournot payout where demand is something like $100 - (q_1 + q_2)$.
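To make the payout concrete, here is a minimal sketch of the one-period Cournot profit on that action grid. It assumes integer quantities 0 through 10 and reads $100 - (q_1 + q_2)$ as the inverse demand (price). The marginal cost, the zero price floor, and every function name are assumptions for illustration, not part of the model yet.

```python
# Only the action range 0..10 and inverse demand 100 - (q_1 + q_2) come from
# the note; the marginal cost and the price floor are assumptions.
ACTIONS = range(0, 11)        # x in 0:10, inclusive
DEMAND_INTERCEPT = 100.0
MARGINAL_COST = 0.0           # assumed; not specified in the note


def price(q1, q2):
    """Inverse demand, floored at zero."""
    return max(0.0, DEMAND_INTERCEPT - (q1 + q2))


def profit(q_own, q_other, marginal_cost=MARGINAL_COST):
    """One-period Cournot profit for the firm producing q_own."""
    return q_own * (price(q_own, q_other) - marginal_cost)


def best_response(q_other):
    """Profit-maximizing quantity on the discrete grid, given the rival's quantity."""
    return max(ACTIONS, key=lambda q: profit(q, q_other))


# payoff[q1][q2] = (profit of firm 1, profit of firm 2)
payoff = [[(profit(q1, q2), profit(q2, q1)) for q2 in ACTIONS] for q1 in ACTIONS]
```

With these particular numbers the upper bound always binds: at zero marginal cost the unconstrained best response $(100 - q_j)/2$ is at least 45, so `best_response` returns 10 for every rival quantity. The demand intercept or the action grid would probably need rescaling before the quantity choice becomes non-trivial.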

I am also thinking of making each firm "identical", i.e. using the same policy for each, even though the firms are not necessarily in symmetric conditions.

I need to write the story of the model:

- Situation
- Market & profit
- Satellite Stocks
  - Satellite Launches
  - Satellite Decay

There are two competing satellite internet (broadband) firms. Each period they collect fees in a broadband market operating according to Y. They then see how many satellites were degraded and the total debris, and decide how many satellites to launch. Satellites launched in this period begin operating in the next period. Their launch costs are X, and they must pay Z when the satellites are deorbited at the end of 100 years.
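The timing in this story can be sketched as a one-period transition. This is only an illustration under loud assumptions: the loss rate, its linear dependence on debris, the debris created per lost satellite, and all parameter values are hypothetical placeholders. `revenue` stands in for the market "Y", fees are assumed to be earned on the stock held at the start of the period, and the deorbit payment Z is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    stocks: tuple      # operating satellites per firm, (s_1, s_2)
    debris: float      # total debris in orbit


# All parameter values below are hypothetical placeholders.
LAUNCH_COST = 1.0          # "X" in the note
BASE_LOSS_RATE = 0.05      # share of satellites degraded when there is no debris
LOSS_PER_DEBRIS = 0.001    # extra loss share per unit of debris
DEBRIS_PER_LOSS = 1.0      # debris created per degraded satellite


def degrade(state):
    """Realize this period's degradation.

    Returns (survivors, lost, debris_after); `lost` and `debris_after` are
    what the firms observe before choosing launches.
    """
    rate = min(1.0, BASE_LOSS_RATE + LOSS_PER_DEBRIS * state.debris)
    lost = tuple(rate * s for s in state.stocks)
    survivors = tuple(s - l for s, l in zip(state.stocks, lost))
    debris_after = state.debris + DEBRIS_PER_LOSS * sum(lost)
    return survivors, lost, debris_after


def advance(state, launches, revenue):
    """Move to the next period given both firms' launch choices.

    `revenue(own_stock, other_stock)` is a stand-in for the market "Y".
    Returns (next_state, profits).
    """
    survivors, _lost, debris_after = degrade(state)
    s1, s2 = state.stocks
    # Fees come from satellites operating this period; launch costs are paid now.
    profits = (
        revenue(s1, s2) - LAUNCH_COST * launches[0],
        revenue(s2, s1) - LAUNCH_COST * launches[1],
    )
    # Satellites launched this period only start operating next period.
    next_stocks = tuple(s + a for s, a in zip(survivors, launches))
    return State(next_stocks, debris_after), profits
```

Splitting `degrade` from `advance` mirrors the order of events: the firms see the losses and the debris level first, then pick launches.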

Maybe restrict actions to a finite discrete set of choices. This would permit Q-learning as an approach.

This is closer to a Value Function Approximation than a Cost Function Approximation, because the policy would consist of maximizing the learned $Q(s, a)$ over the action set $A$.
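A minimal sketch of what the discrete-action idea could look like as tabular Q-learning. It assumes states have already been discretized into hashable keys; the step size `alpha`, the discount `gamma`, the epsilon-greedy exploration, and all function names are illustrative choices rather than decisions made for the model.

```python
import random
from collections import defaultdict

ACTIONS = list(range(0, 11))   # finite action set A, matching x in 0:10


def make_q_table():
    """Q[state][action], initialized to zero for unseen states."""
    return defaultdict(lambda: {a: 0.0 for a in ACTIONS})


def greedy_action(q, state):
    """The policy itself: argmax of Q(state, a) over a in A."""
    values = q[state]
    return max(ACTIONS, key=lambda a: values[a])


def epsilon_greedy(q, state, epsilon, rng=random):
    """Explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return greedy_action(q, state)


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.95):
    """One tabular Q-learning step toward reward + gamma * max_a' Q(s', a')."""
    target = reward + gamma * max(q[next_state].values())
    q[state][action] += alpha * (target - q[state][action])
```

Each firm could hold its own table, or both could share one if they are treated as identical. Either way the table grows with the number of discretized states, which is the main limit of this approach.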

While thinking about the problem, it occurred to me that the social planner's approach might be to minimize debris generation or to constrain it below some rate of growth.

So I have read through most of the MARL book and here are my thoughts.

- I should use a NN/link probabilistic approach (deep reinforcement learning) for policies.
- An actor-critic method is similar to what Maliar and Maliar suggested for some of their approaches (Bellman residual).
- I could use an actor-critic method with a joint critic.
- I should model them as homogeneous agents.
- I still need to come up with a solution concept for the game theory part and how to enforce it.
  - I am considering a correlated equilibrium (by including some random processes in the mix).
    - I might also look at Pareto-optimal solution concepts.
- I need to take a look at how to enforce budget constraints on the operators.
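As a rough illustration of two of these points, the sketch below shows homogeneous agents implemented through parameter sharing (one set of policy weights evaluated on each firm's own observation) and a public random signal that both firms observe. It is not an actor-critic implementation and does not enforce any equilibrium concept; it only shows where the shared weights and the shared signal would enter. The linear-softmax form, the feature layout, and all names are assumptions.

```python
import math
import random

N_ACTIONS = 11  # launches 0..10


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class SharedPolicy:
    """One set of weights used by every firm (parameter sharing)."""

    def __init__(self, n_features, n_actions=N_ACTIONS, seed=0):
        rng = random.Random(seed)
        # weights[a][k]: contribution of feature k to the logit of action a
        self.weights = [
            [rng.gauss(0.0, 0.1) for _ in range(n_features)]
            for _ in range(n_actions)
        ]

    def action_probs(self, features):
        logits = [sum(w * x for w, x in zip(row, features)) for row in self.weights]
        return softmax(logits)

    def sample(self, features, rng):
        probs = self.action_probs(features)
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def observation(own_stock, other_stock, debris, signal):
    """Firm-specific features plus a public random signal seen by both firms."""
    return [1.0, own_stock, other_stock, debris, signal]


rng = random.Random(1)
policy = SharedPolicy(n_features=5)
signal = rng.random()  # drawn once per period, common to both firms
launch_1 = policy.sample(observation(10.0, 20.0, 50.0, signal), rng)
launch_2 = policy.sample(observation(20.0, 10.0, 50.0, signal), rng)
```

Because the observations differ, the two firms can act differently even though the weights are identical, which is how "the same policy but asymmetric conditions" would show up in practice.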
