- To compare performance, the authors assume an idealized policy that can execute all V shares immediately at the mid-spread. Why do the authors 'always' expect to do worse than this idealized policy? If the price moves in the trader's favor within the time period H, a policy that waits could actually beat the idealized one.
- Following question 1, the authors define their performance measure as follows: "the trading cost of a policy as the underperformance compared to the mid-spread baseline". Do the authors mean that cases where their policies outperform the baseline are excluded from the trading costs?
- The paper states, "our reward function captures the most important aspects of execution - bid-ask spread, market impact, opportunity cost, etc." I don't see where the paper shows how this is done.
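To make questions 1 and 2 concrete, here is a minimal sketch of the cost measure as I read it. It assumes the cost is the shortfall of the actual fills against executing everything at the mid-spread at the start of the period, scaled to basis points; the function name, the fill format and the basis-point scaling are my own assumptions, not code from the paper. Note that nothing in this definition stops the number from going negative when the price moves in the trader's favor, which is exactly the case question 2 asks about.

```python
def trading_cost_bps(fills, mid_at_start, side="sell"):
    """Underperformance vs. an idealized fill of all shares at the initial mid.

    Hypothetical helper: fills is a list of (shares, price) tuples.
    Returns basis points; positive = worse than the baseline,
    negative = better than the baseline (the trader outperformed it).
    """
    shares = sum(q for q, _ in fills)
    baseline = shares * mid_at_start          # idealized: everything at the mid
    actual = sum(q * p for q, p in fills)     # what the policy really got/paid
    if side == "sell":
        shortfall = baseline - actual         # seller wants more revenue
    elif side == "buy":
        shortfall = actual - baseline         # buyer wants to spend less
    else:
        raise ValueError("side must be 'sell' or 'buy'")
    return 1e4 * shortfall / baseline
```

For example, selling 100 shares at 9.98 against a starting mid of 10.00 gives a cost of 20 bps, while selling 60 at 9.99 and 40 at 10.05 gives -14 bps, i.e. the policy did better than the "idealized" one.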
Saturday, June 2, 2007
A Presentation for an AI Course
I am taking Artificial Intelligence at NTUST. For the final presentation in this course, I chose the paper "Reinforcement Learning for Optimized Trade Execution", published at ICML 2006. The problem studied in the paper is defined as follows: "to sell (or buy) V shares of a given stock within a fixed time period H, in a manner that maximizes the revenue received (respectively, minimizes the capital spent)." The authors apply a reinforcement learning (RL) approach, based on a modified Q-learning algorithm, in an attempt to solve it. I have several questions from reading the paper, and I hope someday I can find the answers.
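To get a feel for the problem definition above, I wrote a toy sketch of plain tabular Q-learning on a heavily simplified version of it. Everything here is my own assumption, not the authors' method: the state is (steps left, shares left), the action is how many shares to sell in the current step, the mid-price is held constant, and selling x shares in one step costs a hypothetical quadratic impact eta * x**2. The reward is therefore the (negative) shortfall against selling at the mid. The paper's own state and action spaces differ, so treat this only as an illustration of casting execution as an RL problem.

```python
import random
from collections import defaultdict


def actions(t, i):
    # On the last step all remaining shares must be sold (forced liquidation).
    return [i] if t == 1 else list(range(i + 1))


def train_q(V=6, H=3, eta=0.1, episodes=20000, alpha=0.5, epsilon=0.2, seed=0):
    """Tabular Q-learning for a toy liquidation problem (illustrative only).

    State: (t, i) = (steps left, shares left). Action: shares sold now.
    Reward: -eta * a**2, a hypothetical quadratic impact cost measured
    against a constant mid-price. Q starts at 0, which is optimistic
    because every true value is <= 0, so greedy choices also explore.
    """
    rng = random.Random(seed)
    Q = defaultdict(float)
    for _ in range(episodes):
        t, i = H, V
        while t > 0:
            acts = actions(t, i)
            if rng.random() < epsilon:
                a = rng.choice(acts)                          # explore
            else:
                a = max(acts, key=lambda x: Q[(t, i, x)])     # exploit
            r = -eta * a * a
            t2, i2 = t - 1, i - a
            # Q-learning target: reward plus best value of the next state.
            future = 0.0 if t2 == 0 else max(Q[(t2, i2, x)] for x in actions(t2, i2))
            Q[(t, i, a)] += alpha * (r + future - Q[(t, i, a)])
            t, i = t2, i2
    return Q


def greedy_schedule(Q, V, H):
    """Read off the learned policy: shares sold at each step, greedily."""
    schedule, i = [], V
    for t in range(H, 0, -1):
        a = max(actions(t, i), key=lambda x: Q[(t, i, x)])
        schedule.append(a)
        i -= a
    return schedule
```

With the default toy parameters, `greedy_schedule(train_q(), 6, 3)` should come out as the even split [2, 2, 2], since a quadratic impact cost penalizes selling in large lumps. Adding price dynamics or other market variables to the state would be the obvious next step toward the paper's setting.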