Saturday, June 2, 2007

A Presentation for an AI Course

I am taking Artificial Intelligence at NTUST. For the final presentation in this course, I chose a paper published at ICML 2006, "Reinforcement Learning for Optimized Trade Execution". The problem studied in this paper can be defined as follows: "to sell (or buy) V shares of a given stock within a fixed time period H, in a manner that maximizes the revenue received (respectively, minimizes the capital spent)." The authors apply an RL approach, based on a modified Q-learning algorithm, to this problem. I had several questions while reading the paper. I hope to find the answers someday.
  1. To compare performance, the authors assume an idealized policy that can execute all V shares immediately at the mid-spread. Why do the authors 'always' expect to do worse than this idealized policy? If the price moves in the trader's favor within the time period H, a policy that trades later could beat the idealized one.
  2. Following question 1, the authors define their performance measure as follows: "the trading cost of a policy as the underperformance compared to the mid-spread baseline". Do the authors mean that any outperformance by their policies is excluded from the trading costs?
  3. In the paper, it is stated, "our reward function captures the most important aspects of execution - bid-ask spread, market impact, opportunity cost, etc." I don't see how this is done in the paper.
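
To make questions 1 and 2 more concrete, here is a minimal sketch of how a trading cost relative to the mid-spread baseline could be computed. It is only an illustration, not the authors' code: the function name `trading_cost_bps`, the list-of-fills input format, the choice of basis points as the unit, and all the prices are assumptions made for the example. The cost is kept signed, so a policy that beats the baseline gets a negative cost. Whether the paper treats that case differently is exactly what question 2 asks.

```python
def trading_cost_bps(fills, mid_price, side="sell"):
    """Signed cost of an execution versus the mid-spread baseline.

    fills     -- list of (price, shares) pairs actually executed (hypothetical format)
    mid_price -- mid-spread price at the start of the period H
    side      -- "sell" or "buy"

    Returns the underperformance in basis points (1 bp = 0.01%).
    A negative value means the execution beat the baseline.
    """
    total_shares = sum(shares for _, shares in fills)
    if total_shares <= 0:
        raise ValueError("at least one share must be executed")

    # Idealized policy: all V shares trade immediately at the mid-spread.
    baseline = total_shares * mid_price
    # What the policy actually received (sell) or paid (buy).
    actual = sum(price * shares for price, shares in fills)

    if side == "sell":
        shortfall = baseline - actual   # received less than the baseline
    elif side == "buy":
        shortfall = actual - baseline   # paid more than the baseline
    else:
        raise ValueError("side must be 'sell' or 'buy'")

    return shortfall / baseline * 10_000


# Selling 1000 shares with the mid-spread at 10.00 (made-up numbers).
adverse = trading_cost_bps([(9.98, 500), (9.96, 500)], 10.00, "sell")
favorable = trading_cost_bps([(10.02, 500), (10.04, 500)], 10.00, "sell")
print(adverse, favorable)
```

In the first case the price drifts down while selling and the cost comes out at roughly 30 basis points. In the second the price drifts up and the same formula gives roughly -30, which is the situation described in question 1.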