Publications of Ronald J. Williams Available for Downloading
Journal Articles
Peng, J. and Williams, R. J. (1996).
Incremental multi-step Q-learning.
Machine Learning, 22, 283-290.
Peng, J. and Williams, R. J. (1993).
Efficient learning and planning within the Dyna framework.
Adaptive Behavior, 2, 437-454.
Williams, R. J. (1992).
Simple statistical gradient-following algorithms
for connectionist reinforcement learning.
Machine Learning, 8, 229-256.
Williams, R. J. and Peng, J. (1991).
Function optimization using connectionist
reinforcement learning algorithms.
Connection Science, 3, 241-268. [Figures not included]
Williams, R. J. and Peng, J. (1990).
An efficient gradient-based algorithm for on-line
training of recurrent network trajectories.
Neural Computation, 2, 490-501.
Williams, R. J. and Zipser, D. (1989).
A learning algorithm for continually running
fully recurrent neural networks.
Neural Computation, 1, 270-280.
Conference and Workshop Presentations
Al-Ansari, M. A. and Williams, R. J. (1999).
Efficient, globally-optimized reinforcement learning with the
parti-game algorithm.
Advances in Neural Information Processing Systems 11.
Al-Ansari, M. A. and Williams, R. J. (1998).
Modifying the parti-game algorithm for increased robustness, higher
efficiency, and better policies.
Proceedings of the Tenth Yale Workshop on Adaptive and Learning
Systems, June, New Haven, CT, 204-209.
Peng, J. and Williams, R. J. (1994).
Incremental multi-step Q-learning.
Proceedings of the Eleventh International Conference on
Machine Learning, July, New Brunswick, NJ, 226-232.
Williams, R. J. and Baird, L. C., III (1994).
Tight performance bounds on greedy policies based on imperfect
value functions.
Proceedings of the Eighth Yale Workshop on Adaptive
and Learning Systems, June, New Haven, CT, 108-113.
Williams, R. J. (1992).
Training recurrent networks using the extended Kalman filter.
Proceedings of the International Joint
Conference on Neural Networks, June, Baltimore, MD, Vol. IV, 241-246.
Williams, R. J. and Baird, L. C., III (1990).
A mathematical analysis of actor-critic architectures for learning
optimal controls through incremental dynamic programming.
Proceedings of the Sixth Yale Workshop on Adaptive
and Learning Systems, August, New Haven, CT, 96-101.
Williams, R. J. and Peng, J. (1989).
Reinforcement learning algorithms as function optimizers.
Proceedings of the International Joint
Conference on Neural Networks, Washington, DC, Vol. II, 89-95.
Book Chapters
Williams, R. J. and Zipser, D. (1995).
Gradient-based learning algorithms for recurrent networks
and their computational complexity.
In: Y. Chauvin and D. E. Rumelhart (Eds.)
Backpropagation: Theory, Architectures, and Applications,
Hillsdale, NJ: Erlbaum.
Also appeared as: Gradient-based learning algorithms for recurrent
connectionist networks, Technical Report NU-CCS-90-9.
Boston: Northeastern University, College of Computer Science.
[Figures not included]
Miller, S. and Williams, R. J. (1995).
Temporal difference learning: A chemical process control application.
In: A. F. Murray (Ed.)
Applications of Artificial Neural Networks,
Norwell, MA: Kluwer.
Williams, R. J. (1990).
Adaptive state representation and estimation using recurrent
connectionist networks.
In: W. T. Miller, R. S. Sutton, and P. J. Werbos (Eds.)
Neural Networks for Control,
Cambridge, MA: MIT Press/Bradford Books.
Technical Reports
Al-Ansari, M. A. and Williams, R. J. (1998).
Modifying the parti-game algorithm for increased robustness, higher
efficiency, and better policies.
Technical Report NU-CCS-98-13.
Boston: Northeastern University, College of Computer Science.
Williams, R. J. and Baird, L. C., III (1993).
Tight performance bounds on greedy policies based on imperfect
value functions.
Technical Report NU-CCS-93-14.
Boston: Northeastern University, College of Computer Science.
Williams, R. J. and Baird, L. C., III (1993).
Analysis of some incremental variants of policy iteration:
First steps toward understanding actor-critic learning systems.
Technical Report NU-CCS-93-11.
Boston: Northeastern University, College of Computer Science.
Williams, R. J. (1992).
Some observations on the use of the extended Kalman filter as a
recurrent network learning algorithm.
Technical Report NU-CCS-92-1.
Boston: Northeastern University, College of Computer Science.
Williams, R. J. and Zipser, D. (1990).
Gradient-based learning algorithms for recurrent connectionist
networks.
Technical Report NU-CCS-90-9.
Boston: Northeastern University, College of Computer Science.
[Figures not available]
Greene, R. L. and Williams, R. J. (1989).
An approach to using rule-like training data
in connectionist networks.
Technical Report NU-CCS-89-30.
Boston: Northeastern University, College of Computer Science.
[Figures not available]