Dimitri P. Bertsekas is a tenured professor at MIT, a member of the U.S. National Academy of Engineering, and a guest professor at the Research Center for Complex and Networked Systems at Tsinghua University. An internationally renowned author in electrical engineering and computer science, he has written more than a dozen best-selling textbooks and monographs, including Nonlinear Programming, Network Optimization, Dynamic Programming, Convex Optimization, and Reinforcement Learning and Optimal Control.
Contents:
1. AlphaZero, Off-Line Training, and On-Line Play
1.1. Off-Line Training and Policy Iteration p. 3
1.2. On-Line Play and Approximation in Value Space -
Truncated Rollout p. 6
1.3. The Lessons of AlphaZero p. 8
1.4. A New Conceptual Framework for Reinforcement Learning p. 11
1.5. Notes and Sources p. 14
2. Deterministic and Stochastic Dynamic Programming
2.1. Optimal Control Over an Infinite Horizon p. 20
2.2. Approximation in Value Space p. 25
2.3. Notes and Sources p. 30
3. An Abstract View of Reinforcement Learning
3.1. Bellman Operators p. 32
3.2. Approximation in Value Space and Newton's Method p. 39
3.3. Region of Stability p. 46
3.4. Policy Iteration, Rollout, and Newton’s Method p. 50
3.5. How Sensitive is On-Line Play to the Off-Line
Training Process p. 58
3.6. Why Not Just Train a Policy Network and Use it Without
On-Line Play p. 60
3.7. Multiagent Problems and Multiagent Rollout p. 61
3.8. On-Line Simplified Policy Iteration p. 66
3.9. Exceptional Cases p. 72
3.10. Notes and Sources p. 79
4. The Linear Quadratic Case - Illustrations
4.1. Optimal Solution p. 82
4.2. Cost Functions of Stable Linear Policies p. 83
4.3. Value Iteration p. 86
4.4. One-Step and Multistep Lookahead - Newton Step
Interpretations p. 86
4.5. Sensitivity Issues p. 91
4.6. Rollout and Policy Iteration p. 94
4.7. Truncated Rollout - Length of Lookahead Issues p. 97
4.8. Exceptional Behavior in Linear Quadratic Problems p. 99
4.9. Notes and Sources p. 100
5. Adaptive and Model Predictive Control
5.1. Systems with Unknown Parameters - Robust and
PID Control p. 102
5.2. Approximation in Value Space, Rollout, and Adaptive
Control p. 105
5.3. Approximation in Value Space, Rollout, and Model
Predictive Control p. 109
5.4. Terminal Cost Approximation - Stability Issues p. 112
5.5. Notes and Sources p. 118
6. Finite Horizon Deterministic Problems - Discrete
Optimization
6.1. Deterministic Discrete Spaces Finite Horizon Problems p. 120
6.2. General Discrete Optimization Problems p. 125
6.3. Approximation in Value Space p. 128
6.4. Rollout Algorithms for Discrete Optimization p. 132
6.5. Rollout and Approximation in Value Space with Multistep
Lookahead p. 149
6.5.1. Simplified Multistep Rollout - Double Rollout p. 150
6.5.2. Incremental Rollout for Multistep Approximation in
Value Space p. 153
6.6. Constrained Forms of Rollout Algorithms p. 159
6.7. Adaptive Control by Rollout with a POMDP Formulation p. 173
6.8. Rollout for Minimax Control p. 182
6.9. Small Stage Costs and Long Horizon - Continuous-Time
Rollout p. 190
6.10. Epilogue p. 197
Appendix A: Newton's Method and Error Bounds
A.1. Newton’s Method for Differentiable Fixed
Point Problems p. 202
A.2. Newton's Method Without Differentiability of the
Bellman Operator p. 207
A.3. Local and Global Error Bounds for Approximation in
Value Space p. 210
A.4. Local and Global Error Bounds for Approximate
Policy Iteration p. 212
References p. 217
Excerpt:
With four parameters I can fit an elephant, and with five I can make him wiggle his trunk.†
John von Neumann

The purpose of this monograph is to propose and develop a new conceptual framework for approximate Dynamic Programming (DP) and Reinforcement Learning (RL). This framework centers around two algorithms, which are designed largely independently of each other and operate in synergy through the powerful mechanism of Newton's method. We call these the off-line training and the on-line play algorithms; the names are borrowed from some of the major successes of RL involving games. Primary examples are the recent (2017) AlphaZero program (which plays chess), and the similarly structured and earlier (1990s) TD-Gammon program (which plays backgammon). In these game contexts, the off-line training algorithm is the method used to teach the program how to evaluate positions and to generate good moves at any given position, while the on-line play algorithm is the method used to play in real time against human or computer opponents.

† From the meeting of Freeman Dyson and Enrico Fermi (p. 273 of the Segre and Hoerlin biography of Fermi, The Pope of Physics, Picador, 2017): "When Dyson met with him in 1953, Fermi welcomed him politely, but he quickly put aside the graphs he was being shown indicating agreement between theory and experiment. His verdict, as Dyson remembered, was 'There are two ways of doing calculations in theoretical physics. One way, and this is the way I prefer, is to have a clear physical picture of the process you are calculating. The other way is to have a precise and self-consistent mathematical formalism. You have neither.' When a stunned Dyson tried to counter by emphasizing the agreement between experiment and the calculations, Fermi asked him how many free parameters he had used to obtain the fit. Smiling after being told 'Four,' Fermi remarked, 'I remember my old friend Johnny von Neumann used to say, with four parameters I can fit an elephant, and with five I can make him wiggle his trunk.'" See also the paper by Mayer, Khairy, and Howard [MKH10], which provides a verification of the von Neumann quotation.
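The following is a minimal Python sketch, not taken from the book, of the two-algorithm structure described above: an off-line trained cost approximation (a hypothetical function J_tilde standing in for a trained value network) is consulted by an on-line play algorithm that performs one-step lookahead at the current state. All names (one_step_lookahead, J_tilde, controls, next_state, stage_cost) and the toy system are illustrative assumptions.

# A minimal sketch (an assumption, not the book's code) of the framework above:
# an off-line trained cost approximation J_tilde is consulted by an on-line
# play algorithm that performs one-step lookahead at the current state.

def one_step_lookahead(state, controls, next_state, stage_cost, J_tilde):
    """On-line play: pick the control minimizing the stage cost plus the
    off-line trained approximation of the cost-to-go at the next state."""
    best_control, best_value = None, float("inf")
    for u in controls(state):
        value = stage_cost(state, u) + J_tilde(next_state(state, u))
        if value < best_value:
            best_control, best_value = u, value
    return best_control

# Toy illustration: scalar system x_{k+1} = x_k + u, stage cost x^2 + u^2,
# and a quadratic approximation J_tilde(x) = 2 x^2 standing in for the
# result of off-line training.
if __name__ == "__main__":
    controls = lambda x: [-1.0, 0.0, 1.0]
    next_state = lambda x, u: x + u
    stage_cost = lambda x, u: x ** 2 + u ** 2
    J_tilde = lambda x: 2.0 * x ** 2
    print(one_step_lookahead(3.0, controls, next_state, stage_cost, J_tilde))

Multistep lookahead, rollout, and truncated rollout, discussed in Chapters 3-6 of the table of contents above, refine this basic one-step scheme.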