Multi-Timescale Ensemble $Q$-Learning for Markov Decision Process Policy Optimization
Bozkus et al., 2024
- Document ID
- 11791039485128137678
- Author
- Bozkus T
- Mitra U
- Publication year
- 2024
- Publication venue
- IEEE Transactions on Signal Processing
Snippet
Reinforcement learning (RL) is a classical tool to solve network control or policy optimization problems in unknown environments. The original $Q$-learning suffers from performance and complexity challenges across very large networks. Herein, a novel model-free ensemble …
Classifications
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N99/00—Subject matter not provided for in other groups of this subclass
        - G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N3/00—Computer systems based on biological models
        - G06N3/12—Computer systems based on biological models using genetic models
          - G06N3/126—Genetic algorithms, i.e. information processing using digital simulations of the genetic system
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/10—Complex mathematical operations
          - G06F17/11—Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N3/00—Computer systems based on biological models
        - G06N3/02—Computer systems based on biological models using neural network models
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/10—Complex mathematical operations
          - G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N7/00—Computer systems based on specific mathematical models
        - G06N7/005—Probabilistic networks
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N5/00—Computer systems utilising knowledge based models
        - G06N5/04—Inference methods or devices
- G—PHYSICS
  - G05—CONTROLLING; REGULATING
    - G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
      - G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
        - G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
          - G05B13/04—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
- G—PHYSICS
  - G05—CONTROLLING; REGULATING
    - G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
      - G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
        - G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
          - G05B13/0265—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
            - G05B13/027—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion using neural networks only
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
          - G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
            - G06F17/30386—Retrieval requests
              - G06F17/30424—Query processing
                - G06F17/30533—Other types of queries
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Bhandari et al. | | Global optimality guarantees for policy gradient methods |
| Guo et al. | | Entropy regularization for mean field games with learning |
| Kaufmann et al. | | Adaptive reward-free exploration |
| Letarte et al. | | Dichotomize and generalize: PAC-Bayesian binary activated deep neural networks |
| Meir | | Nonparametric time series prediction through adaptive model selection |
| Bozkus et al. | | Multi-Timescale Ensemble $Q$-Learning for Markov Decision Process Policy Optimization |
| Jagalur-Mohan et al. | | Batch greedy maximization of non-submodular functions: Guarantees and applications to experimental design |
| Still et al. | | Optimal causal inference: Estimating stored information and approximating causal architecture |
| Chang | | Bayesian neural networks: Essentials |
| US12282578B2 (en) | | Privacy filters and odometers for deep learning |
| Cohn et al. | | Mean field variational approximation for continuous-time Bayesian networks |
| Zhou et al. | | Automatic integration for spatiotemporal neural point processes |
| Doan | | Fast Nonlinear Two-Time-Scale Stochastic Approximation: Achieving $O(1/k)$ Finite-Sample Complexity |
| Gupta et al. | | Probabilistic contraction analysis of iterated random operators |
| Meshram et al. | | Simulation based algorithms for Markov decision processes and multi-action restless bandits |
| Bozkus et al. | | Leveraging digital cousins for ensemble Q-learning in large-scale wireless networks |
| Shah et al. | | Adaptive consensus: a network pruning approach for decentralized optimization |
| Helmut et al. | | Cholesky-based experimental design for Gaussian process and kernel-based emulation and calibration |
| Jurgens et al. | | Ambiguity rate of hidden Markov processes |
| Bozkus et al. | | A novel ensemble Q-learning algorithm for policy optimization in large-scale networks |
| Hawkins et al. | | Forward-backward rapidly-exploring random trees for stochastic optimal control |
| CN114092269A (en) | | Time sequence data prediction method and device based on improved generalized network vector model |
| Barendregt et al. | | Adaptive Bayesian inference of Markov transition rates |
| Nockolds et al. | | Lilan: A linear latent network approach for real-time solutions of stiff, nonlinear, ordinary differential equations |
| Zhao et al. | | A bound on modeling error in observable operator models and an associated learning algorithm |