Bozkus et al., 2024 - Google Patents

Multi-Timescale Ensemble $Q$-Learning for Markov Decision Process Policy Optimization

Document ID
11791039485128137678
Author
Bozkus T
Mitra U
Publication year
2024
Publication venue
IEEE Transactions on Signal Processing

Snippet

Reinforcement learning (RL) is a classical tool to solve network control or policy optimization problems in unknown environments. The original $Q$-learning suffers from performance and complexity challenges across very large networks. Herein, a novel model-free ensemble …

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N99/00 Subject matter not provided for in other groups of this subclass
    • G06N99/005 Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computer systems based on biological models
    • G06N3/12 Computer systems based on biological models using genetic models
    • G06N3/126 Genetic algorithms, i.e. information processing using digital simulations of the genetic system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/11 Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computer systems based on biological models
    • G06N3/02 Computer systems based on biological models using neural network models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computer systems based on specific mathematical models
    • G06N7/005 Probabilistic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computer systems utilising knowledge based models
    • G06N5/04 Inference methods or devices
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/04 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/0265 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
    • G05B13/027 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion using neural networks only
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30286 Information retrieval; Database structures therefor; File system structures therefor in structured data stores
    • G06F17/30386 Retrieval requests
    • G06F17/30424 Query processing
    • G06F17/30533 Other types of queries

Similar Documents

Publication Publication Date Title
Bhandari et al. Global optimality guarantees for policy gradient methods
Guo et al. Entropy regularization for mean field games with learning
Kaufmann et al. Adaptive reward-free exploration
Letarte et al. Dichotomize and generalize: PAC-Bayesian binary activated deep neural networks
Meir Nonparametric time series prediction through adaptive model selection
Bozkus et al. Multi-Timescale Ensemble $Q$-Learning for Markov Decision Process Policy Optimization
Jagalur-Mohan et al. Batch greedy maximization of non-submodular functions: Guarantees and applications to experimental design
Still et al. Optimal causal inference: Estimating stored information and approximating causal architecture
Chang Bayesian neural networks: Essentials
US12282578B2 (en) Privacy filters and odometers for deep learning
Cohn et al. Mean field variational approximation for continuous-time Bayesian networks
Zhou et al. Automatic integration for spatiotemporal neural point processes
Doan Fast Nonlinear Two-Time-Scale Stochastic Approximation: Achieving $O(1/k)$ Finite-Sample Complexity
Gupta et al. Probabilistic contraction analysis of iterated random operators
Meshram et al. Simulation based algorithms for Markov decision processes and multi-action restless bandits
Bozkus et al. Leveraging digital cousins for ensemble q-learning in large-scale wireless networks
Shah et al. Adaptive consensus: a network pruning approach for decentralized optimization
Helmut et al. Cholesky-based experimental design for Gaussian process and kernel-based emulation and calibration.
Jurgens et al. Ambiguity rate of hidden Markov processes
Bozkus et al. A novel ensemble q-learning algorithm for policy optimization in large-scale networks
Hawkins et al. Forward-backward rapidly-exploring random trees for stochastic optimal control
CN114092269A (en) Time sequence data prediction method and device based on improved generalized network vector model
Barendregt et al. Adaptive Bayesian inference of Markov transition rates
Nockolds et al. Lilan: A linear latent network approach for real-time solutions of stiff, nonlinear, ordinary differential equations
Zhao et al. A bound on modeling error in observable operator models and an associated learning algorithm