Citation and metadata
Recommended citation
Wesselhöft M, Braun P, Kreutzfeldt J (2023). Comparing Continuous Single-Agent Reinforcement Learning Controls in a Simulated Logistic Environment using NVIDIA Omniverse. Logistics Journal : Proceedings, Vol. 2023. (urn:nbn:de:0009-14-58257)
Download Citation
EndNote
%0 Journal Article
%T Comparing Continuous Single-Agent Reinforcement Learning Controls in a Simulated Logistic Environment using NVIDIA Omniverse
%A Wesselhöft, Mike
%A Braun, Philipp
%A Kreutzfeldt, Jochen
%J Logistics Journal : Proceedings
%D 2023
%V 2023
%N 1
%@ 2192-9084
%F wesselhöft2023
%X With the transition to Logistics 4.0, the increasing demand for autonomous mobile robots (AMR) in logistics has amplified the complexity of fleet control in dynamic environments. Reinforcement learning (RL), particularly decentralized RL algorithms, has emerged as a potential solution given its ability to learn in uncertain terrains. While discrete RL structures have shown merit, their adaptability in logistics remains questionable due to their inherent limitations. This paper presents a comparative analysis of continuous RL algorithms - Advantage Actor-Critic (A2C), Deep Deterministic Policy Gradient (DDPG), and Proximal Policy Optimization (PPO) - in the context of controlling a Turtlebot3 within a warehouse scenario. Our findings reveal A2C as the frontrunner in terms of success rate and training time, while DDPG excels at step minimization and PPO distinguishes itself primarily through its relatively short training duration. This study underscores the potential of continuous RL algorithms, especially A2C, in the future of AMR fleet management in logistics. Significant work remains to be done, particularly in the area of algorithmic fine-tuning.
%L 620
%K Autonome Roboter
%K Künstliche Intelligenz
%K Logistik 4.0
%K Reinforcement Learning
%K Robotik
%K artificial intelligence
%K autonomous mobile robots
%K logistics 4.0
%K robotics
%R 10.2195/lj_proc_wesselhoeft_en_202310_01
%U http://nbn-resolving.de/urn:nbn:de:0009-14-58257
%U http://dx.doi.org/10.2195/lj_proc_wesselhoeft_en_202310_01
BibTeX
@Article{wesselhöft2023,
  author   = "Wesselh{\"o}ft, Mike and Braun, Philipp and Kreutzfeldt, Jochen",
  title    = "Comparing Continuous Single-Agent Reinforcement Learning Controls in a Simulated Logistic Environment using NVIDIA Omniverse",
  journal  = "Logistics Journal : Proceedings",
  year     = "2023",
  volume   = "2023",
  number   = "1",
  keywords = "Autonome Roboter; K{\"u}nstliche Intelligenz; Logistik 4.0; Reinforcement Learning; Robotik; artificial intelligence; autonomous mobile robots; logistics 4.0; robotics",
  abstract = "With the transition to Logistics 4.0, the increasing demand for autonomous mobile robots (AMR) in logistics has amplified the complexity of fleet control in dynamic environments. Reinforcement learning (RL), particularly decentralized RL algorithms, has emerged as a potential solution given its ability to learn in uncertain terrains. While discrete RL structures have shown merit, their adaptability in logistics remains questionable due to their inherent limitations. This paper presents a comparative analysis of continuous RL algorithms - Advantage Actor-Critic (A2C), Deep Deterministic Policy Gradient (DDPG), and Proximal Policy Optimization (PPO) - in the context of controlling a Turtlebot3 within a warehouse scenario. Our findings reveal A2C as the frontrunner in terms of success rate and training time, while DDPG excels at step minimization and PPO distinguishes itself primarily through its relatively short training duration. This study underscores the potential of continuous RL algorithms, especially A2C, in the future of AMR fleet management in logistics. Significant work remains to be done, particularly in the area of algorithmic fine-tuning.",
  issn     = "2192-9084",
  doi      = "10.2195/lj_proc_wesselhoeft_en_202310_01",
  url      = "http://nbn-resolving.de/urn:nbn:de:0009-14-58257"
}
RIS
TY  - JOUR
AU  - Wesselhöft, Mike
AU  - Braun, Philipp
AU  - Kreutzfeldt, Jochen
PY  - 2023
DA  - 2023//
TI  - Comparing Continuous Single-Agent Reinforcement Learning Controls in a Simulated Logistic Environment using NVIDIA Omniverse
JO  - Logistics Journal : Proceedings
VL  - 2023
IS  - 1
KW  - Autonome Roboter
KW  - Künstliche Intelligenz
KW  - Logistik 4.0
KW  - Reinforcement Learning
KW  - Robotik
KW  - artificial intelligence
KW  - autonomous mobile robots
KW  - logistics 4.0
KW  - robotics
AB  - With the transition to Logistics 4.0, the increasing demand for autonomous mobile robots (AMR) in logistics has amplified the complexity of fleet control in dynamic environments. Reinforcement learning (RL), particularly decentralized RL algorithms, has emerged as a potential solution given its ability to learn in uncertain terrains. While discrete RL structures have shown merit, their adaptability in logistics remains questionable due to their inherent limitations. This paper presents a comparative analysis of continuous RL algorithms - Advantage Actor-Critic (A2C), Deep Deterministic Policy Gradient (DDPG), and Proximal Policy Optimization (PPO) - in the context of controlling a Turtlebot3 within a warehouse scenario. Our findings reveal A2C as the frontrunner in terms of success rate and training time, while DDPG excels at step minimization and PPO distinguishes itself primarily through its relatively short training duration. This study underscores the potential of continuous RL algorithms, especially A2C, in the future of AMR fleet management in logistics. Significant work remains to be done, particularly in the area of algorithmic fine-tuning.
SN  - 2192-9084
UR  - http://nbn-resolving.de/urn:nbn:de:0009-14-58257
DO  - 10.2195/lj_proc_wesselhoeft_en_202310_01
ID  - wesselhöft2023
ER  -
Wordbib
<?xml version="1.0" encoding="UTF-8"?>
<b:Sources SelectedStyle="" xmlns:b="http://schemas.openxmlformats.org/officeDocument/2006/bibliography" xmlns="http://schemas.openxmlformats.org/officeDocument/2006/bibliography">
  <b:Source>
    <b:Tag>wesselhöft2023</b:Tag>
    <b:SourceType>ArticleInAPeriodical</b:SourceType>
    <b:Year>2023</b:Year>
    <b:PeriodicalTitle>Logistics Journal : Proceedings</b:PeriodicalTitle>
    <b:Volume>2023</b:Volume>
    <b:Issue>1</b:Issue>
    <b:Url>http://nbn-resolving.de/urn:nbn:de:0009-14-58257</b:Url>
    <b:Url>http://dx.doi.org/10.2195/lj_proc_wesselhoeft_en_202310_01</b:Url>
    <b:Author>
      <b:Author>
        <b:NameList>
          <b:Person><b:Last>Wesselhöft</b:Last><b:First>Mike</b:First></b:Person>
          <b:Person><b:Last>Braun</b:Last><b:First>Philipp</b:First></b:Person>
          <b:Person><b:Last>Kreutzfeldt</b:Last><b:First>Jochen</b:First></b:Person>
        </b:NameList>
      </b:Author>
    </b:Author>
    <b:Title>Comparing Continuous Single-Agent Reinforcement Learning Controls in a Simulated Logistic Environment using NVIDIA Omniverse</b:Title>
    <b:Comments>With the transition to Logistics 4.0, the increasing demand for autonomous mobile robots (AMR) in logistics has amplified the complexity of fleet control in dynamic environments. Reinforcement learning (RL), particularly decentralized RL algorithms, has emerged as a potential solution given its ability to learn in uncertain terrains. While discrete RL structures have shown merit, their adaptability in logistics remains questionable due to their inherent limitations. This paper presents a comparative analysis of continuous RL algorithms - Advantage Actor-Critic (A2C), Deep Deterministic Policy Gradient (DDPG), and Proximal Policy Optimization (PPO) - in the context of controlling a Turtlebot3 within a warehouse scenario. Our findings reveal A2C as the frontrunner in terms of success rate and training time, while DDPG excels at step minimization and PPO distinguishes itself primarily through its relatively short training duration. This study underscores the potential of continuous RL algorithms, especially A2C, in the future of AMR fleet management in logistics. Significant work remains to be done, particularly in the area of algorithmic fine-tuning.</b:Comments>
  </b:Source>
</b:Sources>
ISI
PT Journal
AU Wesselhöft, M
   Braun, P
   Kreutzfeldt, J
TI Comparing Continuous Single-Agent Reinforcement Learning Controls in a Simulated Logistic Environment using NVIDIA Omniverse
SO Logistics Journal : Proceedings
PY 2023
VL 2023
IS 1
DI 10.2195/lj_proc_wesselhoeft_en_202310_01
DE Autonome Roboter; Künstliche Intelligenz; Logistik 4.0; Reinforcement Learning; Robotik; artificial intelligence; autonomous mobile robots; logistics 4.0; robotics
AB With the transition to Logistics 4.0, the increasing demand for autonomous mobile robots (AMR) in logistics has amplified the complexity of fleet control in dynamic environments. Reinforcement learning (RL), particularly decentralized RL algorithms, has emerged as a potential solution given its ability to learn in uncertain terrains. While discrete RL structures have shown merit, their adaptability in logistics remains questionable due to their inherent limitations. This paper presents a comparative analysis of continuous RL algorithms - Advantage Actor-Critic (A2C), Deep Deterministic Policy Gradient (DDPG), and Proximal Policy Optimization (PPO) - in the context of controlling a Turtlebot3 within a warehouse scenario. Our findings reveal A2C as the frontrunner in terms of success rate and training time, while DDPG excels at step minimization and PPO distinguishes itself primarily through its relatively short training duration. This study underscores the potential of continuous RL algorithms, especially A2C, in the future of AMR fleet management in logistics. Significant work remains to be done, particularly in the area of algorithmic fine-tuning.
ER
MODS
<mods>
  <titleInfo>
    <title>Comparing Continuous Single-Agent Reinforcement Learning Controls in a Simulated Logistic Environment using NVIDIA Omniverse</title>
  </titleInfo>
  <name type="personal">
    <namePart type="family">Wesselhöft</namePart>
    <namePart type="given">Mike</namePart>
  </name>
  <name type="personal">
    <namePart type="family">Braun</namePart>
    <namePart type="given">Philipp</namePart>
  </name>
  <name type="personal">
    <namePart type="family">Kreutzfeldt</namePart>
    <namePart type="given">Jochen</namePart>
  </name>
  <abstract>With the transition to Logistics 4.0, the increasing demand for autonomous mobile robots (AMR) in logistics has amplified the complexity of fleet control in dynamic environments. Reinforcement learning (RL), particularly decentralized RL algorithms, has emerged as a potential solution given its ability to learn in uncertain terrains. While discrete RL structures have shown merit, their adaptability in logistics remains questionable due to their inherent limitations. This paper presents a comparative analysis of continuous RL algorithms - Advantage Actor-Critic (A2C), Deep Deterministic Policy Gradient (DDPG), and Proximal Policy Optimization (PPO) - in the context of controlling a Turtlebot3 within a warehouse scenario. Our findings reveal A2C as the frontrunner in terms of success rate and training time, while DDPG excels at step minimization and PPO distinguishes itself primarily through its relatively short training duration. This study underscores the potential of continuous RL algorithms, especially A2C, in the future of AMR fleet management in logistics. Significant work remains to be done, particularly in the area of algorithmic fine-tuning.</abstract>
  <subject>
    <topic>Autonome Roboter</topic>
    <topic>Künstliche Intelligenz</topic>
    <topic>Logistik 4.0</topic>
    <topic>Reinforcement Learning</topic>
    <topic>Robotik</topic>
    <topic>artificial intelligence</topic>
    <topic>autonomous mobile robots</topic>
    <topic>logistics 4.0</topic>
    <topic>reinforcement learning</topic>
    <topic>robotics</topic>
  </subject>
  <classification authority="ddc">620</classification>
  <relatedItem type="host">
    <genre authority="marcgt">periodical</genre>
    <genre>academic journal</genre>
    <titleInfo>
      <title>Logistics Journal : Proceedings</title>
    </titleInfo>
    <part>
      <detail type="volume">
        <number>2023</number>
      </detail>
      <detail type="issue">
        <number>1</number>
      </detail>
      <date>2023</date>
    </part>
  </relatedItem>
  <identifier type="issn">2192-9084</identifier>
  <identifier type="urn">urn:nbn:de:0009-14-58257</identifier>
  <identifier type="doi">10.2195/lj_proc_wesselhoeft_en_202310_01</identifier>
  <identifier type="uri">http://nbn-resolving.de/urn:nbn:de:0009-14-58257</identifier>
  <identifier type="citekey">wesselhöft2023</identifier>
</mods>
Full Metadata
| Bibliographic Citation | Logistics Journal : referierte Veröffentlichungen, Vol. 2023, Iss. 1 |
|---|---|
| Title | Comparing Continuous Single-Agent Reinforcement Learning Controls in a Simulated Logistic Environment using NVIDIA Omniverse (eng)<br>Vergleich von kontinuierlichen Single-Agent Reinforcement Learning-Steuerungen in einer simulierten Logistikumgebung mit NVIDIA Omniverse (ger) |
| Author | Mike Wesselhöft, Philipp Braun, Jochen Kreutzfeldt |
| Language | eng |
| Abstract | With the transition to Logistics 4.0, the increasing demand for autonomous mobile robots (AMR) in logistics has amplified the complexity of fleet control in dynamic environments. Reinforcement learning (RL), particularly decentralized RL algorithms, has emerged as a potential solution given its ability to learn in uncertain terrains. While discrete RL structures have shown merit, their adaptability in logistics remains questionable due to their inherent limitations. This paper presents a comparative analysis of continuous RL algorithms - Advantage Actor-Critic (A2C), Deep Deterministic Policy Gradient (DDPG), and Proximal Policy Optimization (PPO) - in the context of controlling a Turtlebot3 within a warehouse scenario. Our findings reveal A2C as the frontrunner in terms of success rate and training time, while DDPG excels at step minimization and PPO distinguishes itself primarily through its relatively short training duration. This study underscores the potential of continuous RL algorithms, especially A2C, in the future of AMR fleet management in logistics. Significant work remains to be done, particularly in the area of algorithmic fine-tuning. (eng)<br>Mit dem Übergang zur Logistik 4.0 hat der zunehmende Bedarf an autonomen mobilen Robotern (AMR) in der Logistik die Komplexität der Flottensteuerung in dynamischen Umgebungen erhöht. Reinforcement Learning (RL), insbesondere dezentrale RL-Algorithmen, haben sich aufgrund ihrer Fähigkeit, in unsicheren Umgebungen zu lernen, als potenzielle Lösung erwiesen. Während sich diskrete RL-Strukturen bewährt haben, bleibt ihre Anpassungsfähigkeit in der Logistik aufgrund ihrer inhärenten Einschränkungen fraglich. In diesem Beitrag wird eine vergleichende Analyse kontinuierlicher RL-Algorithmen - Advantage Actor-Critic (A2C), Deep Deterministic Policy Gradient (DDPG) und Proximal Policy Optimization (PPO) - im Kontext der Steuerung eines Turtlebot3 in einem Lagerszenario vorgestellt. Unsere Ergebnisse zeigen A2C als Spitzenreiter in Bezug auf Erfolgsrate und Trainingszeit, während DDPG bei der Minimierung der Episodenlänge punktet und PPO lediglich mit einer geringen Trainingsdauer aufwarten kann. Diese Studie unterstreicht das Potenzial von kontinuierlichen RL-Algorithmen, insbesondere A2C, für die Zukunft des AMR-Flottenmanagements in der Logistik, wobei gerade im Bereich des Finetunings der Algorithmen noch viel Arbeit zu tun ist. (ger) |
| Subject | Autonome Roboter, Künstliche Intelligenz, Logistik 4.0, Reinforcement Learning, Robotik, artificial intelligence, autonomous mobile robots, logistics 4.0, reinforcement learning, robotics |
| DDC | 620 |
| Rights | cc-by |
| URN | urn:nbn:de:0009-14-58257 |
| DOI | https://doi.org/10.2195/lj_proc_wesselhoeft_en_202310_01 |