Robot reinforcement learning
PPO
Reinforcement learning: finding a policy that maximizes the expected sum of rewards over trajectories.
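The "sum of rewards" is usually a discounted sum (the return). The sketch below computes the reward-to-go G_t = r_t + gamma * G_{t+1} for every step of one trajectory. The discount factor gamma and the function name are assumptions; the notes do not specify them.

```python
from typing import List

def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Reward-to-go G_t = r_t + gamma * G_{t+1}, computed backwards over one trajectory."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Three steps with reward 1 each and gamma = 0.5
print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```

The RL objective is the expectation of G_0 over trajectories generated by the policy.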
policy gradient
Requires a large number of samples.
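To reduce this data requirement, use an actor-critic approach (a learned critic estimates the value of states, giving a lower-variance signal for how good each action was).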
state, action, reward
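In actor-critic methods the critic V(s) is used to compute an advantage, i.e. how much better an action turned out than expected. One common estimator, often paired with PPO, is Generalized Advantage Estimation (GAE). This is a minimal sketch for a single trajectory without early termination; GAE itself, the defaults gamma = 0.99 and lam = 0.95, and the function name are assumptions, not taken from the notes.

```python
import numpy as np

def gae_advantages(rewards, values, last_value, gamma=0.99, lam=0.95):
    """GAE for one trajectory (no terminal states inside it).

    rewards[t]: reward at step t; values[t]: critic estimate V(s_t);
    last_value: V(s_T) used to bootstrap after the final step.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    next_values = np.append(values[1:], last_value)
    deltas = rewards + gamma * next_values - values  # one-step TD errors
    adv = np.zeros_like(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = deltas[t] + gamma * lam * running  # exponentially weighted sum of TD errors
        adv[t] = running
    returns = adv + values  # regression targets for the critic
    return adv, returns
```

lam trades bias for variance: lam = 1 gives the full Monte Carlo advantage, lam = 0 the one-step TD error.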
PPO
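PPO (Proximal Policy Optimization) limits how far one update can move the policy by clipping the probability ratio between the new and old policy. A minimal NumPy sketch of the clipped surrogate loss follows. In practice it is written in an autodiff framework so gradients flow through logp_new; clip_eps = 0.2 is a common default and is an assumption here.

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO clipped surrogate objective, negated so it can be minimized."""
    ratio = np.exp(np.asarray(logp_new) - np.asarray(logp_old))  # pi_new(a|s) / pi_old(a|s)
    adv = np.asarray(advantages)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    # Taking the minimum makes the objective pessimistic: no extra credit for moving the ratio past the clip range
    return -np.mean(np.minimum(unclipped, clipped))

# Ratio 1.5 with positive advantage is clipped to 1.2
print(ppo_clip_loss([np.log(1.5)], [0.0], [1.0]))  # ≈ -1.2
```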
Sim-to-real
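Training RL directly on a real robot is impractical in terms of cost and risk, so training in simulation is more useful.
However, simulation and the real environment differ (the sim-to-real gap), and this difference has to be compensated for.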
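The notes do not say how the gap is compensated. One widely used approach is domain randomization: physical parameters of the simulator are resampled every episode so the policy cannot overfit to one simulated world. The parameter names, ranges, and helpers below are hypothetical and only illustrate the idea.

```python
import random
from dataclasses import dataclass

@dataclass
class PhysicsParams:
    friction: float
    mass_scale: float
    motor_strength: float
    latency_s: float

# Hypothetical ranges for illustration; real ranges depend on the robot and simulator.
RANGES = {
    "friction": (0.5, 1.25),
    "mass_scale": (0.9, 1.1),
    "motor_strength": (0.8, 1.2),
    "latency_s": (0.0, 0.02),
}

def sample_params(rng: random.Random) -> PhysicsParams:
    """Draw one set of simulator parameters uniformly from RANGES."""
    return PhysicsParams(**{k: rng.uniform(lo, hi) for k, (lo, hi) in RANGES.items()})

rng = random.Random(0)
for episode in range(3):
    params = sample_params(rng)
    # A real training loop would apply `params` to the simulator here before rolling out the policy.
    print(episode, params)
```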
FurnitureBench
https://github.com/clvrai/furniture-bench?tab=readme-ov-file
https://clvrai.github.io/furniture-bench/
Robust locomotion
RL in the real world: DayDreamer
A Walk in the Park: Learning to Walk in 20 Minutes With Model-Free Reinforcement Learning
Robot locomotion with reinforcement learning (using PPO)
Paper: Learning to Walk in Minutes Using Massively Parallel Deep RL. Trains with a very large number of robots simulated in parallel.
reward
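The notes do not list the reward terms. The sketch below shows the typical shape of a locomotion reward computed in one batch for many parallel robots: a velocity-tracking term plus a torque penalty. The terms, weights, sigma, and the 12-joint quadruped are all hypothetical, not the paper's exact reward.

```python
import numpy as np

def locomotion_reward(cmd_vel, base_vel, torques, sigma=0.25, w_track=1.0, w_torque=1e-4):
    """Batched reward for N robots simulated in parallel (hypothetical terms and weights).

    cmd_vel, base_vel: (N, 2) commanded / measured planar base velocity
    torques: (N, J) joint torques
    """
    err = np.sum((cmd_vel - base_vel) ** 2, axis=1)  # squared tracking error per robot
    track = np.exp(-err / sigma)                     # 1.0 when tracking is perfect
    torque_pen = np.sum(torques ** 2, axis=1)        # discourages large torques
    return w_track * track - w_torque * torque_pen   # shape (N,)

N, J = 4096, 12  # thousands of parallel robots; 12 joints is an assumption
rng = np.random.default_rng(0)
cmd = rng.uniform(-1.0, 1.0, size=(N, 2))
r = locomotion_reward(cmd, cmd, np.zeros((N, J)))
print(r.shape, r.min(), r.max())  # perfect tracking and zero torque: every reward is 1.0
```

Computing rewards for all robots as one array operation is what makes massively parallel training on a single accelerator practical.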
Paper (RMA: Rapid Motor Adaptation): adapts in real time to difficult environments
Paper: robot parkour
Paper (humanoid): robot parkour
https://humanoid4parkour.github.io/
Paper (Transformer)
Tesla Optimus
https://x.com/tesla_optimus/status/1922456791549427867
Unitree
Boston Dynamics
Robot foundation models (RFM)
2016: Google foundation model
QT-Opt
MT-Opt
Splits training into tasks: the robot is trained to grasp a specified target object.
BC-Z
RT-1 (Robotics Transformer)
https://robotics-transformer1.github.io/
Paper
RT-2
Fine-tunes a pretrained VLM (vision-language model) to output robot actions, making it a VLA (vision-language-action) model.
https://robotics-transformer2.github.io/
Paper
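To let a language model output actions, RT-2 represents each continuous action dimension as one of 256 discrete bins, so actions become tokens. A minimal sketch of that discretization and its inverse, assuming actions are already normalized to [-1, 1]. The normalization range and function names are assumptions.

```python
import numpy as np

N_BINS = 256

def actions_to_tokens(actions, low=-1.0, high=1.0, n_bins=N_BINS):
    """Map continuous actions in [low, high] to integer bin indices 0..n_bins-1."""
    a = np.clip(np.asarray(actions, dtype=np.float64), low, high)
    scaled = (a - low) / (high - low)  # in [0, 1]
    return np.minimum((scaled * n_bins).astype(np.int64), n_bins - 1)

def tokens_to_actions(tokens, low=-1.0, high=1.0, n_bins=N_BINS):
    """Map bin indices back to the centers of their bins."""
    width = (high - low) / n_bins
    return low + (np.asarray(tokens) + 0.5) * width

print(actions_to_tokens([-1.0, 0.0, 1.0]))  # [  0 128 255]
```

The round-trip error is at most half a bin width, here 1/256.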
ALOHA and ACT
ACT (Action Chunking with Transformers): an imitation learning algorithm
https://tonyzhaozh.github.io/aloha/
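ACT predicts a chunk of future actions at each step, so several overlapping chunks predict the action for the current timestep. ACT's temporal ensembling averages these predictions with exponential weights w_i = exp(-m * i), where i = 0 is the oldest prediction. The class below is a minimal sketch of that averaging; the class name and the default m = 0.01 are assumptions, and the chunk itself would come from the trained policy.

```python
from collections import deque

import numpy as np

class TemporalEnsembler:
    """Average overlapping action-chunk predictions for the current timestep."""

    def __init__(self, chunk_size: int, m: float = 0.01):
        self.chunk_size = chunk_size
        self.m = m
        self.t = 0
        self.chunks = deque()  # (start_timestep, chunk) pairs, oldest first

    def step(self, chunk) -> np.ndarray:
        """chunk: (chunk_size, action_dim) actions predicted at the current timestep."""
        chunk = np.asarray(chunk, dtype=np.float64)
        assert chunk.shape[0] == self.chunk_size
        self.chunks.append((self.t, chunk))
        # Drop chunks whose predictions no longer reach the current timestep
        while self.chunks[0][0] + self.chunk_size <= self.t:
            self.chunks.popleft()
        preds = np.stack([c[self.t - start] for start, c in self.chunks])  # oldest first
        weights = np.exp(-self.m * np.arange(len(preds)))  # oldest prediction gets the largest weight
        weights /= weights.sum()
        self.t += 1
        return np.tensordot(weights, preds, axes=1)  # (action_dim,)
```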
Mobile ALOHA
https://mobile-aloha.github.io/
Diffusion Policy (2023)
https://diffusion-policy.cs.columbia.edu/
https://github.com/real-stanford/diffusion_policy
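Diffusion Policy generates an action sequence by starting from Gaussian noise and repeatedly denoising it with a network conditioned on the observation. The sketch below is a generic DDPM sampling loop, not the repository's code. The linear beta schedule, T = 100, and the eps_model interface are assumptions. An "oracle" that knows the clean target stands in for the trained network, only to check that the loop recovers it.

```python
import numpy as np

def make_schedule(T=100, beta_start=1e-4, beta_end=0.02):
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def ddpm_sample(eps_model, obs, action_shape, T=100, seed=0):
    """DDPM ancestral sampling. eps_model(x_t, t, obs) -> predicted noise (hypothetical interface)."""
    betas, alphas, alpha_bars = make_schedule(T)
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(action_shape)  # x_T ~ N(0, I)
    for t in reversed(range(T)):
        eps = eps_model(x, t, obs)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        x = mean + np.sqrt(betas[t]) * rng.standard_normal(action_shape) if t > 0 else mean
    return x

# Stand-in for a trained network: knows the clean 3-step chunk of 2-D actions (illustrative values)
target = np.array([[0.1, -0.2], [0.3, 0.0], [0.5, 0.2]])
_, _, ab = make_schedule(100)

def oracle_eps(x, t, obs):
    return (x - np.sqrt(ab[t]) * target) / np.sqrt(1.0 - ab[t])

sample = ddpm_sample(oracle_eps, obs=None, action_shape=target.shape)
print(np.allclose(sample, target, atol=1e-6))  # True
```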
scaling robotic datasets
Paper
https://droid-dataset.github.io/
https://github.com/google-deepmind/open_x_embodiment
https://robotics-transformer-x.github.io/
RT-X model
RT-1 and RT-2 trained on the Open X-Embodiment (OpenX) dataset
Octo
Transformer backbone + diffusion action head
https://octo-models.github.io/
OpenVLA [An Open-Source Vision-Language-Action Model] (open source, built on Llama)
https://openvla.github.io/
https://github.com/openvla/openvla
OpenVLA-OFT
https://github.com/moojink/openvla-oft
https://openvla-oft.github.io/
SOTA VLA - open model
https://www.physicalintelligence.company/
10,000 hours of data
π0 (pi-zero)
https://www.physicalintelligence.company/blog/pi0
π0-FAST (FAST action tokenizer)
https://www.physicalintelligence.company/research/fast
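FAST compresses an action chunk before tokenizing it: it applies a discrete cosine transform (DCT) along time for each action dimension, then scales and rounds the coefficients (followed by byte-pair encoding, omitted here). Smooth motions concentrate energy in a few low-frequency coefficients, so many rounded values become zero. A minimal sketch of the DCT and quantization step only; the scale value and function names are assumptions.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix; its transpose is the inverse transform."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def compress_chunk(chunk: np.ndarray, scale: float = 10.0) -> np.ndarray:
    """chunk: (H, D) actions over H timesteps. DCT along time, then scale and round to integers."""
    coeffs = dct_matrix(chunk.shape[0]) @ chunk
    return np.round(coeffs * scale).astype(np.int64)

def decompress_chunk(q: np.ndarray, scale: float = 10.0) -> np.ndarray:
    """Undo the scaling and apply the inverse DCT."""
    return dct_matrix(q.shape[0]).T @ (q / scale)

H = 20
chunk = np.stack([np.linspace(0.0, 1.0, H), np.sin(np.linspace(0.0, np.pi, H))], axis=1)
q = compress_chunk(chunk)
print("nonzero coefficients:", np.count_nonzero(q), "of", q.size)
```

Because the DCT is orthonormal, the reconstruction error per dimension is bounded by the rounding error: at most sqrt(H) * 0.5 / scale in L2 norm.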
π0.5
https://www.physicalintelligence.company/blog/pi05
Real-Time Action Chunking with Large Models
https://www.physicalintelligence.company/research/real_time_chunking