
robot reinforcement learning

PPO

Reinforcement learning means finding the policy that maximizes the expected sum of rewards over trajectories.
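As a minimal sketch of that objective: the return of one trajectory is the (optionally discounted) sum of its rewards, and RL looks for the policy whose expected return is highest. The function name `trajectory_return`, the discount factor `gamma`, and the two example trajectories are illustrative assumptions, not part of the original notes.

```python
def trajectory_return(rewards, gamma=1.0):
    """Sum of rewards along one trajectory, discounted by gamma per step."""
    total = 0.0
    # Walk backwards so each step adds its reward plus the discounted future.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Two hypothetical trajectories; the one with the higher return is preferred.
traj_a = [1.0, 0.0, 2.0]
traj_b = [0.5, 0.5, 0.5]
best = max([traj_a, traj_b], key=trajectory_return)
```

With `gamma=1.0` this is the plain sum of rewards; values below 1 weight near-term rewards more heavily.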

policy gradient

image
Requires a large number of samples.
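A rough illustration of why policy gradient is sample-hungry: the gradient is estimated by Monte Carlo sampling, so with few samples the estimate is noisy. The sketch below uses a REINFORCE-style estimator on a toy two-action problem with a softmax policy; the reward function and sample counts are assumptions chosen only for illustration.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def policy_gradient_estimate(logits, reward_fn, num_samples, rng):
    """Monte Carlo (REINFORCE) estimate of d E[reward] / d logits."""
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for _ in range(num_samples):
        action = rng.choices(range(len(logits)), weights=probs)[0]
        reward = reward_fn(action)
        # d log pi(action) / d logit_k = 1[action == k] - probs[k]
        for k in range(len(logits)):
            grad[k] += reward * ((1.0 if k == action else 0.0) - probs[k])
    return [g / num_samples for g in grad]


rng = random.Random(0)
reward_fn = lambda a: 1.0 if a == 1 else 0.0  # hypothetical: action 1 is the good one
noisy = policy_gradient_estimate([0.0, 0.0], reward_fn, 10, rng)
stable = policy_gradient_estimate([0.0, 0.0], reward_fn, 10000, rng)
```

For this toy problem the exact gradient is `[-0.25, 0.25]`; the 10,000-sample estimate lands close to it, while the 10-sample estimate can be far off.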
image
To reduce the amount of data this requires, an actor-critic approach is used.
state, action, reward
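One common way the critic helps, shown here as a sketch rather than the exact method in the figures: a learned value function V(s) turns each (state, action, reward) transition into a one-step TD advantage, which is a lower-variance learning signal for the actor than the raw return. The function name and the numbers in the usage example are hypothetical.

```python
def td_advantages(rewards, values, next_values, dones, gamma=0.99):
    """One-step advantage A = r + gamma * V(s') - V(s) for each transition."""
    advantages = []
    for r, v, next_v, done in zip(rewards, values, next_values, dones):
        # No bootstrapping from the next state once the episode has ended.
        bootstrap = 0.0 if done else gamma * next_v
        advantages.append(r + bootstrap - v)
    return advantages


# Hypothetical critic outputs for two transitions; the second one ends the episode.
adv = td_advantages(
    rewards=[1.0, 0.0],
    values=[0.5, 1.0],
    next_values=[1.0, 2.0],
    dones=[False, True],
    gamma=0.9,
)
```

A positive advantage means the action did better than the critic expected, so the actor should make it more likely.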
image

PPO

image
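The core of PPO is its clipped surrogate objective, which limits how far one update can move the policy away from the one that collected the data. A minimal sketch in plain Python follows; `clip_eps=0.2` is a commonly used default assumed here, and the function name is illustrative.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean of min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A)."""
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Taking the min makes the objective a pessimistic bound.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Ratio 1.5 with a positive advantage is capped at 1.2;
# ratio 0.5 with a negative advantage is evaluated at the clipped 0.8.
objective = ppo_clip_objective(
    new_logp=[math.log(1.5), math.log(0.5)],
    old_logp=[0.0, 0.0],
    advantages=[1.0, -1.0],
)
```

In training this value is maximized (or its negative minimized) with respect to the new policy's parameters.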

SIM to REAL

Running reinforcement learning directly on a real robot is impractical in terms of cost and risk, so training in a simulator (SIM) is more useful.
image

However, there is a gap between the simulator and the real environment, and it has to be compensated for.
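The notes do not say which compensation method is meant. One widely used option is domain randomization: physics parameters are resampled during training so the policy cannot overfit to one simulator configuration. The sketch below assumes hypothetical parameter names and ranges; real values depend on the robot and the simulator.

```python
import random
from dataclasses import dataclass


@dataclass
class SimParams:
    friction: float
    mass_scale: float
    motor_delay_ms: float


# Hypothetical randomization ranges, for illustration only.
PARAM_RANGES = {
    "friction": (0.5, 1.25),
    "mass_scale": (0.8, 1.2),
    "motor_delay_ms": (0.0, 20.0),
}


def sample_sim_params(rng):
    """Draw one set of physics parameters, e.g. at the start of each episode."""
    return SimParams(
        **{name: rng.uniform(lo, hi) for name, (lo, hi) in PARAM_RANGES.items()}
    )


rng = random.Random(0)
episode_params = [sample_sim_params(rng) for _ in range(3)]
```

Each episode then runs with a different `SimParams`, so the real robot looks like just one more variation to the trained policy.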
image

FurnitureBench

https://github.com/clvrai/furniture-bench?tab=readme-ov-file
https://clvrai.github.io/furniture-bench/
image

robust locomotion

RL in the world: DayDreamer

A Walk in the Park: Learning to Walk in 20 Minutes With Model-Free Reinforcement Learning

.

Robot locomotion with reinforcement learning (using PPO)

Paper (Learning to Walk in Minutes Using Massively Parallel Deep RL): trains with a very large number of robots simulated in parallel
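To make the "massively parallel" idea concrete, here is a toy sketch in which a single `step` call advances thousands of independent environments at once, so every policy query yields thousands of transitions. The class `BatchedToyEnv`, its 1-D dynamics, and its reward are hypothetical stand-ins; the paper itself relies on a GPU physics simulator, which this pure-Python example does not reproduce.

```python
class BatchedToyEnv:
    """num_envs independent 1-D point robots stepped together."""

    def __init__(self, num_envs):
        self.positions = [0.0] * num_envs

    def step(self, actions):
        # One call advances every environment, producing num_envs transitions.
        self.positions = [p + a for p, a in zip(self.positions, actions)]
        # Hypothetical reward: negative distance to the target x = 1.
        rewards = [-abs(p - 1.0) for p in self.positions]
        return list(self.positions), rewards


env = BatchedToyEnv(num_envs=4096)
obs, rewards = env.step([0.5] * 4096)
```

Collecting data this way is what lets on-policy algorithms such as PPO gather large batches in very little wall-clock time.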

image
image
image
reward
image image image
image

.

Paper (RMA): runs in real time in difficult environments

image
image
image
image

.

Paper: Robot Parkour

robot-parkour.github.io
image
image
image
image
image
image
image
image
image

.

Paper (humanoid): robot parkour

https://humanoid4parkour.github.io/
image image image image

.

Paper (transformer)

image image image image

.

Tesla Optimus

image
https://x.com/tesla_optimus/status/1922456791549427867

Unitree

image

.

image

.

Boston Dynamics

image

.

image

.

Robot foundation models

RFM (robot foundation model)
image
image

2016 Google foundation model

image

QT-Opt

image

Paper

MT-Opt

Splits the problem into tasks and trains the robot to grasp a specified target object.
image

Paper

BC-Z

image

Paper

RT-1 (robotics transformer)

image
image

https://robotics-transformer1.github.io/
Paper

RT-2

Takes a pretrained VLM and trains it further (vision-language-action)
image

https://robotics-transformer2.github.io/
Paper

ALOHA and ACT

image
image
image
ACT : imitation learning algorithm
image image image image
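ACT stands for Action Chunking with Transformers: rather than predicting one action per step, the policy predicts a short sequence (chunk) of future actions from each observation. The loop below is a simplified sketch of chunked execution only; it leaves out the transformer itself and the temporal ensembling described in the ACT paper, and `toy_policy` and `toy_step` are hypothetical stand-ins.

```python
def run_with_action_chunks(policy, step_fn, initial_obs, horizon, chunk_size):
    """Query the policy once per chunk and execute the predicted actions open loop."""
    obs = initial_obs
    executed = []
    num_queries = 0
    while len(executed) < horizon:
        chunk = policy(obs, chunk_size)
        num_queries += 1
        # Truncate the last chunk so exactly `horizon` actions are executed.
        for action in chunk[: horizon - len(executed)]:
            obs = step_fn(obs, action)
            executed.append(action)
    return executed, obs, num_queries


# Toy stand-ins: the "policy" always moves +1, the "robot" is a 1-D integrator.
def toy_policy(obs, chunk_size):
    return [1.0] * chunk_size


def toy_step(obs, action):
    return obs + action


actions, final_obs, num_queries = run_with_action_chunks(
    toy_policy, toy_step, 0.0, horizon=10, chunk_size=4
)
```

Here ten control steps need only three policy queries, which shortens the effective decision horizon of the imitation learning problem.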

https://tonyzhaozh.github.io/aloha/

mobile aloha

image
image

https://mobile-aloha.github.io/

diffusion policy (2023)

image
image
image
image

https://diffusion-policy.cs.columbia.edu/
https://github.com/real-stanford/diffusion_policy

scaling robotic datasets

image
Paper
https://droid-dataset.github.io/

image
image
image
https://github.com/google-deepmind/open_x_embodiment
https://robotics-transformer-x.github.io/

RT-X model

RT-1, RT-2, OpenX image
image

Octo

transformer + diffusion
image
image

https://octo-models.github.io/

OpenVLA [An Open-Source Vision-Language-Action Model] (open source, uses Llama)

image
image
image
image
image

https://openvla.github.io/
https://github.com/openvla/openvla

OpenVLA-OFT

image
image

https://github.com/moojink/openvla-oft
https://openvla-oft.github.io/

SOTA VLA - open model

https://www.physicalintelligence.company/
10,000 hours of data

π0 (pi-zero)

image
image
image
image
image

https://www.physicalintelligence.company/blog/pi0

π0-FAST (pi-zero FAST)

image
https://www.physicalintelligence.company/research/fast

π0.5 (pi-zero-point-five)

image
image
image
https://www.physicalintelligence.company/blog/pi05

Real-Time Action Chunking with Large Models

https://www.physicalintelligence.company/research/real_time_chunking

image