Paper: Disentangling Monocular 3D Object Detection (MonoDis)

In this paper we propose an approach for monocular 3D object detection from a single RGB image, which leverages a novel disentangling transformation for 2D and 3D detection losses and a novel, self-supervised confidence score for 3D bounding boxes. Our pro

arxiv.org

Authors: Andrea Simonelli, Samuel Rota Bulo, Lorenzo Porzi, Manuel Lopez-Antequera, Peter Kontschieder

Published: ICCV 2019

[Intro Summary]

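Problem to be solved
- 3D detection performs far worse than 2D detection
- Monocular 3D detection is an ill-posed problem
Approaches tried so far
- They work by encoding quantities such as the 3D bounding box, its corresponding 2D projections, and the depth of the 3D box center
Limitations of existing methods
- The parameters have different units, so they cannot be compared directly
- This negatively affects optimization
- To work around this, the 2D and 3D stages are trained separately, which can leave the loss at a sub-optimal point rather than the global optimum
Method proposed in this paper
- Disentangling transformation of the loss
- sIoU (signed IoU)
- A self-supervised confidence score for 3D bounding boxes
Contributions of the paper
- The disentangling transform splits the parameters into separate groups (parameters with different units are trained group by group; while one group is being trained, the remaining groups are filled in with ground truth)
- Introduces sIoU to improve performance
- Critically reviews the existing 3D metric and then introduces a new one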
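
To make the "train one group, use GT for the rest" idea concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the grouping into `center`, `dims`, and `yaw`, the toy decoder `box_corners` (standing in for the paper's lifting transform), and the helper names `corner_l1` and `disentangled_loss` are all assumptions made for illustration.

```python
import math

def box_corners(center, dims, yaw):
    """Toy decoder: turn (center, dims, yaw) into the 8 corners of a 3D box."""
    cx, cy, cz = center
    w, h, l = dims
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for sx in (-0.5, 0.5):
        for sy in (-0.5, 0.5):
            for sz in (-0.5, 0.5):
                x, y, z = sx * l, sy * h, sz * w
                # Rotate around the vertical (y) axis, then translate.
                corners.append((cx + c * x + s * z, cy + y, cz - s * x + c * z))
    return corners

def corner_l1(pred_corners, gt_corners):
    """Mean absolute difference over all corner coordinates."""
    total = sum(
        abs(p - g)
        for pc, gc in zip(pred_corners, gt_corners)
        for p, g in zip(pc, gc)
    )
    return total / (len(gt_corners) * 3)

def disentangled_loss(pred, gt):
    """Sum of per-group losses; each term uses the prediction for one
    group only and ground truth for every other group."""
    gt_corners = box_corners(**gt)
    loss = 0.0
    for group in pred:
        mixed = dict(gt)            # start from ground truth everywhere
        mixed[group] = pred[group]  # swap in the prediction for one group
        loss += corner_l1(box_corners(**mixed), gt_corners)
    return loss

gt = {"center": (0.0, 0.0, 10.0), "dims": (1.6, 1.5, 4.0), "yaw": 0.3}
pred = {"center": (1.0, 0.0, 10.0), "dims": (1.6, 1.5, 4.0), "yaw": 0.3}
loss = disentangled_loss(pred, gt)  # only the center term is non-zero here
```

Because every term is measured in the same corner space, an error in depth and an error in rotation become comparable, while each term's gradient still flows to a single parameter group.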

[Summary] Main Points of this paper

Process of network

[Strengths] Clearly explain why these aspects of the paper are valuable.

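The disentangling transform makes it possible to train related losses that have different units separately from one another. This improved performance considerably (convergence speed and stability improved as well).
Uses the nuScenes data → a new dataset, and the evaluation metrics are summarized well.
Uses sIoU → prevents vanishing gradients (when the bounding boxes do not overlap, the value goes negative by a corresponding amount). Possibly a positive effect on training as well?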
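
The sIoU behaviour noted above can be sketched as follows for axis-aligned 2D boxes given as `(u1, v1, u2, v2)`. This follows my reading of the paper's definition (the intersection box is allowed to be "flipped", and a flipped box gets a negative area); the function names `signed_area` and `signed_iou` are hypothetical, and inputs are assumed to be valid boxes with positive area.

```python
def signed_area(box):
    """Area of the box, negated if the box is flipped along any axis."""
    u1, v1, u2, v2 = box
    area = abs(u2 - u1) * abs(v2 - v1)
    return area if (u2 > u1 and v2 > v1) else -area

def signed_iou(a, b):
    """Signed IoU: equals the usual IoU when the boxes overlap and
    becomes increasingly negative as disjoint boxes move apart."""
    # "Extended" intersection: may be a flipped box when a and b are disjoint.
    inter_box = (max(a[0], b[0]), max(a[1], b[1]),
                 min(a[2], b[2]), min(a[3], b[3]))
    inter = signed_area(inter_box)
    union = signed_area(a) + signed_area(b) - inter
    return inter / union

overlap = signed_iou((0, 0, 2, 2), (1, 1, 3, 3))  # ordinary IoU case
near = signed_iou((0, 0, 1, 1), (2, 0, 3, 1))     # disjoint, close
far = signed_iou((0, 0, 1, 1), (4, 0, 5, 1))      # disjoint, farther away
```

With a plain IoU, both `near` and `far` would be 0 and give no signal about how far apart the boxes are; here `far` is more negative than `near`, so a loss built on it still varies with the distance between the boxes.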

[Weaknesses] Clearly explain why these aspects of the paper are weak.

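The paper uses AP_40 to address a problem inherent in the AP_11 metric used on the KITTI 3D data (a single correct prediction is enough to yield an AP of 1/11 ≈ 0.0909 over the whole dataset instead of roughly 0), but I do not see what is conceptually different. (Does this really solve the problem AP_11 had?)
3D bounding box labels are still required.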
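
A small numeric sketch may help with the AP_11 versus AP_40 question. It assumes the usual interpolated AP (average, over a set of recall levels, of the best precision at that recall or higher), with AP_11 sampling recall at 0, 0.1, ..., 1 and AP_40 sampling at 1/40, 2/40, ..., 1. The helper `interpolated_ap` and the scenario of one correct detection among 1000 ground-truth objects are made up for illustration.

```python
def interpolated_ap(pr_points, recall_levels):
    """pr_points: list of (recall, precision) pairs from a detector.
    For each recall level, take the best precision at that recall or
    higher (0 if none), then average over all levels."""
    total = 0.0
    for r in recall_levels:
        candidates = [p for rec, p in pr_points if rec >= r]
        total += max(candidates, default=0.0)
    return total / len(recall_levels)

R11 = [i / 10 for i in range(11)]     # includes recall 0
R40 = [i / 40 for i in range(1, 41)]  # starts at 1/40, excludes recall 0

# One correct detection among 1000 ground-truth objects:
# recall = 0.001, precision = 1.0.
single_hit = [(1 / 1000, 1.0)]

ap11 = interpolated_ap(single_hit, R11)  # 1/11, from the recall-0 sample alone
ap40 = interpolated_ap(single_hit, R40)  # 0.0, no sample at or below 0.001
```

Under these assumptions the difference is not only the finer sampling: dropping the recall-0 sample is what removes the 1/11 floor.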

[Why accepted?] What is the contribution of the paper? Or novelty

disentangling transform → effective for training the loss (blue is with disentangling applied)
sIoU seems like a fairly reasonable approach.

[Appendix]

Focal loss
RetinaNet
Lifting transform
