
Un-VDNet: unsupervised network for visual odometry and depth estimation

Research output: Contribution to journal › Article › peer-review

Standard

Un-VDNet: unsupervised network for visual odometry and depth estimation. / Meng, Xuyang; Fan, Chunxiao; Ming, Yue; Shen, Yuan; Yu, Hui.

In: Journal of Electronic Imaging, Vol. 28, No. 6, 063015, 26.12.2019.

Harvard

Meng, X, Fan, C, Ming, Y, Shen, Y & Yu, H 2019, 'Un-VDNet: unsupervised network for visual odometry and depth estimation', Journal of Electronic Imaging, vol. 28, no. 6, 063015. https://doi.org/10.1117/1.JEI.28.6.063015

APA

Meng, X., Fan, C., Ming, Y., Shen, Y., & Yu, H. (2019). Un-VDNet: unsupervised network for visual odometry and depth estimation. Journal of Electronic Imaging, 28(6), [063015]. https://doi.org/10.1117/1.JEI.28.6.063015

Vancouver

Meng X, Fan C, Ming Y, Shen Y, Yu H. Un-VDNet: unsupervised network for visual odometry and depth estimation. Journal of Electronic Imaging. 2019 Dec 26;28(6):063015. https://doi.org/10.1117/1.JEI.28.6.063015

Author

Meng, Xuyang ; Fan, Chunxiao ; Ming, Yue ; Shen, Yuan ; Yu, Hui. / Un-VDNet: unsupervised network for visual odometry and depth estimation. In: Journal of Electronic Imaging. 2019 ; Vol. 28, No. 6.

Bibtex

@article{5cbe866a1265480282a6150bfba41261,
title = "Un-VDNet: unsupervised network for visual odometry and depth estimation",
abstract = "Monocular visual odometry and depth estimation play an important role in augmented reality and robotics applications. Recently, deep learning technologies have been widely used in these areas. However, most existing works utilize supervised learning, which requires large amounts of labeled data and assumes that the scene is static. In this paper, we propose a novel framework, called Un-VDNet, based on unsupervised convolutional neural networks (CNNs) to predict camera ego-motion and depth maps from image sequences. The framework includes three sub-networks (PoseNet, DepthNet, and FlowNet) and learns temporal motion and spatial association information in an end-to-end network. Specifically, we propose a novel pose consistency loss to penalize the translation and rotation drifts of the pose estimated by the PoseNet. Furthermore, a novel geometric consistency loss, between the structure flow and the scene flow learned from the FlowNet, is proposed to handle dynamic objects in real-world scenes; it is combined with spatial and temporal photometric consistency constraints. Extensive experiments on the KITTI and TUM datasets demonstrate that our proposed Un-VDNet outperforms state-of-the-art methods for visual odometry and depth estimation when dealing with dynamic objects in outdoor and indoor scenes.",
author = "Xuyang Meng and Chunxiao Fan and Yue Ming and Yuan Shen and Hui Yu",
note = "No embargo. Xuyang Meng et al., {"}Un-VDNet: unsupervised network for visual odometry and depth estimation,{"} Journal of Electronic Imaging 28(6), 063015 (26 December 2019). DOI: https://doi.org/10.1117/1.JEI.28.6.063015",
year = "2019",
month = dec,
day = "26",
doi = "10.1117/1.JEI.28.6.063015",
language = "English",
volume = "28",
journal = "Journal of Electronic Imaging",
issn = "1017-9909",
publisher = "Society of Photo-Optical Instrumentation Engineers (SPIE)",
number = "6",
pages = "063015",
}

RIS

TY - JOUR

T1 - Un-VDNet: unsupervised network for visual odometry and depth estimation

AU - Meng, Xuyang

AU - Fan, Chunxiao

AU - Ming, Yue

AU - Shen, Yuan

AU - Yu, Hui

N1 - No embargo. Xuyang Meng et al., "Un-VDNet: unsupervised network for visual odometry and depth estimation," Journal of Electronic Imaging 28(6), 063015 (26 December 2019). DOI: https://doi.org/10.1117/1.JEI.28.6.063015

PY - 2019/12/26

Y1 - 2019/12/26

N2 - Monocular visual odometry and depth estimation play an important role in augmented reality and robotics applications. Recently, deep learning technologies have been widely used in these areas. However, most existing works utilize supervised learning, which requires large amounts of labeled data and assumes that the scene is static. In this paper, we propose a novel framework, called Un-VDNet, based on unsupervised convolutional neural networks (CNNs) to predict camera ego-motion and depth maps from image sequences. The framework includes three sub-networks (PoseNet, DepthNet, and FlowNet) and learns temporal motion and spatial association information in an end-to-end network. Specifically, we propose a novel pose consistency loss to penalize the translation and rotation drifts of the pose estimated by the PoseNet. Furthermore, a novel geometric consistency loss, between the structure flow and the scene flow learned from the FlowNet, is proposed to handle dynamic objects in real-world scenes; it is combined with spatial and temporal photometric consistency constraints. Extensive experiments on the KITTI and TUM datasets demonstrate that our proposed Un-VDNet outperforms state-of-the-art methods for visual odometry and depth estimation when dealing with dynamic objects in outdoor and indoor scenes.

AB - Monocular visual odometry and depth estimation play an important role in augmented reality and robotics applications. Recently, deep learning technologies have been widely used in these areas. However, most existing works utilize supervised learning, which requires large amounts of labeled data and assumes that the scene is static. In this paper, we propose a novel framework, called Un-VDNet, based on unsupervised convolutional neural networks (CNNs) to predict camera ego-motion and depth maps from image sequences. The framework includes three sub-networks (PoseNet, DepthNet, and FlowNet) and learns temporal motion and spatial association information in an end-to-end network. Specifically, we propose a novel pose consistency loss to penalize the translation and rotation drifts of the pose estimated by the PoseNet. Furthermore, a novel geometric consistency loss, between the structure flow and the scene flow learned from the FlowNet, is proposed to handle dynamic objects in real-world scenes; it is combined with spatial and temporal photometric consistency constraints. Extensive experiments on the KITTI and TUM datasets demonstrate that our proposed Un-VDNet outperforms state-of-the-art methods for visual odometry and depth estimation when dealing with dynamic objects in outdoor and indoor scenes.

U2 - 10.1117/1.JEI.28.6.063015

DO - 10.1117/1.JEI.28.6.063015

M3 - Article

VL - 28

JO - Journal of Electronic Imaging

JF - Journal of Electronic Imaging

SN - 1017-9909

IS - 6

M1 - 063015

ER -

ID: 16958749