STEMM Institute Press
Science, Technology, Engineering, Management and Medicine
Pose-guided Human Feature Aggregation for Occluded Person Re-identification
DOI: https://doi.org/10.62517/jbdc.202301407
Author(s)
Zhe Zhang1,2, Zongwen Bai1,2,*, Meili Zhou1,2
Affiliation(s)
1Shaanxi Key Laboratory of Intelligent Processing for Big Energy Data, School of Physics and Electronic Information, Yan'an University, Yan'an, Shaanxi, China.
2School of Physics and Electronic Information, Yan'an University, Yan'an, Shaanxi, China.
*Corresponding Author
Abstract
The appearance of pedestrians is often obscured by various obstacles. Some existing works address occlusion by aligning body parts of the query image of the target pedestrian with those of the gallery image, but the structure of the human body is complicated and difficult to align. This paper therefore introduces an alignment-free Human Feature Aggregation (HFA) approach based on the Transformer, which uses pose information to separate the body parts of the target pedestrian from the obstructions. First, the Vision Transformer incorporates the advantages of Convolutional Neural Networks (CNNs) to extract more fine-grained global and local features. Next, the body parts of the target pedestrian are separated from the obstructions using pose information produced by a pose estimator. Finally, in the human feature aggregation module, local features are matched and fused with the pose information to enrich the human features, steering the model to focus more on body parts. Experimental results show that the proposed HFA approach outperforms alternative methods on multiple benchmark datasets.
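The core idea described above, weighting local features by pose-estimator keypoint confidence so that occluded parts are suppressed, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function name, the heatmap-weighted pooling, and the confidence threshold are all illustrative assumptions.

```python
import numpy as np

def aggregate_human_features(patch_feats, kpt_heatmaps, kpt_conf, conf_thresh=0.3):
    """Pose-guided feature aggregation (illustrative sketch).

    patch_feats:  (P, D) local patch features from a ViT/CNN backbone.
    kpt_heatmaps: (K, P) per-keypoint attention weight over the P patches,
                  as produced by a pose estimator's heatmaps.
    kpt_conf:     (K,)   keypoint confidence scores; low scores indicate
                  body parts hidden by obstructions.
    Returns:      (K, D) part-level features; occluded parts are zeroed
                  so they do not contribute to matching.
    """
    # binary visibility mask from keypoint confidence
    visible = (kpt_conf > conf_thresh).astype(patch_feats.dtype)        # (K,)
    # normalize each heatmap so pooling is a weighted mean over patches
    w = kpt_heatmaps / (kpt_heatmaps.sum(axis=1, keepdims=True) + 1e-6) # (K, P)
    part_feats = w @ patch_feats                                        # (K, D)
    # suppress features of occluded body parts
    return part_feats * visible[:, None]
```

In a full pipeline the resulting part features would be fused with the global feature before the distance computation, so that only mutually visible parts drive the query-gallery similarity.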
Keywords
Vision Transformer; Pose Information; Pose Estimator; Feature Aggregation
Copyright © 2020-2035 STEMM Institute Press. All Rights Reserved.