2E-Net: Unet-like Double Encoder for Medical Image Segmentation
DOI: https://doi.org/10.62517/jes.202402305
Author(s)
Jiahao Li1, Yanbing Liang2,*
Affiliation(s)
1Hebei Key Laboratory of Data Science and Application, North China University of Science and Technology, Tangshan, Hebei, China
2College of Science, North China University of Science and Technology, Tangshan, Hebei, China
*Corresponding Author
Abstract
As an important branch of medical image segmentation, abdominal multi-organ segmentation plays a crucial role in disease diagnosis and treatment monitoring. Sustained research in this field has produced a series of segmentation models, represented by U-Net, that have made significant progress. However, abdominal organs are numerous, vary widely in size, and often have blurred boundaries, so existing models still fail to achieve satisfactory segmentation results. Most existing abdominal multi-organ segmentation methods build the entire network from a pure CNN, a pure Transformer, or a combination of the two, and thus fail to fully exploit the complementary strengths of both. To address these problems, we propose 2E-Net, a double-encoder medical image segmentation method that improves the Transformer-based MISSFormer by introducing a ResNet50 encoder. 2E-Net introduces an Encoder Fusion Module (EFM) to fuse the features extracted by the two encoders. Extensive experiments on the Synapse dataset show that the proposed method outperforms MISSFormer, achieving a Dice Similarity Coefficient (DSC) of 82.13%.
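The abstract names the Encoder Fusion Module but does not specify its internals here. The following is a minimal PyTorch sketch of the dual-encoder idea, assuming a simple concatenate-and-project fusion; the class name SimpleEFM, the channel sizes, and the fusion rule are illustrative assumptions, not the paper's actual design. The standard Dice Similarity Coefficient used as the evaluation metric is also shown.

    import torch
    import torch.nn as nn

    class SimpleEFM(nn.Module):
        # Hypothetical stand-in for the paper's Encoder Fusion Module:
        # concatenate the CNN and Transformer feature maps along channels,
        # then project back to a single width with a 1x1 convolution.
        def __init__(self, cnn_ch: int, trans_ch: int, out_ch: int):
            super().__init__()
            self.proj = nn.Sequential(
                nn.Conv2d(cnn_ch + trans_ch, out_ch, kernel_size=1),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )

        def forward(self, f_cnn: torch.Tensor, f_trans: torch.Tensor) -> torch.Tensor:
            # Both inputs are assumed to share the same spatial size (H, W).
            return self.proj(torch.cat([f_cnn, f_trans], dim=1))

    def dice_score(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
        # Standard Dice Similarity Coefficient: 2|A ∩ B| / (|A| + |B|)
        # for binary masks; eps guards against empty masks.
        inter = (pred * target).sum()
        return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

    if __name__ == "__main__":
        efm = SimpleEFM(cnn_ch=256, trans_ch=64, out_ch=64)
        f_cnn = torch.randn(1, 256, 56, 56)   # e.g. a ResNet50 stage output
        f_trans = torch.randn(1, 64, 56, 56)  # e.g. a Transformer stage output
        fused = efm(f_cnn, f_trans)
        print(fused.shape)                    # torch.Size([1, 64, 56, 56])

The concatenate-then-project pattern is one common way to merge heterogeneous encoder features at matching resolutions; the paper's actual EFM may weight or attend over the two streams differently.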
Keywords
Transformer; Attention Mechanism; Medical Image Segmentation; CNN