A Combined Vision Transformer and Active Learning Solution for Detecting and Classifying Liver Lesions

Hồ Chí Hưng; Phan Thượng Cang

Abstract

The liver is a vital organ, and most liver diseases and lesions are difficult to detect early because they lack clear symptoms. This raises the risk of severe complications, particularly liver cancer, one of the leading causes of cancer-related death worldwide. This paper proposes using the machine learning models DenseNet-121, VGG-16, and Vision Transformer (ViT) to detect and classify liver lesions in 2,008 CT images spanning the arterial, delayed, plain, and venous phases. Lesions are classified as liver cysts, hemangiomas, or hepatocellular carcinoma, with the aim of improving screening and early diagnosis. The ViT model achieved an accuracy of up to 0.99 with a short training time. The paper also highlights the main challenges of manual data labeling: it requires many highly skilled annotators, takes a long time, and is costly. To address this, the paper proposes active learning to automate part of the labeling process, reducing labor, time, and cost while ensuring consistency and data quality.

Keywords: DenseNet-121, VGG-16, ViT, Active Learning
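The abstract does not state which query strategy the active learning step uses. As a minimal sketch, assuming entropy-based uncertainty sampling (a common choice, not necessarily the authors' method), the images sent to an expert for labeling could be chosen as follows. The function names, the `budget` parameter, and the probability values are hypothetical and serve only as illustration.

```python
import math


def predictive_entropy(probs):
    """Shannon entropy (in nats) of one class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_for_labeling(unlabeled_probs, budget):
    """Return indices of the `budget` most uncertain unlabeled images.

    unlabeled_probs: list of per-image class-probability lists, e.g. the
    softmax outputs of the current classifier over the three lesion
    classes (cyst, hemangioma, hepatocellular carcinoma).
    """
    ranked = sorted(
        range(len(unlabeled_probs)),
        key=lambda i: predictive_entropy(unlabeled_probs[i]),
        reverse=True,  # highest entropy (most uncertain) first
    )
    return ranked[:budget]


# Hypothetical softmax outputs for four unlabeled CT images.
pool = [
    [0.98, 0.01, 0.01],  # confident prediction
    [0.34, 0.33, 0.33],  # near-uniform, most uncertain
    [0.70, 0.20, 0.10],
    [0.50, 0.49, 0.01],
]
queried = select_for_labeling(pool, budget=2)
```

In a full loop, the queried images would be labeled by an expert, added to the training set, and the classifier retrained before the next round of selection. Confident predictions are left unqueried, which is where the labeling effort is saved.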



VLUTE JOURNAL OF SCIENCE

Combined solution Vision Transformer and Active Learning in detection and classification liver lesions
