Soh Nishimoto1*, Takuya Saito1, Hisako Ishise1, Toshihiro Fujiwara1, Kenichiro Kawai1, Masao Kakibuchi1
1Department of Plastic Surgery, Hyogo College of Medicine, 1-1, Mukogawa-cho, Nishinomiya, Hyogo, Japan
*Correspondence author: Soh Nishimoto, Department of Plastic Surgery, Hyogo College of Medicine, 1-1, Mukogawa-cho, Nishinomiya, Hyogo, Japan; Email: [email protected]
Abstract
Objective: When materials with a high X-ray absorption rate, such as metal prosthetics, lie within the field of a CT scan, noise known as metal artifacts may appear. These artifacts persist when a three-dimensional bone model is reconstructed from X-ray CT images, and the image of the scanning bed often remains as well. A machine learning-based system was constructed to reduce these noises in craniofacial CT images.
Methods: DICOM images from a CT archive of patients with head and neck tumors were used. Metal artifacts and beds were removed from threshold-segmented images to obtain target bony images. U-nets with the loss functions of mean squared error, Dice loss, and Jaccard loss, respectively, were trained on datasets consisting of 5671 DICOM images and their corresponding target images. The DICOM images of 2000 validation datasets were then given to the trained models to obtain predicted images.
Results: Mean squared error loss outperformed Dice and Jaccard loss. The mean numbers of error pixels per 512 x 512 pixel image were 14.43, 778.57 and 757.60, respectively.
Conclusion: An automatic CT image noise reduction system was constructed. Dedicated to the delineation of craniofacial bones, the system showed high prediction accuracy. The "correctness" of the predictions made by this system cannot be fully guaranteed, but the results were generally satisfactory.
Keywords: Noise Reduction; Metal Artifact; Machine Learning; U-Net; Loss Function
Introduction
X-ray CT of the craniofacial region is indispensable for obtaining bone information in the diagnosis of craniofacial fractures, facial morphology, and occlusal status. Noise known as "metal artifact" appears in the image when materials with a high X-ray absorption rate, such as metal prosthetics, lie within the imaging area. Although some efforts are made to reduce these artifacts during imaging, they are often still seen in the images clinicians receive [1-3]. Metal artifacts remain when the bone area is extracted by setting a threshold on CT values, for example to construct a three-dimensional bony image. The bed of the CT imaging system also remains in the image. These noises interfere with observation of the 3D models, and removing them manually from each image with image processing software is very time-consuming. In this study, an automatic system was constructed to retrieve bone images free of such noises from craniofacial CT images. Neural networks with the same structure were trained with three different loss functions and compared.
Materials and Methods
All procedures were performed on a desktop personal computer with a GeForce RTX 3090 24.0 GB GPU (NVIDIA, Santa Clara, CA, USA), running Windows 10 Pro (Microsoft Corporation, Redmond, WA, USA). Python 3.8 (Python Software Foundation, DE, USA) was used as the programming language, with Anaconda as the distribution and environment manager and Spyder 4.1.4 as the integrated development environment. Keras (https://keras.io/), a deep learning library written in Python, was run on TensorFlow 2.5 (Google, Mountain View, CA, USA). GPU computation was employed through CUDA (NVIDIA). For 3D reconstruction, 3D Slicer 4.11 (www.slicer.org) was used with Jupyter Notebook (https://jupyter.org/). OpenCV 3.1.0 libraries (https://docs.opencv.org/3.1.0/) were used for image processing.
Datasets
CT Images
Head-Neck-Radiomics-HN1, a collection of CT images of head and neck squamous cell carcinoma patients, was retrieved from The Cancer Imaging Archive Public Access (wiki.cancerimagingarchive.net) [4]. It consists of a folder for each patient, containing 512 x 512 pixel DICOM (Digital Imaging and Communications in Medicine) axial images (values ranging from 0 to 4071 per pixel), taken at 5 mm intervals in the cephalocaudal direction. The order of the images was checked, and images from the top of the head to the mandible were extracted for 120 cases [5].
Bony Image Segmentation
Bone Region Extraction by Threshold
Using the Python library pydicom (https://pydicom.github.io), each slice image was read from its DICOM file. To extract high-density areas, pixel values below 1200 were replaced with 0, and values above 2040 were replaced with 2040. From the thresholded values, 1020 was subtracted and the result divided by 4, yielding values ranging from 0 to 255. The images were stored as PNG (Portable Network Graphics) files (Fig. 1).
Figure 1: Making the target image. The original DICOM images (pixel values 0 to 4071) were thresholded and converted to PNG files. The noises remaining in the files were removed manually. The pixel values were binarized to 0 or 255.
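A minimal sketch of this thresholding step, assuming pydicom and OpenCV are installed, might look as follows. The function and path names are hypothetical, and the final clip to [0, 255] is our addition to keep the suppressed pixels at 0:

```python
import numpy as np
import pydicom
import cv2

def dicom_to_bone_png(dicom_path, png_path):
    # Read the axial slice from the DICOM file (values roughly 0 to 4071).
    img = pydicom.dcmread(dicom_path).pixel_array.astype(np.float32)
    # Suppress pixels below the bone threshold.
    img[img < 1200] = 0
    # Cap high-density pixels at 2040.
    img = np.minimum(img, 2040)
    # Map the retained range to 0-255 (subtract 1020, divide by 4),
    # clipping so that the suppressed pixels stay at 0.
    img = np.clip((img - 1020) / 4, 0, 255).astype(np.uint8)
    cv2.imwrite(png_path, img)
```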
Manual Noise Reduction
Noises such as metal artifacts and beds in the thresholded PNG images were checked visually one by one and removed using the image processing software GIMP (https://www.gimp.org).
Binarization
The images were binarized with a pixel value of 10 as the threshold (to 0 or 255) and saved as PNG files (target images).
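For completeness, the binarization might be sketched as below; the file names are hypothetical and the direction of the threshold comparison is an assumption:

```python
import numpy as np
import cv2

# Binarize the manually cleaned PNG at a threshold of 10.
mask = cv2.imread("cleaned_slice.png", cv2.IMREAD_GRAYSCALE)
target = np.where(mask > 10, 255, 0).astype(np.uint8)
cv2.imwrite("target_slice.png", target)
```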
Neural Network and Learning
U-net
A U-net model (Fig. 2) was constructed using keras-unet (https://pypi.org/project/keras-unet/). Input and output shapes were 512 x 512. ReLU (Rectified Linear Unit) was used for the final output activation [6], and the batch normalization option was turned on. Dice loss, Jaccard loss and mean squared error were each used as the loss function. Adam was used as the optimizer [7].
Figure 2: The U-net model used. Input and output were 512 x 512.
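A sketch of how such a model might be assembled with keras-unet is shown below. The exact keras-unet arguments (filter counts, depth) are assumptions, and the Dice and Jaccard losses are written as standard smoothed variants, not necessarily the exact formulations used:

```python
from tensorflow.keras import backend as K
from keras_unet.models import custom_unet

def dice_loss(y_true, y_pred, smooth=1.0):
    # Dice loss: 1 - 2|A∩B| / (|A| + |B|), smoothed for stability.
    intersection = K.sum(y_true * y_pred)
    return 1.0 - (2.0 * intersection + smooth) / (K.sum(y_true) + K.sum(y_pred) + smooth)

def jaccard_loss(y_true, y_pred, smooth=1.0):
    # Jaccard loss: 1 - |A∩B| / |A∪B|, smoothed for stability.
    intersection = K.sum(y_true * y_pred)
    union = K.sum(y_true) + K.sum(y_pred) - intersection
    return 1.0 - (intersection + smooth) / (union + smooth)

# 512 x 512 single-channel input, batch normalization on, ReLU output.
model = custom_unet(
    input_shape=(512, 512, 1),
    use_batch_norm=True,
    num_classes=1,
    output_activation='relu',
)
# Compile once per experiment with one of the three losses.
model.compile(optimizer='adam', loss='mean_squared_error')  # or dice_loss / jaccard_loss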
Machine Learning
Datasets of original DICOM images and their corresponding target images, 7671 pairs in total, were divided into 5671 training datasets and 2000 validation datasets. DICOM values were divided by 1000, and target image values were divided by 255, for normalization (binarized to 0 or 1). The U-net models were trained on the training datasets with the early stopping option (https://keras.io/api/callbacks/early_stopping/), and the best weights were saved.
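A training sketch under these settings might look like the following. The arrays dicom_train, target_train, dicom_val and target_val are assumed to have been prepared from the DICOM/PNG pairs, and the batch size, patience and epoch count are assumptions not stated in the text:

```python
import numpy as np
from tensorflow.keras.callbacks import EarlyStopping

# Normalization as described: DICOM values / 1000, target values / 255.
x_train = dicom_train.astype(np.float32) / 1000.0   # shape (5671, 512, 512, 1)
y_train = target_train.astype(np.float32) / 255.0
x_val = dicom_val.astype(np.float32) / 1000.0       # shape (2000, 512, 512, 1)
y_val = target_val.astype(np.float32) / 255.0

# Early stopping restores the best weights seen on the validation loss.
early_stop = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          batch_size=4, epochs=200,
          callbacks=[early_stop])
model.save_weights('best_weights.h5')
```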
Validation
The DICOM images of the 2000 validation datasets were given to the trained models, and predicted images were obtained. The predicted images were binarized at a threshold of 0.5. Mean squared errors between the binarized predicted images and the target images were calculated. The error pixel count for a dataset was obtained by multiplying the mean squared error by 512 x 512. For visualization, the binarized prediction images were shown in the green channel and the target images in the red channel; when merged, matching pixels appear yellow, while error pixels appear green or red.
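Continuing the sketch above, the error count and the merged visualization might be computed as follows. Note that between binary images, the mean squared error equals the fraction of mismatched pixels, which is why multiplying by 512 x 512 yields the error pixel count:

```python
import numpy as np
import cv2

pred = model.predict(x_val)                     # predictions roughly in [0, 1]
pred_bin = (pred > 0.5).astype(np.float32)      # binarize at 0.5

# Per-image mean squared error against the binary targets.
mse_per_image = np.mean((pred_bin - y_val) ** 2, axis=(1, 2, 3))
error_pixels = mse_per_image * 512 * 512        # error pixel count per image

# Merged visualization: prediction in green, target in red (OpenCV uses BGR order).
def merge(pred_slice, target_slice):
    merged = np.zeros((512, 512, 3), dtype=np.uint8)
    merged[..., 1] = (pred_slice.squeeze() * 255).astype(np.uint8)   # green channel
    merged[..., 2] = (target_slice.squeeze() * 255).astype(np.uint8) # red channel
    return merged  # matching pixels render yellow; errors render green or red

cv2.imwrite('merged_000.png', merge(pred_bin[0], y_val[0]))
```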
Results
Training and validation accuracy for the three loss functions is shown in Fig. 3. Mean squared error loss outperformed Dice and Jaccard loss.
The prediction errors and the numbers of error pixels are shown in Table 1. As an example, predictions of the same image (the worst result with mean squared error loss) by the networks trained with the different loss functions are shown in Fig. 4.
The merged images with the worst six prediction errors for the neural networks trained with each loss function are shown in Fig. 5. An example of three-dimensional bone reconstruction from the same original DICOM files, segmented by CT number threshold and by the neural networks trained with the different loss functions, is shown in Fig. 6.
Figure 3: The transition of accuracy with the loss functions of Mean Squared Error (MSE), Dice loss (Dice), and Jaccard loss (Jaccard). The vertical axis is logarithmic.
Figure 4: An example of the prediction difference for the same test image (for the model trained with mean squared error loss, this image scored the worst). The predicted images are shown in the green channel, and the target in the red channel. In the merged image, correct pixels are shown in yellow or black; error pixels are shown in green or red.
Figure 5: The merged images with the worst six errors predicted by the models trained with the three loss functions.
Figure 6: A series of DICOM images for one patient was processed by threshold or predicted by the models trained with the three different loss functions. The results were written back over the DICOM files and three-dimensionally reconstructed with 3D Slicer 4.11.
Loss Function | Mean Squared Error per Image | SD | Mean Error Pixels per Image | SD | Worst Mean Squared Error per Image | Worst Error Pixels per Image |
Dice | 0.00297 | 0.00194 | 778.57 | 508.56 | 0.0135 | 3539 |
Jaccard | 0.00289 | 0.00179 | 757.60 | 469.24 | 0.0127 | 3329 |
Mean Squared Error | 0.0000551* | 0.00014 | 14.43* | 37.75 | 0.00187* | 490* |
Table 1: The mean and worst prediction errors over the 2000 test images, for models trained with the different loss functions.
*: p<0.001, Bonferroni's test.
Discussion
Several metal artifact reduction methods have been reported, including ones utilizing convolutional neural networks or generative adversarial networks [8-11]. Most of them aimed to restore not only the hard tissue but also the soft tissue.
This study is dedicated to the delineation of craniofacial bones, and the target images were binarized. This simplification of the target images may have contributed to the high prediction accuracy. Networks trained with different loss functions showed differences in prediction accuracy; in our setting, mean squared error loss outperformed Dice loss and Jaccard loss.
The accuracy of a machine learning model indicates whether the functional relationship between inputs and outputs learned from the training data also holds for the validation data. Even if the prediction accuracy of a well-trained model is high, the system may not give the "correct" answer if the training data itself is not "correct". In this study, the key to "correctness" was the creation of the bone region images (target images), from which artifacts and beds were removed manually. In the original DICOM images, however, artifacts mask the true information; we had to infer the truth, which cannot be reproduced completely, from our anatomical and clinical knowledge. Therefore, the "correctness" of the predictions made by this system cannot be guaranteed, but we believe they are generally satisfactory.
Conclusion
A U-net-based noise reduction system for segmenting bone in craniofacial CT was constructed, achieving high prediction accuracy.
Acknowledgement
This work received no financial support. This paper is a revised version of a preprint: medRxiv, 27 Jun 2022 (Nishimoto, et al. Machine learning-based noise reduction for craniofacial bone segmentation in CT images. DOI: 10.1101/2022.06.26.22276925).
Conflict of Interest
The authors have no conflict of interest to declare.
References
- Katsura M, Sato J, Akahane M, Kunimatsu A, Abe O. Current and novel techniques for metal artifact reduction at CT: Practical guide for radiologists. Radiographics. 2018;38:450-61.
- Meyer E, Raupach R, Lell M, Schmidt B, Kachelrieß M. Normalized Metal Artifact Reduction (NMAR) in computed tomography. Medical Physics. 2010;37:5482-93.
- Meyer E, Raupach R, Lell M, Schmidt B, Kachelrieß M. Frequency Split Metal Artifact Reduction (FSMAR) in computed tomography. Medical Physics. 2012;39:1904-16.
- Blake G. Head-Neck-Radiomics-HN1 – The Cancer Imaging Archive (TCIA) Public Access – Cancer Imaging Archive Wiki. 2020. [Last accessed on: March 20, 2023] https://wiki.cancerimagingarchive.net/display/Public/Head-Neck-Radiomics-HN1
- Nishimoto S, Saito T, Ishise H, Fujiwara T, Kawai K, Kakibuchi M. Three-dimensional cranio-facial landmark detection in CT slices from a publicly available database, using multi-phased regression networks on a personal computer. medRxiv. 2021;2021:21253999.
- Glorot X, Bordes A, Bengio Y. Deep Sparse Rectifier Neural Networks. In: Gordon G, Dunson D, Dudík M, Eds. Proceedings of the fourteenth international conference on artificial intelligence and statistics. Proceedings of Machine Learning Research. Fort Lauderdale, FL, USA: PMLR. 2011;15:315-23.
- Kingma DP, Ba JL. Adam: A Method for Stochastic Optimization. 3rd International Conference on Learning Representations, ICLR 2015 – Conference Track Proceedings. 2014.
- Park HS, Lee SM, Kim HP, Seo JK. Machine-learning-based nonlinear decomposition of CT images for metal artifact reduction. arXiv preprint. 2017.
- Zhang Y, Yu H. Convolutional neural network based metal artifact reduction in X-Ray computed tomography. IEEE Transactions on Medical Imaging. 2018;37:1370-81.
- Nakao M, Imanishi K, Ueda N, Imai Y, Kirita T, Matsuda T. Three-dimensional generative adversarial nets for unsupervised metal artifact reduction. IEEE Access. 2019;8:109453-65.
- Nakamura M, Nakao M, Imanishi K, Hirashima H, Tsuruta Y. Geometric and dosimetric impact of 3D generative adversarial network-based metal artifact reduction algorithm on VMAT and IMPT for the head and neck region. Radiat Oncol. 2021;16:96.
Article Type
Research Article
Publication History
Received Date: 28-02-2023
Accepted Date: 20-03-2023
Published Date: 27-03-2023
Copyright© 2023 by Nishimoto S, et al. All rights reserved. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation: Nishimoto S, et al. Machine Learning-Based Noise Reduction for Craniofacial Bone Segmentation in CT Images. J Dental Health Oral Res. 2023;4(1):1-7.