A NOVEL VITBILSTM DEEP LEARNING FRAMEWORK FOR BRAIN HEMORRHAGE PREDICTION USING CT BRAIN IMAGES

 

Delveen Luqman Abd Alnabi*1

 

1College of Administration and Economics, University of Duhok, Duhok, Kurdistan Region, Iraq.

 

*Corresponding Author email: delveen.luqman@uod.ac

 

Received: 22 Feb 2025        Accepted: 13 Apr 2025        Published: 4 Jul 2025        https://doi.org/10.25271/sjuoz.2025.13.3.1488

ABSTRACT:

Bleeding in the tissues surrounding the human brain is called a brain hemorrhage. This problem can lead to stroke and even death, and it requires fast intervention and accurate treatment to save a patient's life. Current state-of-the-art methodologies for detecting this issue benefit from developments in the artificial intelligence field, especially its sub-field of deep learning. This study introduces a new deep learning-based framework to detect brain hemorrhage in CT brain images. The proposed model is a novel hybrid of vision transformer models and the bidirectional long short-term memory, denoted “ViTBiLSTM”. The study utilizes two challenging datasets of different sizes: the first consists of 6772 CT images, while the second contains 2500 CT images. The study also compares the original vision transformer model with the proposed one, evaluates different optimizers, and compares the current research with related work. Results show that the proposed ViTBiLSTM achieves its best performance when using the RMSProp optimizer, with accuracies of 100% and 96.94% on the two datasets. A comparison with the current state of the art shows that the proposed methodology exceeds the best previous study by 3.7% in accuracy.

KEYWORDS: Artificial Intelligence, Deep Learning, Vision Transformer, BiLSTM, Brain Hemorrhage


1.       INTRODUCTION

        When the blood vessels in the area around the brain burst, the surrounding tissues of the brain bleed, leading to a critical condition called a “brain hemorrhage” (Grey, 2024; Rather et al., 2024; Sheikh et al., 2024). The increased pressure on the brain caused by such bleeding can severely damage brain cells and may even lead to death (Ahmed et al., 2024; Neethi et al., 2024; Schiariti et al., 2024; Zhang et al., 2024). Stroke, which a brain hemorrhage can cause, is considered the second leading cause of death worldwide; its three main causes are metabolic risks, behavioural factors, and environmental risks (Feigin et al., 2022; Neethi et al., 2024). Mental changes, headaches, difficulty speaking, weakness, lack of balance, and even vision issues are all symptoms of a stroke caused by hemorrhage (Suryadi, 2024; Studer & Thompson, 2024; Akmaljon et al., 2024).

        Deep learning is one of the most powerful artificial intelligence technologies and has recently been used for brain hemorrhage detection and prediction, helping physicians detect this problem correctly and effectively and choose the appropriate treatment (Del Gaizo et al., 2024; Majeed et al., 2024; Haldorai et al., 2024; Hu et al., 2024). Widely used architectures include convolutional neural networks (CNN) and their newer variants, such as VGG16, ResNet, MobileNet, DenseNet, and Inception (Ibrahim & Mahmood, 2023). Deep learning-based classification models, especially those that use transfer learning, such as DenseNet121, have shown high accuracy in medical image classification; they improve diagnostic accuracy by using pre-trained architectures that extract relevant features from CT scans and differentiate between disease and non-disease cases. Ahmed et al. (2024) and Prasher et al. (2024) proposed very effective models for extracting image features that can be used in a classification framework to build an automatic brain hemorrhage detection system. Similarly, results from Murad et al. (2023) show that CNNs are very effective for the automatic classification of medical images, specifically CT scans where disease detection is crucial: by automatically extracting relevant features from these images, CNNs improve accuracy and help assign cases to the appropriate condition promptly. However, some recent studies have fused CNN-based models with sequence-processing models such as recurrent neural networks (RNN) (Sindhura et al., 2024; Datta & Rohilla, 2024; Kothala & Guntur, 2024; Lafraxo et al., 2024).

        Deep learning-based models (CNN, auto-encoder (AE), and stacked auto-encoder (SAE)) were introduced by Helwan et al. (2018) for brain hemorrhage detection. They utilized the “Brain CT Hemorrhage Dataset,” consisting of 6772 images of both normal and hemorrhagic cases, although they used only a subset of 2527 images. They found that the SAE model achieved the highest accuracy, registering 90.9%. Although the AE is a good DL architecture, its performance is still limited.

        In a study by Hssayeni et al. (2020), the “Brain CT Images with Intracranial Hemorrhage Masks” dataset was utilized for brain hemorrhage segmentation using a fully convolutional network (FCN), “U-Net”. They achieved an accuracy of 87%.

        In another study, Altuve and Pérez (2022) applied the well-known ResNet18 model in a transfer learning setting for brain hemorrhage detection. Using a small dataset of only 100 normal and 100 hemorrhagic CT images, they achieved an accuracy of 96% and a precision of 97%. However, their methodology is already well known, and the utilized dataset is too small.

        On the other hand, Kothala and Guntur (2024) proposed a stacked bidirectional GRU-LSTM model along with a traditional CNN model to detect possible hemorrhage in CT brain images. They utilized the “Brain CT Hemorrhage Dataset,” consisting of 6772 images of both normal and hemorrhagic cases, and achieved training and test accuracies of 96.2% and 93.4%, respectively. Their approach achieved precision, recall, and F1-score values of 62%, 68%, and 65%, respectively; these low precision and recall values indicate a high percentage of false positive and false negative errors. They also tried several transfer learning methods, including LeNet, ResNet50, and AlexNet, which performed worse than their proposed CNN-BiLSTM model.

        The EfficientNetB0 model was used for feature extraction and classification in brain hemorrhage detection (Feng et al., 2023). The authors utilized a CT brain dataset consisting of 561 images of normal and spontaneous intracerebral hemorrhage cases. Their methodology achieved an accuracy between 70% and 86.6%, with an Area Under the Curve (AUC) of 0.71 to 0.83.

        Like traditional forms of feature selection, machine learning models used for the classification of medical data can greatly benefit from feature selection methods, which improve predictive outcomes in medical diagnosis (Siddiq Hassan, 2013).

        The American Society of Neuroradiology (ASNR) hemorrhage dataset was used in a study by He et al. (2024). They applied multiscale feature classification supported by an attention fusion method and a weakly supervised localization model. They evaluated their model using only the AUC, achieving values of 0.89 to 0.995.

        In a different study, Malik et al. (2024) compared the performance of several deep learning models for brain hemorrhage detection. Using 2500 images of the “Brain CT Images with Intracranial Hemorrhage Masks” dataset, they achieved accuracies of 93.29%, 90%, 82.35%, and 39.45% with the EfficientNet, ResNet50, SEResNeXt, and ResNeXt models, respectively.

        Current state-of-the-art methodologies suffer from several problems: the use of small datasets, the use of traditional DL architectures without any modification, high computational time or low accuracy, and weak evaluation processes. The key contributions of this study are as follows:

1- This is the first study to introduce a hybrid model of the vision transformer (ViT) and bidirectional long short-term memory (BiLSTM) architectures to enhance the performance of current brain hemorrhage detection systems. The proposed model is denoted “ViTBiLSTM”. Its ability to capture a better feature representation of the images through its self-attention mechanism improves accuracy and addresses the low performance of traditional CNNs.

2- Since the ViT is considered a lightweight model, the computational time required for the training and validation steps is low compared with more complex architectures (addressing the high-computational-time problem).

3- The study addresses the generalization of the proposed model by evaluating it on two different CT brain datasets.

4- The study introduces a comprehensive analysis of the performance of the proposed ViTBiLSTM model under different optimizers and compares the proposed methodology with the original ViT model to show its efficiency.

5- The study uses a wide range of performance evaluation metrics in order to make a comprehensive assessment of the proposed model.

        The rest of this paper is organized as follows. First, the materials and the proposed methodology, with the detailed architecture of the ViTBiLSTM model, are introduced. Next, the main results and findings of the adopted model are presented for both datasets. Then, a comprehensive discussion and an ablation study are given. Finally, the conclusion, limitations, and future work are stated.

2.       MATERIALS AND METHODS

CT Brain Datasets:

        In this study, two datasets of normal and hemorrhagic brain images are utilized. The first dataset, the “Brain CT Hemorrhage Dataset,” consists of 6772 CT scans (4105 normal and 2667 hemorrhagic) originally collected at the Near East Hospital (Helwan et al., 2018). The second dataset is the “Brain CT Images with Intracranial Hemorrhage Masks” (Hssayeni et al., 2020), which consists of CT scans of both bone and brain windows; in this study, only the brain CT images (82 subjects; 2500 images: 2182 normal and 318 hemorrhagic) are utilized. Figure 1 shows samples of the utilized datasets (the third row corresponds to the second dataset). Table 1 summarizes the characteristics of both datasets.


 


Table 1: Datasets characteristics

| Name | Num. of subjects | Num. of images | Class distribution | Source |
| Brain CT Hemorrhage Dataset | 45 | 6772 | 4105 normal, 2667 hemorrhagic | (Helwan et al., 2018) |
| Brain CT Images with Intracranial Hemorrhage Masks | 82 | 2500 | 2182 normal, 318 hemorrhagic | (Hssayeni et al., 2020) |

 

 


 

Figure 1: Examples of normal and hemorrhagic cases of the utilized datasets

 


The Adopted Methodology:

        The proposed methodology is illustrated in Figure 2. In the first step, the dataset is pre-processed using the following operations. First, the images and labels are read; the images are then resized to a fixed size (224×224) to reduce the computational time and GPU memory required during the training phase; besides that, most pre-trained architectures require this input size. The images are then split into a training set (80%), a validation set (10%), and a test set (10%), allowing the model to learn as much as possible while keeping sufficient validation and test samples for a robust evaluation. All sets are then rescaled to the range [0, 1] as a normalization operation. After that, data augmentation is applied to the training set, consisting of random rotation (by an angle of 10°), zooming (factor of 0.1), and horizontal flipping. These operations help avoid overfitting, make the model more robust to changes in the images, and increase the effective training size. Augmentations that change the structure of the image, such as zooming, are applied with a small factor to avoid removing important regions that may contain hemorrhage.
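To make these steps concrete, the following is a minimal tf.keras sketch of the loading, normalization, and augmentation pipeline. The directory paths are hypothetical, and the specific Keras utilities used here are an assumption, not the paper's exact implementation.

```python
# Minimal sketch of the pre-processing pipeline described above (tf.keras).
import tensorflow as tf

IMG_SIZE = (224, 224)  # fixed input size expected by most pre-trained backbones

def load_split(directory):
    # Reads images and one-hot labels from a (hypothetical) folder layout
    return tf.keras.utils.image_dataset_from_directory(
        directory, image_size=IMG_SIZE, batch_size=64, label_mode="categorical")

train_ds = load_split("ct_dataset/train")       # 80% of the images
val_ds = load_split("ct_dataset/validation")    # 10%
test_ds = load_split("ct_dataset/test")         # 10%

normalize = tf.keras.layers.Rescaling(1.0 / 255.0)   # rescale pixels to [0, 1]

# Augmentation for the training set only: ~10 degree rotation, 10% zoom,
# horizontal flips; kept small so hemorrhagic regions are not cropped away.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomRotation(10.0 / 360.0),    # factor is a fraction of a full turn
    tf.keras.layers.RandomZoom(0.1),
    tf.keras.layers.RandomFlip("horizontal"),
])

train_ds = train_ds.map(lambda x, y: (augment(normalize(x), training=True), y))
val_ds = val_ds.map(lambda x, y: (normalize(x), y))
test_ds = test_ds.map(lambda x, y: (normalize(x), y))
```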


 


Figure 2: The general brain hemorrhagic detection system steps

 


        After that, the training and validation sets are used to train and validate the DL model. This study introduces a novel hybrid DL model consisting of two main architectures, the vision transformer (ViT) and the bidirectional long short-term memory (BiLSTM); the resulting model is called “ViTBiLSTM” (Figure 3).


 


Figure 3: The proposed ViTBiLSTM model: (a) The general architecture, (b) The ViTBiLSTM architecture, (c) The BiLSTM

 


        While ViT models are powerful at capturing global spatial dependencies through their self-attention mechanisms, they may lack the ability to model sequential dependencies across feature representations. For this reason, the BiLSTM model is used to capture the sequential patterns in the input data (the input image of the ViT model is split into adjacent patches that are spatially related). From another point of view, using a ViT instead of a traditional CNN improves the model's ability to understand the spatial structure of the input image, leading to a better feature representation than a CNN provides.

        The vision transformer model (ViT) was introduced in a study by Dosovitskiy et al. (2020). The input image of the ViT model (in our case, a CT brain image) is divided into non-overlapping P×P patches. These patches are then linearly embedded using an embedding layer that transforms them into a sequence of tokens. Positional-embedding information is also added to each image patch to provide it with spatial information.

Let $X \in \mathbb{R}^{H \times W \times C}$ represent the input image, where $H$, $W$, and $C$ are the height, width, and number of channels, respectively. The patch embedding of the ViT model divides the input image into $P \times P$ patches, giving a total number of patches $N = HW/P^{2}$.

Each patch is flattened and linearly projected into a D-dimensional embedding space:

$z_0^i = E\,x_p^i + P_i$                (1)

where $E$ is the embedding projection matrix, $x_p^i$ is the flattened patch $i$, $P_i$ is the positional encoding of patch $i$, and $z_0^i$ is the embedded representation of patch $i$.

        These positionally encoded embedding sequences are then fed into the main ViT backbone, which is a Transformer-based architecture. The main module of this architecture is the multi-head attention layer (Du et al., 2024), which is responsible for extracting features based on the attention mechanism (i.e., the model updates its weights to concentrate on the most essential parts of the image). Each head is defined by Equation (2).

$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\dfrac{QK^{T}}{\sqrt{d_k}}\right)V$                (2)

where $Q$, $K$, and $V$ are the Query, Key, and Value matrices, the three different representations used by the multi-head attention layer, and $d_k$ is the dimension of the keys.
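For illustration, the following is a minimal NumPy sketch of the scaled dot-product attention in Equation (2). The shapes are illustrative; a real multi-head layer additionally applies learned projection matrices to form Q, K, and V for each head.

```python
# Minimal NumPy sketch of scaled dot-product attention (Equation 2).
import numpy as np

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (N, N) patch-to-patch similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # attention-weighted sum of values

N, D = 49, 768  # 49 patch tokens for ViT-B/32 on a 224x224 input
tokens = np.random.rand(N, D).astype(np.float32)
out = attention(tokens, tokens, tokens)             # self-attention: Q = K = V
print(out.shape)                                    # (49, 768)
```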

The output of the transformer encoder is the final feature vector, which is delivered to the classification part. In the original ViT, this part is a multi-layer perceptron followed by a softmax classification layer. In this study, we remove this part and replace it with a more powerful classification model based on a BiLSTM architecture. This choice fits the idea of the transformer model: since the input image is decomposed into patches, a BiLSTM module is well suited to retain the local information of the gray levels of adjacent patches in the original image, leading to a better classification. To make the output of the transformer suitable for the BiLSTM input, the study uses a reshape layer that changes the transformer's output feature vector into a (1, 768) shape. Moreover, a batch normalization layer is added to the output of the BiLSTM to improve training stability and speed up convergence (Rivoir et al., 2024; Tin et al., 2024; Fei et al., 2024). Finally, two dense layers are added: a hidden fully connected layer and an output layer with a softmax activation function. A dropout layer with a drop rate of 35% is inserted before the last layer to regularize the model and prevent overfitting. In the present study, the B/32 version of the ViT is utilized (Liu & Aldrich, 2024), meaning that a 224×224 input image is divided into 7×7 = 49 patches of 32×32 pixels. The output of the transformer model and its reshaped version are given in Equations (3) and (4).

$Z_{\mathrm{reshaped}} = \mathrm{Reshape}(Z_{\mathrm{ViT}}) \in \mathbb{R}^{1 \times 768}$                (3)

where $Z_{\mathrm{ViT}}$ is the output of the encoder part of the ViT model and is given as follows:

$Z_{\mathrm{ViT}} = \mathrm{TransformerEncoder}(z_0)$                (4)


BiLSTMs, or bidirectional LSTMs (Graves et al., 2005; see Figure 3-c), are a type of recurrent neural network (RNN) that maintains both past and future contexts in sequence data. In BiLSTMs, two LSTMs are utilized: one processes the sequence from start to end (the forward LSTM), while the other processes it from end to start (the backward LSTM). The final output of the proposed ViTBiLSTM model is given in Equation (5).

$\hat{y} = \mathrm{softmax}\left(W_2\,\sigma\left(W_1\,\mathrm{BN}\left(\mathrm{BiLSTM}(Z)\right) + b_1\right) + b_2\right)$                (5)

where $N$ is the number of patches, $Z$ is the ViT output over the $N$ patches, $\mathrm{BN}$ denotes batch normalization, $W_1$ and $W_2$ are the weight matrices of the first and second dense layers, $\sigma$ is the ReLU activation function, and $b_1$ and $b_2$ are the bias vectors.
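The following is a minimal tf.keras sketch of this classification head. The `vit_backbone` is assumed to be a ViT-B/32 feature extractor returning a 768-dimensional vector per image; the LSTM and dense unit counts are illustrative assumptions, since the paper does not list them.

```python
# Minimal sketch of the ViTBiLSTM head (Equations 3-5), assuming a ViT backbone.
import tensorflow as tf

def build_vitbilstm(vit_backbone, num_classes=2):
    inputs = tf.keras.Input(shape=(224, 224, 3))
    features = vit_backbone(inputs)                   # (batch, 768) ViT encoder output
    x = tf.keras.layers.Reshape((1, 768))(features)   # Equation (3): reshape for the BiLSTM
    x = tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(64))(x)                  # forward + backward LSTM
    x = tf.keras.layers.BatchNormalization()(x)       # stabilizes training, faster convergence
    x = tf.keras.layers.Dense(64, activation="relu")(x)   # hidden dense layer (W1, b1)
    x = tf.keras.layers.Dropout(0.35)(x)              # 35% dropout before the output layer
    outputs = tf.keras.layers.Dense(
        num_classes, activation="softmax")(x)         # output dense layer (W2, b2)
    return tf.keras.Model(inputs, outputs, name="ViTBiLSTM")
```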

Performance Evaluation Metrics:  

        The final step in this study is the evaluation process, in which the trained ViTBiLSTM model is assessed to judge its performance and select the parameters that achieve the best results. For this purpose, the basic counts, True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN), are computed. From these counts, the precision, recall, F1-score, and accuracy metrics are computed for the individual classes, and both macro and weighted averages are then derived. The weighted average assigns each class a weight based on its number of samples, so that the average score reflects the class proportions. The macro average applies no weighting to the individual scores and is more suitable for unbalanced datasets, where it exposes errors in individual classes. The confusion matrix (CM), which shows the TP, TN, FP, and FN of each class, is also drawn. The CM shows how well the trained model performs, since it compares the original classes with the predicted ones and gives detailed counts of each class's true and false predictions, allowing the classes with the best and worst performance to be identified precisely. Figure 4 shows the CM layout for the problem addressed in this study. The receiver operating characteristic (ROC) curve, which represents the relationship between the true positive rate (TPR) and the false positive rate (FPR), is also drawn, and the area under the curve (AUC) is computed from it. The training time of all proposed scenarios is also recorded. Equations (6), (7), (8), and (9) give the formulas of precision, recall, F1-score, and accuracy, respectively (Khozama & Mayya, 2022; Szabó et al., 2024; Hoang et al., 2024).

 

$\mathrm{Precision} = \dfrac{TP}{TP + FP}$                (6)

$\mathrm{Recall} = \dfrac{TP}{TP + FN}$                (7)

$\mathrm{F1\text{-}score} = 2 \times \dfrac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$                (8)

$\mathrm{Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN}$                (9)
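As an illustration, all of these metrics can be computed with scikit-learn; the prediction arrays below are placeholders standing in for the model's outputs on the test set.

```python
# Minimal sketch of the evaluation step with scikit-learn (placeholder data).
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score

y_true = np.array([0, 0, 1, 1, 1, 0])               # 0 = hemorrhagic, 1 = normal
y_score = np.array([0.1, 0.2, 0.9, 0.8, 0.4, 0.3])  # softmax probability of class 1
y_pred = (y_score >= 0.5).astype(int)               # thresholded class predictions

print(confusion_matrix(y_true, y_pred))             # [[TN, FP], [FN, TP]] for class 1
print(classification_report(y_true, y_pred,
                            target_names=["Hemorrhagic", "Normal"], digits=4))
print("AUC:", roc_auc_score(y_true, y_score))       # area under the ROC curve
```

The classification report prints the per-class precision, recall, and F1-score together with the macro and weighted averages discussed above.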


Figure 4: CM description according to the brain hemorrhage prediction problem

Ethical Approval and Consent:

        The study's design and procedures were reviewed and approved by the Ethics and Scientific Committee of the College of Medicine at the University of Zakho with the reference number (FEB2024/UOZE440).

3.       RESULTS

        The experiments in this study are applied to the utilized datasets using the training parameters listed in Table 2 (a minimal sketch of this configuration follows the table). In all training scenarios, the number of epochs is 50, the batch size is 64, the loss function is categorical cross-entropy (two classes in the classification layer), the learning rate is 0.0001, the image size is 224×224, and early stopping is enabled with a patience of 5 (if the validation loss does not improve for 5 epochs, training is stopped to prevent possible overfitting and redundant training time).


 

Table 2: Training parameters

| Parameter | Description |
| Image Size | 224×224×3 |
| Batch size | 64 |
| Optimizer | Adam / RMSProp / NADAM / ADAMX |
| Learning Rate | 1e-4 |
| Loss function | Categorical cross-entropy |
| Metrics | Accuracy |
| Epochs | 50 |
| Early stop condition | Patience = 5 |
| Save best only | Yes |
| Resources | Training: COLAB environment with an NVIDIA Tesla T4 GPU (16 GB VRAM) and 12 GB RAM. Implementation: CPU with 4 to 8 GB of RAM (no GPU needed, since evaluation runs on a single image). |
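For illustration, the configuration in Table 2 can be expressed in tf.keras as follows; `model`, `train_ds`, and `val_ds` are the model and dataset objects from the earlier sketches, and the checkpoint file name is a placeholder.

```python
# Minimal sketch of the training setup in Table 2 (ADAM case).
import tensorflow as tf

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="categorical_crossentropy",      # two-unit softmax output
              metrics=["accuracy"])

callbacks = [
    # Stop if the validation loss does not improve for 5 consecutive epochs
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5),
    # "Save best only": keep the weights with the lowest validation loss
    tf.keras.callbacks.ModelCheckpoint("vitbilstm_best.keras", save_best_only=True),
]

history = model.fit(train_ds, validation_data=val_ds, epochs=50, callbacks=callbacks)
```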

 


First Dataset’s Results:

        The training and validation accuracy and loss curves of the trained ViTBiLSTM are illustrated in Figure 5. Training stopped at the 13th epoch due to the early stopping condition. The curves show no overfitting, and the training and validation curves converge. The confusion matrix (Figure 6) shows that the trained model makes one false positive and one false negative error, with an AUC value of 1.


 

(a)    (b)

Figure 5: Training and validation metrics of the ViTBiLSTM model: (a) Accuracy, (b) Loss.

 

(a)

(b)

Figure 6: Confusion matrix and ROC plot of the proposed ViTBiLSTM model: (a) Confusion matrix, (b) ROC.

 


        Table 3 includes the precision, recall, F1-score, and support (number of samples) for the normal and hemorrhagic classes, along with the macro and weighted averages. The average precision, recall, and F1-score of the proposed ViTBiLSTM model are all above 99.6%, indicating high performance. The results show that the model not only achieves high accuracy but also maintains balanced and consistent predictions across both the normal and hemorrhagic classes.

        Figure 7 shows some CT brain samples from the validation and test sets with their corresponding predictions.


 

Table 3: Assessment of the trained ViTBiLSTM model in terms of precision, recall, and F1-score using the test set.

| Class | Precision (%) | Recall (%) | F1-score (%) | Support |
| Hemorrhagic | 99.62 | 99.62 | 99.62 | 267 |
| Normal | 99.75 | 99.75 | 99.75 | 411 |
| Macro avg | 99.685 | 99.685 | 99.685 | 678 |
| Weighted avg | 99.698 | 99.698 | 99.698 | 678 |


Figure 7: Results of predicting some samples using the trained ViTBiLSTM model: (a) Validation images, (b) Test images

 


Second Dataset’s Results:

        In this section, a new dataset with different specifications is utilized to test the generalization of the results. In this dataset, the number of normal cases is 2182, while the number of hemorrhagic samples is only 318, revealing a data imbalance problem. To address this issue, the dataset is split into training, validation, and test sets (with the same ratios as for the first dataset), and the minority class is then oversampled to match the size of the majority class. The same data augmentation operations and training parameters are used to preprocess the dataset and prepare the training. Figure 8 shows the training and validation accuracy and loss curves of the ViTBiLSTM model trained on the second dataset (before and after data balancing).

        The proposed balancing method includes three main steps: duplicating samples from the minority class (oversampling), balancing the class distribution, and shuffling the dataset to prevent order bias. These operations are applied to the training set only. The curves are more stable and converge better when data balancing is used. These findings are also confirmed by the confusion matrix and AUC values: the balanced ViTBiLSTM model registered only 4 false negatives and an AUC of 0.99, while the ViTBiLSTM trained without data balancing reached an AUC of only 0.84.
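A minimal NumPy sketch of this balancing step is given below; the array names and the random seed are illustrative assumptions.

```python
# Minimal sketch of oversampling the minority class in the training set.
import numpy as np

def oversample_minority(X_train, y_train, seed=42):
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y_train, return_counts=True)
    minority = classes[counts.argmin()]             # label of the smaller class
    idx_min = np.where(y_train == minority)[0]
    # Duplicate random minority samples until both classes have equal counts
    extra = rng.choice(idx_min, size=counts.max() - len(idx_min), replace=True)
    X_bal = np.concatenate([X_train, X_train[extra]])
    y_bal = np.concatenate([y_train, y_train[extra]])
    order = rng.permutation(len(y_bal))             # shuffle to prevent order bias
    return X_bal[order], y_bal[order]
```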


 

(a)

(b)

(c)

(d)

Figure 8: Performance curves of the trained ViTBiLSTM model: (a, b) before balance, (c, d) after balance

 


        Table 4 presents a performance comparison between two scenarios on the second dataset: training the ViTBiLSTM model without balancing, and the same evaluation after data balancing. Table 4 shows that data balancing improved the test accuracy by 10.69% (from 86.25% to 96.94%). Generally, the proposed ViTBiLSTM achieved high accuracy on a new dataset with different specifications. These results indicate the generalizability and robustness of the proposed ViTBiLSTM model.


 

 

 

 

Table 4: Training, validation, and test metrics of the second dataset.

| Scenario | Training Accuracy (%) | Validation Accuracy (%) | Test Accuracy (%) | Test Precision (%) | Test Recall (%) | Test F1-score (%) | Training Time (s/epoch) | AUC |
| Without balance | 78.15 | 88.94 | 86.25 | 81 | 61 | 64 | 35.16 | 0.84 |
| With balance | 95.47 | 97.44 | 96.94 | 97 | 97 | 97 | 58.73 | 0.99 |


4.       DISCUSSION

Overall Discussion:

        The proposed ViTBiLSTM model achieves high performance in terms of precision, recall, F1-score, and accuracy. The test samples also show that the model detects the correct class with a high confidence level. However, a few validation and test samples were misclassified, as shown in Figure 9-a (validation samples) and Figure 9-b (test sample). Figure 9 contains three misclassified samples, two of which belong to the 'Hemorrhagic' class but are misclassified as 'Normal'; this may be due to additional areas in the brain CT images that resemble normal tissue. In contrast, the 'Normal' case misclassified as 'Hemorrhagic' contains gray levels whose distribution is similar to that of hemorrhagic regions.


 


Figure 9: Misclassified samples (a) Validation, (b) Test


Ablation Study:

        For a deeper analysis, the “Adam” optimizer of the ViTBiLSTM model is replaced with alternatives, and the new results are compared with the original ones. The comparison between these scenarios is shown in Table 5 and Figure 10, and a minimal sketch of the ablation follows Table 5. Table 5 summarizes the macro average metrics of the trained ViTBiLSTM model under different optimizers and shows that the best case is the “RMSProp” optimizer, with “ADAM” second best. This finding is expected, since both ADAM and RMSProp use an adaptive learning rate that changes during training, leading to better convergence. Using the RMSProp optimizer instead of ADAM improves the performance by almost 0.3% on all metrics, while other optimizers such as ADAMX or NADAM lead to lower performance.


 

Table 5: Comparison experiment of different optimizers with the same ViTBiLSTM model.

| Model | Precision (%) | Recall (%) | F1-score (%) | Accuracy (%) |
| ViTBiLSTM (ADAM) | 99.685 | 99.685 | 99.685 | 99.7 |
| ViTBiLSTM (RMSProp) | 100 | 100 | 100 | 100 |
| ViTBiLSTM (NADAM) | 99 | 99 | 99 | 98.96 |
| ViTBiLSTM (ADAMX) | 91 | 89 | 90 | 90.56 |
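For illustration, the ablation can be scripted by rebuilding the same architecture and swapping only the optimizer. This sketch reuses the hypothetical `build_vitbilstm`, datasets, and callbacks from the earlier sketches, and uses Keras's `Adamax` class for the ADAMX variant.

```python
# Minimal sketch of the optimizer ablation (same ViTBiLSTM architecture).
import tensorflow as tf

optimizers = {
    "ADAM": tf.keras.optimizers.Adam(learning_rate=1e-4),
    "RMSProp": tf.keras.optimizers.RMSprop(learning_rate=1e-4),
    "NADAM": tf.keras.optimizers.Nadam(learning_rate=1e-4),
    "ADAMX": tf.keras.optimizers.Adamax(learning_rate=1e-4),
}

results = {}
for name, opt in optimizers.items():
    model = build_vitbilstm(vit_backbone)           # fresh weights for every run
    model.compile(optimizer=opt, loss="categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(train_ds, validation_data=val_ds, epochs=50, callbacks=callbacks)
    results[name] = model.evaluate(test_ds)         # [test loss, test accuracy]
```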

 


        Confusion matrices and ROC plots of the three models are shown in Figure 10. NADAM combines ADAM with Nesterov momentum, making it more complex, and it may not align well with the utilized dataset. For ADAMX, the loss, as shown in Figure 10-c, is not stable, due to the complexity of the learning process, which does not fit the current problem. The confusion matrix of RMSProp shows zero false positive and false negative errors, while the NADAM and ADAMX optimizers show many false positive and false negative errors.


 

(a)

(b)

(c)

(d)

(e)

(f)

Figure 10: Confusion matrix and ROC plot of different optimizers (same ViTBiLSTM model): (a, b) CM and ROC of RMSProp, (c, d) CM and ROC of NADAM, (e, f) CM and ROC of ADAMX


        The training, validation, and test metrics of the different optimizer experiments are shown in Table 6. The findings indicate that the ADAM and RMSProp optimizers give the best performance with the ViTBiLSTM model. ViTBiLSTM with RMSProp achieves a validation accuracy of 99.85% (the same as with ADAM), a test accuracy of 100%, and an AUC score of 1. In terms of training time, RMSProp consumed the least computational time.


 

Table 6: Training, validation, and test metrics of different optimizers.

| Model | Training Accuracy (%) | Training Loss | Validation Accuracy (%) | Validation Loss | Test Accuracy (%) | Test Loss | Training Time (s/epoch) | AUC (Test) |
| ViTBiLSTM (ADAM) | 98.19 | 0.2468 | 99.85 | 0.1923 | 99.7 | 0.19 | 99.85 | 1.0 |
| ViTBiLSTM (RMSProp) | 98.95 | 0.2208 | 99.85 | 0.1799 | 100 | 0.169 | 98.22 | 1.0 |
| ViTBiLSTM (NADAM) | 96.2 | 0.3413 | 99.7 | 0.3111 | 98.96 | 0.313 | 108.33 | 1.0 |
| ViTBiLSTM (ADAMX) | 87.39 | 0.3451 | 93.06 | 0.2232 | 90.56 | 0.257 | 103.96 | 0.97 |

 

 


External Validation:

        To evaluate the trained model on a different dataset, the validation and test portions of the second dataset were used as an external validation set (without any retraining) for the ViTBiLSTM model trained on the first dataset. Table 7 illustrates the results of this external validation. The proposed ViTBiLSTM model achieved an accuracy of 93%, which demonstrates its ability to produce correct predictions even under different conditions.


 

Table 7: External validation of the ViTBiLSTM model using the validation and test sets of the second dataset.

| Precision (%) | Recall (%) | F1-score (%) | Accuracy (%) |
| 92 | 93 | 92 | 93 |

 


Comparison with CNN and Other ViT-Based Architectures:

        To show the main benefit of the proposed ViTBiLSTM model, Table 8 compares its performance with other architectures (CNN, the original ViT, and ViT with LSTM). The original ViT model registered 5 false negatives and one false positive, corresponding to an accuracy of 99%, while the proposed ViTBiLSTM model achieved an accuracy of 99.7%. Table 8 shows that the ViTBiLSTM model outperforms the original ViT model (with an MLP classification layer) by almost 0.7% on all metrics. ViT with LSTM achieved the closest scores to the proposed methodology, while the CNN model performs lower by almost 5.7%.


 

Table 8: Comparison of the original ViT and the proposed ViTBiLSTM model.

| Model | Precision (%) | Recall (%) | F1-score (%) | Accuracy (%) |
| ViTBiLSTM | 99.685 | 99.685 | 99.685 | 99.7 |
| ViTLSTM | 99.35 | 99.35 | 99.35 | 99 |
| ViT | 99 | 99 | 99 | 99 |
| CNN | 94 | 94 | 94 | 94 |

 


Error Analysis Study:

        The proposed ViTBiLSTM model misclassified only two samples (one false positive and one false negative) out of 678 test images from the first dataset. These errors were nearly balanced across the two classes, Hemorrhagic and Normal, as both achieved almost identical precision and recall (99.62% vs. 99.75%). For the second dataset, without balancing, the model's recall dropped significantly to 61% while precision remained at 81%, suggesting a tendency to under-predict the positive (hemorrhagic) class. After applying data balancing, performance improved to 97% across all metrics. These findings indicate that class imbalance was a major cause of misclassification and that addressing it during training was crucial for reliable predictions. Moreover, the ADAMX optimizer showed the lowest performance, with a test accuracy of 90.56% and an F1-score of 90%; misclassifications were more frequent here, possibly due to slower convergence or instability during training (as seen from its higher loss) and incomplete optimization of the deeper layers of the ViTBiLSTM model. The ADAM and RMSProp optimizers achieved the best performance and the lowest error rates.

Comparison with Related Work:

        A comparison with state-of-the-art studies is essential to highlight the contribution of this study (Table 9). Studies that utilized the same datasets achieved lower performance than the current study: Helwan et al. (2018) registered an accuracy of 90.9%, while Kothala and Guntur (2024) achieved an accuracy of 93.4% on the same dataset. The current study outperforms all previous studies that utilized this dataset. On the second dataset (Brain CT Images with Intracranial Hemorrhage Masks, 2500 images), the current study also outperformed Hssayeni et al. (2020) in accuracy by 9.94%, and it outperforms Malik et al. (2024) by 3.65%. The proposed hybrid model of ViT and BiLSTM combines the high accuracy and low computational time of both architectures into a robust model that achieves state-of-the-art performance.


Table 9: Comparison with related work

| Study | Methodology | Dataset | Dataset size | Results & Notes |
| (Helwan et al., 2018) | SAE | Brain CT Hemorrhage Dataset | 6772 images | Accuracy = 90.9% |
| (Hssayeni et al., 2020) | FCN (U-Net) | Brain CT Images with Intracranial Hemorrhage Masks | 2500 images | Accuracy = 87% |
| (Altuve & Pérez, 2022) | ResNet18 (transfer learning) | A small dataset | 200 images | Accuracy = 96% |
| (Kothala & Guntur, 2024) | Stacked bidirectional GRU-LSTM and CNN | Brain CT Hemorrhage Dataset | 6772 images | Accuracy = 93.4% |
| (Feng et al., 2023) | EfficientNetB0 | CT brain dataset | 561 images | Accuracy = 70%-86.6% |
| (He et al., 2024) | Multiscale feature classification with attention fusion | American Society of Neuroradiology (ASNR) dataset | - | AUC = 0.89-0.995 |
| (Malik et al., 2024) | EfficientNet | Brain CT Images with Intracranial Hemorrhage Masks | 2500 images | Best accuracy = 93.29% |
| Current study | Novel ViTBiLSTM model | Brain CT Hemorrhage Dataset | 6772 images | Accuracy: 99.7% (ADAM), 100% (RMSProp) |
| Current study | Novel ViTBiLSTM model | Brain CT Images with Intracranial Hemorrhage Masks | 2500 images | Accuracy: 96.94% |


5.       CONCLUSION

        In this study, a novel deep learning framework called the ViTBiLSTM model is introduced. It consists of two main parts: a feature extraction part, the ViT model, responsible for extracting image features, and a classification part, in which a BiLSTM model replaces the original transformer classification head. The BiLSTM is chosen because it fits the idea of decomposing input images into patches, maintaining the information of adjacent pixels across patches and thereby improving feature extraction. The study utilizes two different CT image datasets: one suffers from class imbalance, and the other contains a larger number of samples. Both datasets are pre-processed, and data augmentation operations are applied for a better training process. Many experiments are performed: one with the original ViT model and others with the ViTBiLSTM model under different optimizers (ADAM, NADAM, ADAMX, and RMSProp). Results show that the best configuration is the ViTBiLSTM model with the RMSProp optimizer, with an accuracy of 100% on the first dataset. On the second dataset, the best performance is obtained using data balancing and the ViTBiLSTM model, with an accuracy of 96.94%. A comparison with previous studies in the same field demonstrates the robustness and high performance of the proposed ViTBiLSTM against traditional ViT models, CNNs, transfer learning-based models, and CNN-LSTM models. Future studies can focus on other datasets and on fusing different feature extraction and classification DL architectures for further enhancement. Moreover, the current research focused on the binary classification of brain hemorrhage; future studies can address the multi-class classification problem.

Acknowledgements:

        I would like to express my gratitude to the University of Duhok and the University of Zakho for their time and consideration in supporting my academic endeavours.

Statements and Declarations:

Ethical Approval :

        All authors gave verbal informed consent for their participation.  The study's design and procedures were reviewed and approved by the Research Ethics Committee of the College of Medicine, UOZ, in compliance with ethical standards (Code UOZE448; 2024).

Conflict of Interest: The author declares no potential conflict of interest.

Author Contributions: The author has reviewed the final version to be published and agreed to be accountable for all aspects of the work.

Consent to Participate: The author has consented to submit this article to this journal.

Consent to Publish: The author has consented to publish this article in this journal.

Funding: The study has not received any specific funding from public, commercial, or non-profit organizations.

Concept and Design: Delveen Luqman Abd Alnabi.

Acquisition, Analysis, or Interpretation of Data: Delveen Luqman Abd Alnabi

Drafting of the Manuscript: Delveen Luqman Abd Alnabi.

REFERENCES

Ahmed, S., Esha, J. F., Rahman, M. S., Kaiser, M. S., Hosen, A. S. M. S., Ghimire, D., & Park, M. J. (2024). Exploring deep learning and machine learning approaches for brain hemorrhage detection. IEEE Access. https://doi.org/10.1109/ACCESS.2024.3376438

Akmaljon o‘g, M. A., Abdullajon o‘g‘li, M. S., & Tolmasovich, T. R. (2024). Acute disturbance of blood circulation in the head. Western European Journal of Medicine and Medical Science, 2(4), 27–31.

Altuve, M., & Pérez, A. (2022). Intracerebral hemorrhage detection on computed tomography images using a residual neural network. Physica Medica, 99, 113–119. https://doi.org/10.1016/j.ejmp.2022.05.015

Datta, P., & Rohilla, R. (2024). An autonomous and intelligent hybrid CNN-RNN-LSTM-based approach for the detection and classification of abnormalities in the brain. Multimedia Tools and Applications, 1–27. https://doi.org/10.1007/s11042-023-17877-3

Del Gaizo, A. J., Osborne, T. F., Shahoumian, T., & Sherrier, R. (2024). Deep learning to detect intracranial hemorrhage in a national teleradiology program and the impact on interpretation time. Radiology: Artificial Intelligence, 6(5), e240067. https://doi.org/10.1148/ryai.240067

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., & Houlsby, N. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. ArXiv, abs/2010.11929. https://doi.org/10.48550/arXiv.2010.11929

Du, Y., Lang, W., Hu, X., Yu, L., Zhang, H., Zhang, L., & Wu, Y. (2024). Quality assessment of light field images based on adaptive attention in ViT. Electronics, 13(15), 2985. https://doi.org/10.3390/electronics13152985

Fei, W., Dai, W., Li, C., Zou, J., & Xiong, H. (2024). On centralization and unitization of batch normalization for deep ReLU neural networks. IEEE Transactions on Signal Processing, 72, 2827–2841. https://doi.org/10.1109/TSP.2024.3410291

Feigin, V. L., Brainin, M., Norrving, B., Martins, S., Sacco, R. L., Hacke, W., Fisher, M., Pandian, J., & Lindsay, P. (2022). World Stroke Organization (WSO): Global stroke fact sheet 2022. International Journal of Stroke, 17(1), 18–29. https://doi.org/10.1177/17474930211065917

Feng, C., Ding, Z., Lao, Q., Zhen, T., Ruan, M., Han, J., He, L., & Shen, Q. (2023). Prediction of early hematoma expansion of spontaneous intracerebral hemorrhage based on deep learning radiomics features of noncontrast computed tomography. European Radiology, 34(5), 2908–2920. https://doi.org/10.1007/s00330-023-10410-y

Graves, A., Fernández, S., & Schmidhuber, J. (2005). Bidirectional LSTM networks for improved phoneme classification and recognition. International Conference on Artificial Neural Networks, 799–804. https://doi.org/10.1007/11550907_163

Grey, M. T. (2024). White matter lesions: Development, imaging, effect on brain function [Doctoral dissertation, Masaryk University, Faculty of Medicine]. Theses.cz. https://theses.cz/id/td9jv7/

Haldorai, A., Murugan, S., & Balakrishnan, M. (2024). Hemorrhage detection from whole-body CT images using deep learning. In Artificial Intelligence for Sustainable Development (pp. 139–151). Cham: Springer Nature Switzerland. https://doi.org/10.1007/9

He, B., Xu, Z., Zhou, D., & Zhang, L. (2024). Deep multiscale convolutional feature learning for intracranial hemorrhage classification and weakly supervised localization. Heliyon, 10(9), e30270. https://doi.org/10.1016/j.heliyon.2024.e30270

Helwan, A., El-Fakhri, G., Sasani, H., & Uzun Ozsahin, D. (2018). Deep networks in identifying CT brain hemorrhage. Journal of Intelligent & Fuzzy Systems, 35(2), 2215–2228. https://doi.org/10.3233/JIFS-172261

Hoang, Q. T., Pham, X. H., Trinh, X. T., Le, A. V., Bui, M. V., & Bui, T. T. (2024). An efficient CNN-based method for intracranial hemorrhage segmentation from computerized tomography imaging. Journal of Imaging, 10(4), 77. https://doi.org/10.3390/jimaging10040077

Hssayeni, M., Croock, M., Salman, A., Al-khafaji, H., Yahya, Z., & Ghoraani, B. (2020). Intracranial hemorrhage segmentation using a deep convolutional model. Data, 5(1), 14. Dataset: “Computed tomography images for intracranial hemorrhage detection and segmentation.” https://doi.org/10.13026/w8q8-ky94

Hu, P., Yan, T., Xiao, B., Shu, H., Sheng, Y., Wu, Y., Shu, L., Lv, S., Ye, M., & Gong, Y. (2024). Deep learning-assisted detection and segmentation of intracranial hemorrhage in noncontrast computed tomography scans of acute stroke patients: A systematic review and meta-analysis. International Journal of Surgery, 110(6), 3839–3847. https://doi.org/10.1097/JS9.0000000000001266

Ibrahim, W. R., & Mahmood, M. R. (2023). Classified COVID-19 by DenseNet121-based deep transfer learning from CT-scan images. Science Journal of University of Zakho, 11(4), 571–580. https://doi.org/10.25271/sjuoz.2023.11.4.1166

Khozama, S., & Mayya, A. M. (2022). A new range-based breast cancer prediction model using the Bayes’ theorem and ensemble learning. Information Technology and Control, 51(4), 757–770. https://doi.org/10.5755/j01.itc.51.4.31347

Kothala, L. P., & Guntur, S. R. (2024). An efficient stacked bidirectional GRU-LSTM network for intracranial hemorrhage detection. International Journal of Imaging Systems and Technology, 34(1), e22958. https://doi.org/10.1002/ima.22958

Lafraxo, S., El Ansari, M., & Koutti, L. (2024). Computer-aided system for bleeding detection in WCE images based on CNN-GRU network. Multimedia Tools and Applications, 83(7), 21081–21106. https://doi.org/10.1007/s11042-023-16305-w

Liu, X., & Aldrich, C. (2024). Multivariate image processing in minerals engineering with vision transformers. Minerals Engineering, 208, 108599. https://doi.org/10.1016/j.mineng.2024.108599

Majeed, M. A. A., Alrawi, A. T., & Okashi, O. M. Al. (2024). Survey on machine and deep learning methods used in CT scan brain diseases diagnosis. AIP Conference Proceedings, 3009(1). https://doi.org/10.1063/5.0190368

Malik, P., Dureja, A., Dureja, A., Rathore, R. S., & Malhotra, N. (2024). Enhancing intracranial hemorrhage diagnosis through deep learning models. Procedia Computer Science, 235, 1664–1673. https://doi.org/10.1016/j.procs.2024.04.157

Murad, S. H., Awlla, A. H., & Moahmmed, B. T. (2023). Prediction lung cancer based critical factors using machine learning. Science Journal of University of Zakho, 11(3), 447–452. https://doi.org/10.25271/sjuoz.2023.11.3.1105

Neethi, A. S., Kannath, S. K., Kumar, A. A., Mathew, J., & Rajan, J. (2024). A comprehensive review and experimental comparison of deep learning methods for automated hemorrhage detection. Engineering Applications of Artificial Intelligence, 133, 108192. https://doi.org/10.1016/j.engappai.2024.108192

Prasher, S., Nelson, L., & Arumugam, D. (2024). Sequential CNN model for hemorrhage prediction using brain CT images. 2024 International Conference on Advances in Modern Age Technologies for Health and Engineering Science (AMATHE), 1–4. https://doi.org/10.1109/AMATHE61652.2024.10582253

Rather, M. A., Khan, A., Javed, H., Jahan, S., Tabassum, R., & Begum, R. (2024). Neuropathology of neurological disorders. In Mechanism and Genetic Susceptibility of Neurological Disorders (pp. 1–33). Singapore: Springer Nature Singapore. https://doi.org/10.1007/978-981-99-9404-5_1

Rivoir, D., Funke, I., & Speidel, S. (2024). On the pitfalls of Batch Normalization for end-to-end video learning: A study on surgical workflow analysis. Medical Image Analysis, 94, 103126. https://doi.org/10.1016/j.media.2024.103126

Schiariti, V., Shierk, A., Stashinko, E. E., Sukal-Moulton, T., Feldman, R. S., Aman, C., Mendoza-Puccini, M. C., Brandenburg, J. E., & the National Institute of Neurological Disorders and Stroke Cerebral Palsy Common Data Elements Oversight Committee. (2024). Cerebral palsy pain instruments: Recommended tools for clinical research studies by the National Institute of Neurological Disorders and Stroke Cerebral Palsy Common Data Elements project. Developmental Medicine & Child Neurology, 66(5), 610–622. https://doi.org/10.1111/dmcn.15743

Sheikh, A. M., Hossain, S., & Tabassum, S. (2024). Advances in stem cell therapy for stroke: Mechanisms, challenges, and future directions. Regenerative Medicine Reports, 10–4103. https://doi.org/10.4103/RMR.REGENMED-D-23-00002

Siddiq Hassan, D. (2013). The effect of feature selection methods on machine learning model performance: A comparative study for breast cancer prediction. Science Journal of University of Zakho, 13(1), 101–112. https://doi.org/10.25271/sjuoz.2024.12.3.1429

Sindhura, C., Al Fahim, M., Yalavarthy, P. K., & Gorthi, S. (2024). Fully automated sinogram-based deep learning model for detection and classification of intracranial hemorrhage. Medical Physics, 51(3), 1944–1956. https://doi.org/10.1002/mp.16714

Studer, M., & Thompson, C. R. (2024). Prevention practice for neurological conditions. In Prevention Practice and Health Promotion (pp. 241–265). Routledge. https://doi.org/10.4324/9781003525882

Suryadi, B. (2024). Methods for detecting early symptoms of stroke: A literature review. Jurnal Ilmiah Ilmu Keperawatan Indonesia, 14(01), 32–43. https://doi.org/10.33221/jiiki.v14i01.3165

Szabó, S., Holb, I. J., Abriha-Molnár, V. É., Szatmári, G., Singh, S. K., & Abriha, D. (2024). Classification assessment tool: A program to measure the uncertainty of classification models in terms of class-level metrics. Applied Soft Computing, 155, 111468. https://doi.org/10.1016/j.asoc.2024.111468

Tin, T. A., Aye, M. M., Khin, E. E., Oo, T., Tun, H. M., & Pradhan, D. (2024). Performance optimization of brain tumor detection and classification based MRI by using batch normalization algorithms in deep convolution neural network. Journal of Novel Engineering Science and Technology, 3(03), 66–72. https://doi.org/10.56741/jnest.v3i03.567

Zhang, R., Ding, R., Wang, Q., Zhang, L., Fan, X., Guo, F., Chen, X., Jiang, C., Cao, J., & Wang, J. (2024). Inflammation in intracerebral hemorrhage: A bibliometric perspective. Brain Hemorrhages, 5(3), 107–116. https://doi.org/10.1016/j.hest.2024.01.003




This is an open access under a CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/)