Abstract
Prior research has shown that images with blur, low contrast, or other deficiencies reduce the accuracy of classification models. A dataset of blurry, hard-to-see images creates uncertainty about whether a machine learning model can accurately identify objects. Such hard-to-identify images require accompanying metadata to be understood and classified accurately by a model. To remove the need for metadata when classifying hard-to-identify images, we propose a method for identifying unfit images so that the dataset can be cleaned prior to neural network training. In our work, we use edge detectors based on the Sobel and Scharr operators to produce an image of the detected boundaries among the elements of the original X-Ray. The resulting images are used to train a shallow CNN to classify each image as clear or lacking in quality. Our proposed method identifies clear, quality images with 95% accuracy and can be utilized in future disease classification models. We define the proposed approach as image preprocessing that aims to weed out unfit, blurry, or low/high-intensity X-Rays and create a dataset suitable for disease identification. (C) 2021 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0). Peer-review under responsibility of the scientific committee of KES International.
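The edge-detection step named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names (`correlate3x3`, `edge_magnitude`), the plain nested-list image representation, and the choice to compute only the interior ("valid") region are assumptions made for illustration. The kernels are the standard 3x3 Sobel and Scharr derivative kernels. In practice an image library would likely be used instead of hand-written loops.

```python
import math

# Standard 3x3 horizontal-derivative kernels.
# The vertical-derivative kernels are their transposes.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SCHARR_X = [[-3, 0, 3], [-10, 0, 10], [-3, 0, 3]]


def transpose(kernel):
    """Return the transpose of a square kernel given as nested lists."""
    return [list(row) for row in zip(*kernel)]


def correlate3x3(image, kernel):
    """Slide a 3x3 kernel over the interior of a 2D list.

    Border pixels are skipped, so the output is 2 rows and 2 columns
    smaller than the input (an illustrative simplification).
    """
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(1, rows - 1):
        out_row = []
        for c in range(1, cols - 1):
            acc = 0
            for i in range(3):
                for j in range(3):
                    acc += kernel[i][j] * image[r + i - 1][c + j - 1]
            out_row.append(acc)
        out.append(out_row)
    return out


def edge_magnitude(image, kernel_x):
    """Gradient magnitude sqrt(gx^2 + gy^2) for the given x-derivative kernel."""
    gx = correlate3x3(image, kernel_x)
    gy = correlate3x3(image, transpose(kernel_x))
    return [
        [math.hypot(a, b) for a, b in zip(row_x, row_y)]
        for row_x, row_y in zip(gx, gy)
    ]


# Hypothetical toy input: a 5x5 image with a sharp vertical step edge.
step_image = [[0, 0, 1, 1, 1] for _ in range(5)]
sobel_edges = edge_magnitude(step_image, SOBEL_X)
scharr_edges = edge_magnitude(step_image, SCHARR_X)
```

On this toy input both operators respond only around the step and give zero in the flat region, with Scharr producing larger values because its kernel weights are larger. An edge map of this kind is the sort of input that the abstract describes passing to the shallow CNN.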