
BUILDING A DEEP LEARNING MODEL TO DETECT IDC CANCER IN IMAGES | by Emmanueloffisong | Oct, 2021

Deep learning is a subfield of machine learning that loosely imitates the way humans acquire certain kinds of knowledge. It takes its inspiration from the human brain: deep learning models are built from layers of artificial neurons that learn patterns from data.

Deep learning is applied in many areas, such as:

  1. Self-driving cars
  2. Medical imaging
  3. Object detection
  4. Machine translation (for example, Google Translate)
  5. Facial recognition
  6. Robotics
  7. And many others

IDC is short for Invasive Ductal Carcinoma. It is the most common form of breast cancer; it occurs mostly in women and only rarely in men. About eighty percent of people with breast cancer have this type of tumor. It is much easier to treat when it is diagnosed early.

To follow along, we need two things:

  1. A computer or a laptop
  2. Anaconda: Anaconda is an open-source distribution of Python (with R also available) that lets us do data science and machine learning on our own machines. It comes with many commonly used data science packages preinstalled. Anaconda can be downloaded from this link: https://www.anaconda.com/products/individual

We will use an open-source dataset from Kaggle for this tutorial. The dataset can be downloaded from this link: https://www.kaggle.com/simjeg/lymphoma-subtype-classification-fl-vs-cll

The dataset is stored as NumPy arrays and has already been split into features and labels: X.npy contains all the images, with and without cancer, and Y.npy contains the labels. A label of 0 means no cancer and a label of 1 means cancer.

Importing the necessary libraries

Here, we import the libraries we need for this tutorial: NumPy for numerical computation, Matplotlib and Seaborn for visualization, and TensorFlow for deep learning. NumPy, Matplotlib and Seaborn come preinstalled with Anaconda. TensorFlow usually has to be installed separately, for example with `conda install tensorflow` or `pip install tensorflow`.
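Before importing, it can help to confirm which of these libraries are actually installed. The following is a minimal sketch and not part of the original tutorial: the helper name `find_missing` is hypothetical, and it relies only on the standard library's `importlib.util.find_spec`.

```python
import importlib.util

# Import names of the libraries this tutorial relies on.
REQUIRED = ["numpy", "matplotlib", "seaborn", "tensorflow"]

def find_missing(names):
    """Return the names in `names` that cannot be found as importable modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = find_missing(REQUIRED)
if missing:
    print("Install these before continuing:", ", ".join(missing))
else:
    print("All required libraries are available.")
```

Once everything is present, the conventional imports are `import numpy as np`, `import matplotlib.pyplot as plt`, `import seaborn as sns` and `import tensorflow as tf`.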

Now, let us load our data and check its shape, that is, its dimensions. We can do this by inspecting X.shape and y.shape.
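The loading step might look like the sketch below. It assumes that X.npy and Y.npy have been downloaded into the working directory; the function name `load_idc_arrays` and the length check are illustrative additions, not the author's original code.

```python
import numpy as np

def load_idc_arrays(x_path="X.npy", y_path="Y.npy"):
    """Load the image array and the label array from two .npy files."""
    X = np.load(x_path)
    y = np.load(y_path)
    # Every image needs exactly one label.
    if len(X) != len(y):
        raise ValueError(f"{len(X)} images but {len(y)} labels")
    return X, y
```

With both files in place, `X, y = load_idc_arrays()` followed by `print(X.shape, y.shape)` prints the dimensions of the two arrays.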

We see here that our X data (our images) has shape (5547, 50, 50, 3). What does this mean?

The first part, 5547, is the number of images in the NumPy array.

The second and third parts, (50, 50), are the dimensions of each image: every image is 50 pixels high and 50 pixels wide.

The last part, 3, means that each image has three color channels (RGB): red, green and blue.

Our y data (the labels) has shape (5547,). This means that we have 5547 labels, each either one or zero: one if the image shows cancer, and zero if it does not.
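To make the meaning of each dimension concrete, here is a small sketch that unpacks a shape tuple and counts the labels. It uses a made-up stand-in array of 8 blank images rather than the real data, so the names `X_demo` and `y_demo` and their contents are assumptions for illustration only.

```python
import numpy as np

# Small stand-in with the same layout as the real data (8 images instead of 5547).
X_demo = np.zeros((8, 50, 50, 3), dtype=np.uint8)
y_demo = np.array([0, 1, 0, 0, 1, 1, 0, 1])

# Shape is (number of images, height, width, color channels).
n_images, height, width, n_channels = X_demo.shape
print(n_images, height, width, n_channels)  # 8 50 50 3

# Count how many labels are 0 (no cancer) and how many are 1 (cancer).
n_negative, n_positive = np.bincount(y_demo, minlength=2)
print(n_negative, n_positive)  # 4 4
```

Running the same two steps on the real X and y shows how many images of each class the dataset contains.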

We all love pictures, don’t we?

Of course we do. This is one reason why visualizing our data is important: it lets us see the data in a clearer and more understandable way.

For this, we create a plotting function in Python that randomly plots images from our X (images) data. Recall that X contains 5547 images, so valid indices run from 0 to 5546. The plotting function picks three of these indices at random and plots the corresponding images.
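The original plotting code is not reproduced here, so the following is only a sketch of what such a function could look like. The name `plot_random_images` and its parameters are hypothetical, and it assumes the images are stored as 8-bit RGB arrays that Matplotlib's `imshow` can display directly.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_random_images(X, y, n_images=3, seed=None):
    """Show `n_images` randomly chosen images from X with their labels."""
    rng = np.random.default_rng(seed)
    # Sample distinct indices in the range 0 .. len(X) - 1.
    indices = rng.choice(len(X), size=n_images, replace=False)
    # squeeze=False keeps `axes` two-dimensional even when n_images is 1.
    fig, axes = plt.subplots(1, n_images, figsize=(3 * n_images, 3), squeeze=False)
    for ax, idx in zip(axes[0], indices):
        ax.imshow(X[idx])
        ax.set_title(f"index {idx}, label {y[idx]}")
        ax.axis("off")
    return fig, indices
```

Calling `plot_random_images(X, y)` and then `plt.show()` displays three images side by side, each titled with its index and label. Passing a fixed `seed` makes the selection repeatable.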

