LungSeg: open-source lung segmentation on CT scans | by Roman Matantsev | Botkin.AI | Nov, 2021


Many medical image processing tasks focus mainly on finding a pathology. For example, during CT cancer screening we mostly identify nodules on an image. But our models are not perfect (like everything in the world), and we occasionally find "lung nodules" in the intestine or even outside the body. For this reason, it is important to have an explicit mask of the target organ, so that false detections outside it can be subtracted.
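As a minimal illustration of this filtering step (the detection format and function names here are hypothetical, not part of LungSeg), one could drop detections whose centers fall outside a binary lung mask:

```python
import numpy as np

def filter_detections(detections, lung_mask):
    """Keep only detections whose center voxel lies inside the lung mask.

    detections: list of (z, y, x) integer centers (hypothetical format).
    lung_mask:  3D array, non-zero inside the lungs.
    """
    return [d for d in detections if lung_mask[d] > 0]

# Toy example: a 4x4x4 volume with "lungs" in the left half.
mask = np.zeros((4, 4, 4), dtype=np.uint8)
mask[:, :, :2] = 1

dets = [(0, 0, 0), (1, 2, 3)]  # one center inside the mask, one outside
print(filter_detections(dets, mask))  # [(0, 0, 0)]
```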

Today the Botkin.AI team is happy to present an open-source solution for lung segmentation on CT images, which is called LungSeg. This article describes our approach, dataset usage, and main results.

Our model produces masks for each lung separately. This approach requires the model to detect the body orientation (at least the left-right direction). However, in many cases the lungs look very symmetrical in the axial projection, so even a human cannot always identify the orientation correctly. We tried to train a single model to do both, but it produced many pixel-wise misclassification artifacts. Instead, we trained an additional body-orientation model that takes 10 slices from the series and predicts the correct orientation with very high accuracy (around 0.999). You may argue that the orientation information is stored in DICOM tags, but our experience tells us not to rely on DICOM tags.
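The pipeline can then use the orientation prediction to normalize the volume before segmentation. A minimal sketch of that idea, assuming an `is_flipped` flag from some orientation model (LungSeg handles this internally; the flag name is hypothetical):

```python
import numpy as np

def normalize_orientation(volume, is_flipped):
    """Mirror the volume along the left-right axis (axis 1 in an
    [H x W x n_slices] layout) if the orientation model reports a
    mirrored series; otherwise return it unchanged."""
    return np.flip(volume, axis=1) if is_flipped else volume

vol = np.arange(8).reshape(2, 2, 2)
flipped = normalize_orientation(vol, True)
print(flipped[0, 0, 0])  # 2 — rows along axis 1 are swapped
```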

To get all of LungSeg's benefits, we recommend making a prediction for the whole CT series regardless of the body orientation in your images. Alternatively, you can run slice-by-slice prediction without the body-orientation part of the pipeline. Let's see how to do this.

pip install git+

It requires PyTorch, PyDicom, scikit-image and NumPy as dependencies.

It is pretty simple to use LungSeg:

import lungseg
from lungseg.dcm_reader import SeriesLoader

dicom = SeriesLoader(folder_to_dicoms=path_dcm)  # load the DICOM series
segmentor = lungseg.LungSegmentor()
lung_mask = segmentor.predict(dicom.slices, batch_size=your_batch_size)

That’s it! The lung mask is ready for your needs. You are perfect!

In case you want to make a prediction slice-by-slice, you can use this part of the code:

segmentor = lungseg.LungSegmentor()
lung_mask = segmentor.predict_one_slice(one_slice)  # expects a 2D array
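If you go slice by slice, the per-slice masks can be stacked back into a volume matching the [H x W x n_slices] layout. A sketch of that loop, with a toy thresholding function standing in for the real predictor:

```python
import numpy as np

def segment_volume_slicewise(volume, predict_one_slice):
    """Run a 2D predictor on every slice of an [H x W x n_slices]
    volume and stack the per-slice masks back into the same layout."""
    masks = [predict_one_slice(volume[:, :, i]) for i in range(volume.shape[-1])]
    return np.stack(masks, axis=-1)

# Toy stand-in predictor: threshold each slice at its mean intensity.
toy_predict = lambda s: (s > s.mean()).astype(np.uint8)

vol = np.random.rand(8, 8, 3)
mask = segment_volume_slicewise(vol, toy_predict)
print(mask.shape)  # (8, 8, 3)
```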

Some additional useful tips:

  1. It’s important to feed the model data in Hounsfield units (a radiological density scale). Very important!
  2. The expected input shape is [H x W x n_slices]; the slice dimension comes last.
  3. In the predicted mask, the left lung is labeled 1 and the right lung is labeled 2.
  4. If you read the CT data with your own functions, make sure to pass the model an np.ndarray.
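On tip 1: raw DICOM pixel values are typically converted to Hounsfield units with the per-file rescale slope and intercept (the standard DICOM transform, taken from the RescaleSlope and RescaleIntercept tags; reading those tags with pydicom is left out of this sketch):

```python
import numpy as np

def to_hounsfield(raw_pixels, slope, intercept):
    """Convert stored CT pixel values to Hounsfield units using the
    DICOM rescale transform: HU = raw * RescaleSlope + RescaleIntercept."""
    return raw_pixels.astype(np.float32) * slope + intercept

raw = np.array([[0, 1000], [2000, 3000]], dtype=np.int16)
hu = to_hounsfield(raw, slope=1.0, intercept=-1024.0)
print(hu[0, 0])  # -1024.0 (air is around -1000 HU)
```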

When our pipeline subtracts non-lung regions using a generated lung mask, we want to be sure that unhealthy lung tissue is still there. In other words, high sensitivity is preferable.
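The volumetric Dice and sensitivity reported below can be computed directly from a pair of binary masks, for example:

```python
import numpy as np

def dice(pred, gt):
    """Volumetric Dice coefficient between two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum())

def sensitivity(pred, gt):
    """Fraction of ground-truth voxels covered by the prediction."""
    inter = np.logical_and(pred, gt).sum()
    return inter / gt.sum()

gt = np.zeros((4, 4, 4), dtype=bool); gt[:2] = True   # 32 GT voxels
pred = np.zeros_like(gt); pred[:1] = True             # 16 voxels, all inside GT
print(dice(pred, gt), sensitivity(pred, gt))  # 0.666... 0.5
```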

There are not many open-source lung segmentation datasets available. However, we have data of thoracic regions from NSCLC, and we chose it for the benchmark. To be fair, our model was trained on this dataset, but the benchmark was run on a held-out part. For comparison, we used another open-source library for lung segmentation — lungmask.

The full NSCLC dataset contains many difficult cases with various pathologies. Its markup protocol includes central cancer, pleural effusion, and part of the bronchi as the thoracic region. These are dense tissues, which are hard to segment by simply selecting the dark lung regions. Our benchmark set consists of 58 series from this dataset.

These are the metrics of the different models from the benchmark. The Dice scores are calculated over the whole volume, and the sensitivity is also volumetric. Label 1 corresponds to the left lung, label 2 to the right lung.

One can see that LungSeg’s metrics are higher. With a similar Dice coefficient, LungSeg is more sensitive, but before drawing any conclusions, we should take a look at the images.

Here we can see that the large cancer tissue is not included in the mask generated by lungmask, while LungSeg segments it correctly. Also, the bronchi region is segmented by both LungSeg and the ground truth, but not by lungmask. Both of these behaviors contribute to the higher sensitivity. In most cases it is preferable to segment cancer tissue as part of the lungs and to ignore the bronchi region; with that in mind, we recommend choosing the solution that is closer to your task.

The figure below demonstrates that LungSeg is robust to body orientation out of the box. It correctly recognizes the left and right lungs thanks to a dedicated part of the processing pipeline. This covers cases where the patient was scanned in the prone position (the image below was simply flipped during inference).

Regarding computational time on the benchmark dataset, model inference takes 447 s for LungSeg and 229 s for lungmask; in other words, lungmask is about 2 times faster. The main reason is that lungmask runs its network on images of 256×256 pixels, while LungSeg uses 384×384.
