Many medical image processing tasks focus mainly on finding a pathology. For example, during CT cancer screening we mostly look for nodules in an image. But our models are not perfect (like everything in this world), and we occasionally find "lung nodules" in the intestine or even outside the body's boundaries. For this reason, it is important to have an explicit mask of the target organ, so that false detections outside of it can be subtracted.
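The subtraction itself is a simple masking operation. A minimal NumPy sketch, with hypothetical arrays standing in for a real detector's output and a real lung mask:

```python
import numpy as np

# Hypothetical binary lung mask and binary map of detected nodule
# candidates, same shape (a real pipeline would use real model outputs).
lung_mask = np.zeros((8, 8), dtype=bool)
lung_mask[2:6, 2:6] = True          # pretend these pixels belong to the lungs

detections = np.zeros((8, 8), dtype=bool)
detections[3, 3] = True             # a true finding inside the lungs
detections[0, 7] = True             # a false finding outside the body

# Keep only detections that fall inside the organ mask.
filtered = detections & lung_mask
```

Everything the detector found outside the lungs is dropped; the in-lung finding survives.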
Like many popular computer vision solutions, LungSeg has neural networks inside. The module's heart is a U-Net trained on about 1,300 CT series (~290k slices) from several datasets, including NSCLC and LIDC-IDRI. For part of the data, manual ground-truth segmentation masks were available. Other masks were generated by an intermediate lung segmentation model trained on the subset with ground truth; in those cases, known pathology masks (where available) were merged into the lung masks to increase the recall of the resulting organ mask. Some of the data contains only a single class for both lungs. These cases were split into two parts (left and right) using a watershed, and it worked nearly perfectly.
Our model produces a mask for each lung separately. This approach requires the model to detect the body orientation (at least the left-right direction). However, in many cases the lungs look very symmetrical in the axial projection, so it is impossible even for a human to identify the orientation from a single slice. We tried to train such a model, but it produced many pixel-wise misclassification artifacts. Instead, we trained an additional body orientation model that takes 10 slices from the series and predicts the correct orientation with very high accuracy (around 0.999). You may argue that the orientation information is stored in DICOM tags, but experience whispers to us not to rely on DICOM tags.
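The article doesn't say how those 10 slices are chosen. One plausible approach (an assumption for illustration, not LungSeg's actual code) is to sample them evenly along the series:

```python
import numpy as np

# Hypothetical series: 120 axial slices of 384x384 pixels, [H x W x n_slices].
series = np.zeros((384, 384, 120), dtype=np.int16)

# Pick 10 slice indices spread evenly over the whole series (an assumption
# about how an orientation model might sample its input).
n = series.shape[-1]
idx = np.linspace(0, n - 1, num=10).round().astype(int)
sample = series[..., idx]           # shape: (384, 384, 10)
```

Spreading the samples over the whole series gives the orientation model context from the apex down to the diaphragm instead of 10 neighboring, near-identical slices.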
To get all of LungSeg's benefits, we recommend making a prediction for the whole CT series, regardless of the body orientation in your images. Alternatively, you can use slice-by-slice prediction without the body orientation part of the pipeline. Let's look at how to do both.
Every usage starts with installation:
pip install git+https://github.com/Botkin-AI/lungseg.git
It requires PyTorch, PyDicom, scikit-image and NumPy as dependencies.
It is pretty simple to use LungSeg:
import lungseg
from lungseg.dcm_reader import SeriesLoader

dicom = SeriesLoader(folder_to_dicoms=path_dcm)  # path_dcm: path to a folder with DICOM files
segmentor = lungseg.LungSegmentor()
lung_mask = segmentor.predict(dicom.slices, batch_size=your_batch_size)
That's it! The lung mask is ready for your needs. You are perfect!
If you want to make predictions slice by slice, you can use this snippet:
import lungseg

segmentor = lungseg.LungSegmentor()
lung_mask = segmentor.predict_one_slice(one_slice)  # expects a 2D array
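Going slice by slice, you will probably want to reassemble the per-slice masks into a volume. A sketch of that loop, with `predict_one_slice` replaced by a trivial threshold stub so it runs standalone (the real `segmentor.predict_one_slice` call would go where the stub is):

```python
import numpy as np

def predict_one_slice(one_slice):
    # Stub standing in for segmentor.predict_one_slice: thresholds at
    # -500 HU so the sketch is runnable without the model.
    return (one_slice > -500).astype(np.uint8)

# Hypothetical series in Hounsfield units, [H x W x n_slices].
series = np.full((4, 4, 3), -1000, dtype=np.int16)  # air everywhere
series[1:3, 1:3, :] = 40                            # a block of soft tissue

# Predict each axial slice, then stack the masks back along the last axis
# to restore the [H x W x n_slices] layout.
mask = np.stack(
    [predict_one_slice(series[..., k]) for k in range(series.shape[-1])],
    axis=-1,
)
```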
Some additional useful tips:
- It's important to feed the model data in Hounsfield units (the standard radiological density scale). Very important. Important enough to deserve bold highlighting!
- The expected slice shape is [H x W x n_slices], i.e. the slice dimension comes last.
- In the predicted mask, the left lung is labeled 1 and the right lung is labeled 2.
- If you read CT data with your own functions, make sure to feed the model a np.array.
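For the first two tips: converting raw DICOM pixel values to Hounsfield units is usually just the rescale transform from the RescaleSlope and RescaleIntercept tags, and the slices are stacked with the slice axis last. A plain NumPy sketch (the slope/intercept values here are typical CT defaults hardcoded for illustration, not read from a real file):

```python
import numpy as np

def to_hounsfield(pixel_array, slope, intercept):
    """Apply the DICOM rescale transform: HU = raw * RescaleSlope + RescaleIntercept."""
    return pixel_array.astype(np.float32) * slope + intercept

# Hypothetical raw slices as stored in DICOM files (unsigned integers),
# with the common CT rescale of slope=1, intercept=-1024.
raw_slices = [np.full((512, 512), 24, dtype=np.uint16) for _ in range(3)]
hu_slices = [to_hounsfield(s, slope=1.0, intercept=-1024.0) for s in raw_slices]

# Stack with the slice dimension last: [H x W x n_slices], as LungSeg expects.
volume = np.stack(hu_slices, axis=-1)
```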
Now for the most interesting part of the article. What's the problem with segmenting dark (less dense) lungs surrounded by white (more dense) tissue? In most cases, none at all. In most normal cases, that is. Problems arise when one deals with abnormal lungs. These abnormalities include pleural effusion, consolidation, atelectasis, fibrosis, cancer, etc. In short, exactly the most important cases in which we process chest CT images.
If our pipeline subtracts non-lung regions using the generated lung mask, we want to be sure that unhealthy lung tissue is still included. In other words, it is better to have high sensitivity.
There is not much open-source lung segmentation data available. However, we have thoracic-region annotations from NSCLC, so we chose this dataset for the benchmark. To be fair, our model was trained on this dataset, but the benchmark was run on a hold-out part. For comparison, we used another open-source lung segmentation library, lungmask.
The full NSCLC dataset contains many difficult cases with various pathologies. Its markup protocol includes central cancer, pleural effusion, and part of the bronchi in the thoracic region. These are dense tissues, which are hard to capture by simply selecting the dark lung region. Our benchmark set consists of 58 series from this dataset.
These are the metrics of the different models in the benchmark. The Dice scores are calculated over the whole volume, and the sensitivity is also volumetric. Label 1 refers to the left lung, label 2 to the right lung.
One can see that LungSeg's metrics are higher. With a similar Dice coefficient, LungSeg is more sensitive, but before drawing any conclusions, we should take a look at the images.
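For reference, volumetric Dice and sensitivity over binary masks can be computed like this (a standard formulation on tiny toy volumes, not the benchmark's exact script):

```python
import numpy as np

def dice(pred, gt):
    """Volumetric Dice: 2 * |pred & gt| / (|pred| + |gt|)."""
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum())

def sensitivity(pred, gt):
    """Volumetric sensitivity (recall): |pred & gt| / |gt|."""
    inter = np.logical_and(pred, gt).sum()
    return inter / gt.sum()

# Tiny hypothetical volumes: 8 ground-truth voxels, prediction hits 6 of them.
gt = np.zeros((4, 4, 2), dtype=bool)
gt[1:3, 1:3, :] = True      # 8 ground-truth voxels
pred = gt.copy()
pred[1, 1, :] = False       # the prediction misses 2 of them
```

Here `dice(pred, gt)` is 12/14 ≈ 0.857 and `sensitivity(pred, gt)` is 0.75; sensitivity directly punishes missed lung tissue, which is why we track it alongside Dice.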
Here we can see that the large cancerous tissue isn't included in the mask generated by lungmask, while LungSeg selects it correctly. Also, the bronchi region is segmented by LungSeg and the GT, but not by lungmask. Both of these features contribute to the higher sensitivity. In most cases it is preferable to segment cancerous tissue as part of the lungs and to ignore the bronchi region. With that in mind, we recommend choosing the solution that is closer to your task.
The figure below demonstrates that LungSeg is robust to body orientation out of the box. It correctly recognizes the left and right lungs thanks to a dedicated part of the processing pipeline. This covers cases when the patient was scanned in the prone position (the image below, though, was simply flipped during inference).
Regarding computational time on the benchmark dataset, model inference takes 447 s for LungSeg and 229 s for lungmask; in other words, lungmask works about 2 times faster. The main reason is that lungmask runs its network on 256×256 images, while LungSeg uses 384×384.
Botkin.AI has released a new automated lung segmentation module for CT scans called LungSeg. It is robust to high-density tissue in the lungs and to body orientation, and it mostly inherits the markup protocol of the NSCLC dataset. Enjoy it in your research and pet projects!