Counting cancer cells with computer vision for time-lapse microscopy
April 21, 2016 byArtem Kaznatcheev 7 Comments
Some people characterize TheEGG as a computer science blog . And although (theoretical) computer science almost always informs my thought, I feel like it has been a while since I have directly dealt with the programming aspects of computer science here. Today, I want to remedy that. In the process, I will share some Python code and discuss some new empirical data collected by Jeff Peacock and Andriy Marusyk. 
Together with David Basanta and Jacob Scott, the five of us are looking at the in vitro dynamics of resistance to Alectinib in non-small cell lung cancer. Alectinib is a new ALK-inhibitor developed by the Chugai Pharmaceutical Co. that was approved for clinical use in Japan in 2014, and in the USA at the end of 2015. Currently, it is intended for tough lung cancer cases that have failed to respond to crizotinib . Although we are primarily interested in how alectinib resistance develops and unfolds, we realize the importance of the tumour’s microenvironment, so one of our first goals — and the focus here — is to see how the Alectinib sensitive cancer cells interact with healthy fibroblasts. Since I’ve been wanting to learn basic computer vision skills and refresh my long lapsed Python knowledge, I decided to hack together some cell counting algorithms to analyze our microscopy data. 
In this post, I want to discuss some of our preliminary work although due to length constraints there won’t be any results of interest to clinical oncologist in this entry. Instead, I will introduce automated microscopy to computer science readers, so that they know another domain where their programming skills can come in useful; and discuss some basic computer vision so that non-computational biologists know how (some of) their cell counters (might) work on the inside. Thus, the post will be methods heavy and part tutorial, part background, with a tiny sprinkle of experimental images.  I am also eager for some feedback and tips from readers that are more familiar than I am with these methods. So, dear reader, leave your insights in the comments.
A few years ago, I wrote about droids taking academic jobs from a typical theorist’s focus on automating literature review and writing. However, the real automation breakthroughs are on the experimental side. There used to be a time when people would have to look through microscopes by eye, count cells manually, and sketch what they see. I think I even did that at some point in school. Now, Jeff can prepare a 96-well plate and put it in a machine that will take precise photos of the same exact positions in each well, every 2 hours.  Six days later, I can ssh into the same machine and get about 51 thousand images — 40 GB of TIFs — to analyze.
However, even under hundredfold magnification, cells are hard to see and tell apart. So a number of physical and biological tricks are used before the images are even recorded. The first of these is phase-contrast. In my grade-school experience with microscopes, I had a light source under the sample, it would shine through it, with some of the light being absorbed (or otherwise obstructed) by the cell, and thus creating darker patches for me to identify as life. This is called bright-field microscopy , and only takes advantage of the amplitude of the light — that which makes things brighter or darker to our eyes and cameras. But light carries a second channel: phase. Phase is shifted by passing through the different media of the cell membrane and other components, with the amount of shift depending on thickness and refractive index of the object. This information is then captured by the digital camera, combined with the standard bright field information, and converted into an optical thickness for each pixel. The result is a phase-contrast image that is much clearer than standard bright-field and that allows us to see details that would normally only be visible through staining of dead cells.
But even if you can see the cells, it is still often impossible to tell them apart because different cell types can be morphologically identical. To overcome this, a biological engineering trick can be combined with fluorescence imaging. We can take our separate cell lines — in our case: parental cancer cells, resistant cancer cells, and fibroblasts — and add an extra piece of DNA to code for a fluorescing (but otherwise non-functional) protein. Then, during image capture, we shine a light of the excitatory wavelength that is absorbed by the protein and then re-emitted at a longer wavelength which is recorded (after the source wavelength is physically filtered out) by the digital camera. Thus, instead of looking at obstructions (and phase-shift) of an external light source, we end up using the protein itself as the light source and recording where that light is brightest. By giving different cell lines different proteins that are excited by and emit different wavelengths, we can then tell those cells apart by the wavelength of their fluorescence.
The practical result is that for each snapshot of part of a well,  we end up with three photos: one phase-contrast, and one color channel for each of the two fluorescent proteins. The programming problem becomes transforming these 3 photos into a measurement of size for the populations of cancer cells and fibroblasts.
An example of the 3 channels that are recorded from microscopy. From left to right: phase contrast, GFP, mCherry. All data is intensity, but I used the green and red colour space for the 2nd and 3rd image to make them easier to distinguish at a glance.
In our experimental system, we created two variants of each of our cancer and fibroblast cell lines by inserting the genes coding for GFP (absorption 395 nm , emission 509 nm ) into one and mCherry ( 587/610 nm ) into the other. These genes are inserted throughout the genome, and so we expect them to be expressed and thus to fluoresce whenever proteins are being transcribed. Further, our fluoroproteins are not confined to any specific part of the cell and distribute themselves haphazardly throughout the cell. This allows the fluorescent area to serve as a good unit of size for our populations .
Conveniently — and not at all by coincidence — fluorescent area is much easier to measure than the number of cells. Mostly, this is due to the tendency of the cancer cells to clump, making segmentation difficult, and the lanky and faint fibroblasts making it hard to find their boundaries.  Of course, this means that as we eventually transform this into replicator dynamics, we will have bemindful of our units being fluorescent area and not cell count, but that choice of units makes no difference to the mathematics.
Basic image processing
“If you want to make an apple pie from scratch,” said Carl Sagan in Cosmos (1980), “you must first invent the universe.” This is equally true for programming. If I want to build my analysis code ‘from scratch’ then I can go arbitrarily far back to progressively more basic building blocks. At some point, if I don’t want to ‘invent the universe’, I have to pick a set of packages to start from. I decided to start from the Python equivalent of store-bought crust and canned apple pie filling: the OpenCV package (and numpy, of course). 
import numpy as np import cv2
OpenCV makes file operations with images extremely simple:
def AreaCount(col_let, row_num, fot_num, dirName = '', ret_img = True): #load the images head = dirName + col_let + str(row_num) + '-' + str(fot_num) rdImage = cv2.imread(head + '-C1.tif',cv2.IMREAD_UNCHANGED) gnImage = cv2.imread(head + '-C2.tif',cv2.IMREAD_UNCHANGED) if ret_img: grImage = cv2.imread(head + '-P.tif',cv2.IMREAD_UNCHANGED)
In this case, the outputs of the cv2.imread functions are simply two-dimensional arrays of intensities. Although the microscope itself creates different kinds of TIFs for the phase-contrast and fluorescent photos, so the output arrays require a conversion into a common 8-bit format: 
#switch to 8bit with proper normalizing if np.amax(rdImage) > 255: rdImage8 = cv2.convertScaleAbs(rdImage, alpha = (255.0/np.amax(rdImage))) else: rdImage8 = cv2.convertScaleAbs(rdImage) if np.amax(gnImage) > 255: gnImage8 = cv2.convertScaleAbs(gnImage, alpha = (255.0/np.amax(gnImage))) else: gnImage8 = cv2.convertScaleAbs(gnImage)
To go beyond file operations, thought, I need a brief detour into the real apples of OpenCV.
Straight out of the box, OpenCV gives me tools like histogram equalization for enhancing image contrast. Histogram equalization for contrast works on the premise that what makes an image high contrast is a maximization of information conveyed by the intensity of each pixel. In other words, we want to stretch out the intensity histogram into the highest-entropy version: a uniform distribution. This is a relatively straightforward process, just take the cumulative distribution for the pixels, and then create a function such that a pixel of value with maps to a new value , that way in the new image . Not a hard function to implement for yourself, but with OpenCV it is just one line:
img_hicon = cv2.equalizeHist(img) .
An example of histogram equalization that makes obvious the colour gradient present in the mCherry channel. On the right of each image is the histogram for their intensity distribution, with the raw histogram in red and the cumulative distribution as a black line. For the equalized image, I am using a different color map with blue corresponding to low intensity and red to high intensity. In the bottom histogram, I am also including bigger bins to make it clearer how the distribution is moved towards a uniform one.
What you might notice right away is the colour gradient radiating outwards. This is not an issue in the phase-contrast image, but the fluorescence images have a glow around their outside, like reverse vignetting ,  that is not apparent to the eye until you turn up the contrast (or try to threshold).
With this problem and tool in mind, let me walk through the simple function
FluorescentAreaMark that sits at the base of my current image processing. To get rid of the above glow I use a variant of the method we used to notice it: contrast limited adaptive histogram equalization.
def FluorescentAreaMark(img): #1: contrast limited adaptive histogram equalization to get rid of glow clahe = cv2.createCLAHE(clipLimit=50.0, tileGridSize=(20,20)) clahe_img = clahe.apply(img)
mCherry channel after processing with adaptive histogram equalization. On the left is the image with blue corresponding to low intensity and red to high intensity, and on the right is the histogram of intensities with the cumulative distribution in black.
The algorithm works by dividing the image into 400 tiles (20 tiles by 20 tiles), and normalizing the contrast in each one. Each pixel is then given a value by bilinear interpolation between the 4 tiles nearest it. This gives a sharp distinction of the cells from their background, and we can then detect them with a simple threshold, followed by some noise canceling  :
#2: threshold and clean out salt-pepper noise ret, thresh_img = cv2.threshold(clahe_img,127,255, cv2.THRESH_BINARY) kernel = np.ones((3,3),np.uint8) clean_img = cv2.morphologyEx(thresh_img, cv2.MORPH_OPEN, kernel, iterations = 2) return clean_img
The threshold in line 11 simply converts the 8bit image to a binary one, by setting all pixels with intensity lower than 127 in the original image to 0 in the final image and those with intensity of 127 and higher to 1. The noise filtering step in line 15 is morphological opening — a fancy name but a straightforward algorithm. Its two basic components are erosion and dilation , both require a kernel that defines a local neighbourhood. In our case the kernel in line 13 is the one-distance Moore neighbourhood . Erosion is an AND operation on the binary mask: for a focal pixel, if every pixel in its local neighbourhood is 1 in the original image than that pixel is a 1 in the final image, else it is zero. Dilation is an OR operation: if at least one pixel in the local neighbourhood of the focal pixel is 1 in the original image then the focal pixel is a 1 in the final image. Morphological opening cycles erosion followed by dilation a number of times given by iterations — two in our case. Thus, lonely pixels that are most likely due to noise are eliminated, but the overall area of the non-noise pixels is not decreased.
The returned clean_img on line 17 is then a binary two dimensional array with a ‘1’ for each fluorescent pixel and ‘0’ otherwise. If we want the totalfluorescent area then we just sum this array, outside a small buffer area around the edges which tends to contain more mistakes in correcting the edge glow. If desired, we can also combine the raw phase-contrasts images with the two processed fluorescent areas for a final visualization of the distribution of fluorescence throughout the spatially structured population:
#get the area masks rdFA = FluorescentAreaMark(rdImage8) gnFA = FluorescentAreaMark(gnImage8) ign_buf = 30 #how big of an edge do we ignore? rd_area = (rdFA[ign_buf:-ign_buf,ign_buf:-ign_buf] > 0) gn_area = (gnFA[ign_buf:-ign_buf,ign_buf:-ign_buf] > 0) #create image to save img_out =  if ret_img: bW = 0.85 img_out = cv2.merge(( cv2.addWeighted(grImage,bW,rdFA,1- bW,1), cv2.addWeighted(grImage,bW,gnFA,1 - bW,1), cv2.addWeighted(grImage,bW,np.zeros_like(grImage), 1 - bW, 1))) return [np.sum(rd_area),np.sum(gn_area)], img_out
Distribution of fluorescent area. Non-small cell lung cancer cells are marked with mCherry in red, and non-cancerous fibroblasts are marked GFP in green. Colours in this image are a binary mask after the processing from this post.
In the near future, I plan to use this sort of spatial distribution information to learn about operationalizing the local environment . For the next post, however, I will just look at the overall population dynamics estimated by this method and what we can learn from them. Although the basics of computer vision that I hacked together in this post are sufficient for our purposes with this project, it does violate my ambitions as a computer scientist. If there is time and interest then I will also write some future posts to refine this code to minimize categorization errors and introduce some proper cell segmentation.
- I was also inspired by Jan Poleszczuk’s Compute Cancer blog to start posting actual code snippets to allow for easier collaboration and engagement with programmers interested in biology, or biologists looking to start programming. I recommend checking out his blog for many code snippets and tricks across various languages. For example, if you want to oppose my choice of Python in this post then check out his speed comparison of Java, C++, MATLAB, Julia or Python for cellular automatons post for ammunition. Also, let me know what you think of interspersing code within this post. If there is a lot of interest in this format then I will try to include it in more future posts.
- I am, of course, reinventing the wheel. A lot of great packages and programs already exist for processing and analyzing microscopy data, and we even have some in-house code here at the IMO. However, I wanted to go through the motions of this as a learning exercise. This is to make sure that I understood exactly what the algorithm is doing, and to make sure that it is a good fit for the particulars of our experimental set up.
- Having several target audiences means that the first section on microscopy is probably too basic for biologists, and the second section on computer vision is probably too basic for programmers. So feel free to jump around. As with many of my posts, this is also meant as a tutorial for future Artem, for when I return to this experiment and code after a few months and have no idea what it does, why it exists, and how to get it running. I will try to confine the more pedantic parts of this documentation to footnotes.
- In fact, as automation continues its perpetual march forward, even the preparation step is being eliminated. Labs like the Foundry at Imperial College London are automating the whole bench biology pipeline . I wonder if this will soon allow us to program in vitro experiments from basic components and then run them in remote automated labs, much like how we currently send in silico simulations to clusters. My optimistic hope is that the communication with machines forced by automation will also encourage more and more experimental biology graduate students to learn more and more computer science. Maybe that exposure to CS will allow for a flourishing ofalgorithmic biology. My cynical side, however, wonders at the disconnect between programming — a skill that everyone, from any field, should pick up — and the more niche aspects of algorithmic thinking or theoretical computer science.
- At our magnification in this post, the field of view for a single photo is about 1/5th of the diameter of the well. The microscope takes photos of three different positions — the same positions at every 2-hour time step — in each well.
- These limitations of not segmenting single cells can be overcome with more work. I will have to return to them in future blogposts if I want to look at single-cell tracking and lineage tracing.
- Python has packages for almost anything, and usually in many versions, so there are plenty of alternatives to OpenCV. I chose it due to its great documentation, and from chatting with Chandler Gatenbee — the IMO expert on computer vision — who also builds his code with this package. Chandler’s warning was that the hardest part of OpenCV is getting it installed.
I use the Anaconda distribution of Python 3, and it is pretty easy to get it with the built-in package manager from binstar:
conda install -c https://conda.binstar.org/menpo opencv. It is also a good idea to have a separate environment for opencv to work in, to avoid breaking dependencies. This can be done with
conda create -n opencv numpy scipy scikit-learn matplotlib python=3and then accessed later from command line (before running
jyputer notebook, for example) with
source activate opencv. For a slightly more detailed tutorial, see this .
The code I discuss in this post is available on my GitHub , and the numbered lines of code correspond to the numbers in the first commit
FluorescentArea.py. The images through out this post were generated from the jyputer notebook in which I hacked the code together and paint. It is straightforward to generate these images with OpenCV, but if you want the details then ask me in the comments and I will post some more code snippets.
- The function cv2.converScaleAbs tends to clip-down the intensities, setting every pixel above 255 to 255 , so if there are intensities above 255 in the 16-bit images, it is better to first renormalize (as is done in lines 30,35 by setting
alphato something other than 1 ) to make the brightest pixel 255 . Note that this renormalization means that I cannot make physical sense of the raw intensity numbers and have to rely on contrast. I could use a consistent renormalization to allow using raw intensity numbers, but since the rest of the processing focuses on contrast, there is no reason to worry about it.
- We are not sure what physical process causes this glow. The apparent circular symmetry makes me think it has something to do with the lens. Andriy suspects it has more to do with partial fluorescence from the well walls, and to remedy this, we are using a different type of 96 well plate in follow up experiments.