OS X Screen capture from Python/PyObjC

Posted on Oct. 23, 2012 by Ben Dickson.



This is an old post I wrote, which was originally posted on neverfear.org on Tues, 23rd Oct 2012 17:49:54. Archived here for posterity.


Looking through the unanswered Python questions on StackOverflow (archive.org link), I found one that seemed interesting: "Python Get Screen Pixel Value in OS X" (archive.org link).

After a bit of searching, the best-supported way of grabbing a screenshot is provided by the CoreGraphics API, part of Quartz (archive.org link), specifically CGWindowListCreateImage.

Since CoreGraphics is a C-based API, the code maps almost directly to Python function calls. It's also simplified a bit, because PyObjC handles most of the memory management (when the wrapping Python object goes out of scope, the underlying object is freed).

Getting the image

After finding some sample iOS code with sane arguments (which can also be found via Apple's docs), I ended up with a CGImage containing the screenshot:

>>> import Quartz.CoreGraphics as CG
>>> image = CG.CGWindowListCreateImage(CG.CGRectInfinite, CG.kCGWindowListOptionOnScreenOnly, CG.kCGNullWindowID, CG.kCGWindowImageDefault)
>>> print image
<CGImage 0x106b8eff0>

Hurray. We can get the width/height of the image with help from this SO question (archive.org link):

>>> width = CG.CGImageGetWidth(image)
>>> height = CG.CGImageGetHeight(image)

Extracting pixel values

Then it was a case of working out how to extract the pixel values, which took far longer than all of the above. The simplest way I found of doing this is:

  1. Use CGImageGetDataProvider (archive.org link) to get an intermediate representation of the data
  2. Pass the DataProvider to CGDataProviderCopyData. In Python this returns an NSData object (effectively a byte-array of 8-bit unsigned chars), suitable for unpacking with the handy struct (archive.org link) module
  3. Calculate the correct offset for a given (x,y) coordinate as described here (archive.org link)

Like so:

>>> prov = CG.CGImageGetDataProvider(image)
>>> data = CG.CGDataProviderCopyData(prov)
>>> print prov
<CGDataProvider 0x7fc19b1022f0>
>>> print type(data)
<objective-c class __NSCFData at 0x7fff78073cf8>

...and calculate the offset:

>>> x, y = 100, 200 # pixel coordinate to get value for
>>> offset = 4 * ((width*int(round(y))) + int(round(x)))
>>> print offset
1344400

Finally, we can unpack the pixel values at that offset with struct.unpack_from (archive.org link).

B is an unsigned char:

>>> import struct
>>> b, g, r, a = struct.unpack_from("BBBB", data, offset=offset)
>>> print (r, g, b, a)
(23, 23, 23, 255)

Note that the values are stored as BGRA (not RGBA).
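
If you want to double-check the byte order on your machine rather than take this on faith, CoreGraphics can describe the image's pixel format directly. Here's a rough sketch (assuming CGImageGetBitmapInfo / CGImageGetAlphaInfo and the related kCG* constants are exposed by your PyObjC Quartz bindings):

info = CG.CGImageGetBitmapInfo(image)
alpha = CG.CGImageGetAlphaInfo(image)

# Little-endian 32-bit pixels with the alpha component listed first mean
# the bytes sit in memory as B, G, R, A
is_bgra = ((info & CG.kCGBitmapByteOrderMask) == CG.kCGBitmapByteOrder32Little
           and alpha == CG.kCGImageAlphaPremultipliedFirst)
print is_bgra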

Verification, and code

To verify this wasn't generating nonsense values, I used the nice and simple pngcanvas (archive.org link) to write the screenshot to a PNG file (pngcanvas is a useful module because it's pure Python and a single self-contained .py file - much lighter weight than something like PIL, and good for when you just want to write pixels to an image file).

The performance was definitely better than the screencapture solution. The screencapture command took about 80ms to write a TIFF file, then there would be additional time to open and parse the TIFF file in Python. The PyObjC code takes about 70ms to take the screenshot and have the values accessible to Python.
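
For reference, here is a rough sketch of how the screencapture approach could be timed for comparison (the -x and -t tiff flags and the temporary file path are assumptions about a reasonable invocation, not the exact command I benchmarked):

import subprocess
import tempfile
import time

# Time how long the screencapture CLI takes to write a TIFF to disk.
# This doesn't include reading the file back into Python, which the
# PyObjC approach avoids entirely.
path = tempfile.mktemp(suffix=".tiff")
start = time.time()
subprocess.check_call(["screencapture", "-x", "-t", "tiff", path])
print "screencapture: %.02fms" % ((time.time() - start) * 1000)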

Finally, the result - it's best to view the code on my StackOverflow answer (archive.org link) (as there might be other, better answers, or edits to the code).

I'll include the code here too, for completeness' sake:

import time
import struct
 
import Quartz.CoreGraphics as CG
 
 
class ScreenPixel(object):
    """Captures the screen using CoreGraphics, and provides access to
    the pixel values.
    """
 
    def capture(self, region = None):
        """region should be a CGRect, something like:
 
        >>> import Quartz.CoreGraphics as CG
        >>> region = CG.CGRectMake(0, 0, 100, 100)
        >>> sp = ScreenPixel()
        >>> sp.capture(region=region)
 
        The default region is CG.CGRectInfinite (captures the full screen)
        """
 
        if region is None:
            region = CG.CGRectInfinite
        else:
            # TODO: Odd widths cause the image to warp. This is likely
            # caused by the offset calculation in ScreenPixel.pixel, and
            # could be modified to allow odd widths
            if region.size.width % 2 > 0:
                emsg = "Capture region width should be even (was %s)" % (
                    region.size.width)
                raise ValueError(emsg)
 
        # Create screenshot as CGImage
        image = CG.CGWindowListCreateImage(
            region,
            CG.kCGWindowListOptionOnScreenOnly,
            CG.kCGNullWindowID,
            CG.kCGWindowImageDefault)
 
        # Intermediate step, get pixel data as CGDataProvider
        prov = CG.CGImageGetDataProvider(image)
 
        # Copy data out of CGDataProvider, becomes string of bytes
        self._data = CG.CGDataProviderCopyData(prov)
 
        # Get width/height of image
        self.width = CG.CGImageGetWidth(image)
        self.height = CG.CGImageGetHeight(image)
 
    def pixel(self, x, y):
        """Get pixel value at given (x,y) screen coordinates
 
        Must call capture first.
        """
 
        # Pixel data is unsigned char (8-bit unsigned integer),
        # and there are four per pixel (blue, green, red, alpha)
        data_format = "BBBB"
 
        # Calculate offset, based on
        # http://www.markj.net/iphone-uiimage-pixel-color/
        offset = 4 * ((self.width*int(round(y))) + int(round(x)))
 
        # Unpack data from string into Python'y integers
        b, g, r, a = struct.unpack_from(data_format, self._data, offset=offset)
 
        # Return BGRA as RGBA
        return (r, g, b, a)
 
 
if __name__ == '__main__':
    # Timer helper-function
    import contextlib
 
    @contextlib.contextmanager
    def timer(msg):
        start = time.time()
        yield
        end = time.time()
        print "%s: %.02fms" % (msg, (end-start)*1000)
 
 
    # Example usage
    sp = ScreenPixel()
 
    with timer("Capture"):
        # Take screenshot (takes about 70ms for me)
        sp.capture()
 
    with timer("Query"):
        # Get pixel value (takes about 0.01ms)
        print sp.width, sp.height
        print sp.pixel(0, 0)
 
 
    # To verify screen-cap code is correct, save all pixels to PNG,
    # using http://the.taoofmac.com/space/projects/PNGCanvas
 
    from pngcanvas import PNGCanvas
    c = PNGCanvas(sp.width, sp.height)
    for x in range(sp.width):
        for y in range(sp.height):
            c.point(x, y, color = sp.pixel(x, y))
 
    with open("test.png", "wb") as f:
        f.write(c.dump())