Visually comparing PDF files with PHPUnit

Kennard • August 23, 2019

testing php

At work we have an HTTP endpoint which returns PDF responses. The endpoint accepts an id and some data. The id determines which template is chosen and the data is then applied to that template. What's very important is that the layout and dimensions of the elements on the page always stay the same for a given input. The number of templates will keep growing, so we wanted some kind of automated regression test. To do that we store a baseline PDF file for each template. When the endpoint is called with the same set of data, it should return a PDF that is visually identical to that baseline.

My first thought was to just compare the binary data and be done with it. However, this didn't work because a PDF file has variable metadata embedded in it, such as the creation timestamp. We're only interested in the pixels, so that's what we should extract and compare. For this we can use the Imagick PHP extension (note that reading PDFs through Imagick normally requires Ghostscript to be installed as well).
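To make the byte comparison problem concrete, here is a small illustration. The `fakePdfInfo()` helper is hypothetical and does not produce a valid PDF; it only mimics the info dictionary where a PDF generator typically writes the creation timestamp.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: builds a stand-in for the metadata part of a PDF.
 * This is NOT a valid PDF, it only shows where the timestamp ends up.
 */
function fakePdfInfo(string $creationDate): string
{
    return "%PDF-1.4\n1 0 obj\n<< /Producer (example) /CreationDate (D:{$creationDate}) >>\nendobj\n";
}

$first  = fakePdfInfo('20190823101500');
$second = fakePdfInfo('20190823101501'); // "generated" one second later

// Same template and same data, yet the bytes are not identical.
var_dump($first === $second);            // bool(false)

// The two strings differ in exactly one character: the last digit of the timestamp.
var_dump(levenshtein($first, $second));  // int(1)
```

Two renders of the same page would look the same on screen, but a plain string or hash comparison would still report them as different.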

This is the static method I defined on my test class to compare two sets of PDF data. When the last parameter is provided and the two PDFs differ, an image containing a visual diff is written to the directory the test class resides in.

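/** @throws ImagickException */
private static function assertPDFEquals(string $expected, string $actual, ?string $debugFilename = null): void
{
    // Rasterize both PDFs so that only the rendered pixels are compared.
    $actual = self::pdfBlobToImagick($actual);
    $expected = self::pdfBlobToImagick($expected);

    /**
     * @var Imagick $reconstructed Image that visualizes the differences
     * @var float   $delta         Numerical difference; 0.0 when the images are equal
     */
    [$reconstructed, $delta] = $actual->compareImages($expected, Imagick::METRIC_ABSOLUTEERRORMETRIC);

    if ($debugFilename !== null && 0.0 !== $delta) {
        // Requires `use Carbon\Carbon;` at the top of the test file.
        // The timestamp avoids colons, which are not valid in filenames on every platform.
        $filename = sprintf('%s/%s__%s.png', __DIR__, $debugFilename, Carbon::now()->format('Y-m-d_H-i-s'));
        $reconstructed->writeImages($filename, false);
    }

    self::assertEquals(
        0.0,
        $delta,
        'The actual PDF is visually not equal to the baseline. ' .
        'To output a visual diff in this directory, pass in the debugFilename parameter'
    );
}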

And this is the method which converts the PDF data to pixel data:

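/** @throws ImagickException */
private static function pdfBlobToImagick(string $blob): Imagick
{
    $imagick = new Imagick();
    // Every page of the PDF becomes one image in the Imagick sequence.
    $imagick->readImageBlob($blob);
    $imagick->resetIterator();

    // Stack all pages top-to-bottom (true) into a single image,
    // so one comparison covers the whole document.
    return $imagick->appendImages(true);
}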
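One thing the method above leaves at its default is the resolution at which the PDF is rasterized. As a rough sketch, a variant could set the resolution explicitly before reading the blob, so that small layout shifts are more likely to show up as pixel differences. The function name `pdfBlobToImagickAtDpi()` and the default of 150 DPI are my own assumptions, not part of the original test class.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical variant of pdfBlobToImagick() with an explicit resolution.
 * The 150 DPI default is an arbitrary example value.
 *
 * @throws ImagickException
 */
function pdfBlobToImagickAtDpi(string $blob, float $dpi = 150.0): Imagick
{
    $imagick = new Imagick();
    // The resolution has to be set before the PDF is read,
    // because it controls how the pages are rasterized.
    $imagick->setResolution($dpi, $dpi);
    $imagick->readImageBlob($blob);
    $imagick->resetIterator();

    // Stack all pages top-to-bottom into a single image.
    return $imagick->appendImages(true);
}
```

Both the baseline and the actual PDF would need to be rendered at the same resolution, otherwise the resulting images have different dimensions. A higher value also makes the comparison slower and the images larger.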

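The important part is the following bit. compareImages() returns an array: the first item is an image that visualizes the differences, and the second item represents the numerical difference. When the images are equal, this value is 0.0. The second argument selects the comparison metric; Imagick::METRIC_ABSOLUTEERRORMETRIC is the named constant for the integer value 1.

/**
 * @var Imagick $reconstructed
 * @var float   $delta
 */
[$reconstructed, $delta] = $actual->compareImages($expected, Imagick::METRIC_ABSOLUTEERRORMETRIC);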
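To get a feel for the delta without involving any PDF files, the comparison can be tried on two small images created in memory. This is only a sketch: `makeSquare()` and `visualDelta()` are hypothetical helper names, and the script skips the demo when the Imagick extension is not loaded.

```php
<?php
declare(strict_types=1);

/** Creates a small white square, optionally with one black pixel drawn on it. */
function makeSquare(bool $withDot): Imagick
{
    $image = new Imagick();
    $image->newImage(10, 10, new ImagickPixel('white'));
    $image->setImageFormat('png');

    if ($withDot) {
        $draw = new ImagickDraw();
        $draw->setFillColor(new ImagickPixel('black'));
        $draw->point(3, 3);
        $image->drawImage($draw);
    }

    return $image;
}

/** Returns only the numerical difference reported by compareImages(). */
function visualDelta(Imagick $actual, Imagick $expected): float
{
    [, $delta] = $actual->compareImages($expected, Imagick::METRIC_ABSOLUTEERRORMETRIC);

    return (float) $delta;
}

if (extension_loaded('imagick')) {
    $baseline = makeSquare(false);

    // Identical pixels on both sides.
    var_dump(visualDelta(makeSquare(false), $baseline));

    // One pixel was changed compared to the baseline.
    var_dump(visualDelta(makeSquare(true), $baseline));
}
```

I would expect the first call to report 0.0 and the second a value greater than zero. The exact non-zero value depends on the metric and possibly on the ImageMagick version, which is why the assertion in the test only checks for 0.0.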