Automatic Document Redaction with PDFpen

I've been thinking a bit about my scanning workflow, prompted by this excellent episode of the Technical Difficulties podcast. On that show, Gabe discussed how he's now careful not to scan anything containing confidential information, such as account numbers or social security numbers, to a cloud service. There was later discussion as to whether the process of redacting this information from scans could be automated.

It can, thanks to my good friends at Smile. For several versions now, PDFpen has had the ability to search a file for a string of text and redact that text. However, this functionality was not accessible via AppleScript. I begged them to make it so. They came through in PDFpen version 6.

Here's a script created by Greg Scown, the co-founder of Smile, and modified (slightly) by me. I use this in combination with Hazel to search a PDF document for strings of text and redact them. Before I give you the script, a couple of warnings:

  1. The searching only works if the document has been OCR'ed. Otherwise there's no text for it to read. If documents aren't OCR'ed by your scanner, you may want to OCR the documents with PDFpen before performing the search.

  2. Be aware that confidential information comes in different variations. For example, sometimes my social security number will be on a document as "123456789", other times as "123-45-6789", and sometimes as "123 45 6789". Look for each variation and redact accordingly.

  3. Unfortunately, PDFpen doesn't support a string of searches to allow you to redact multiple items with a single script. To get around this, I'll modify and repeat this AppleScript within the same Hazel Rule as necessary to make sure I cover all variations of the data to be redacted.

  4. Once redacted, the information is gone. So if you would ever need the document with the confidential information intact, you should save a copy elsewhere before performing redaction.

  5. Most importantly - automated redaction isn't perfect. So make sure you review your documents before you send off any sensitive information.

Here's the AppleScript I use in Hazel:

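tell application "PDFpenPro 6"
    -- theFile is supplied by Hazel: the file that matched the rule
    open theFile as alias
    tell document 1
        -- Find every occurrence of the text to redact
        search string "ENTER TEXT HERE"
        -- Wait for the search to finish
        repeat while performing search
            delay 0.1
        end repeat
        -- Black out the matches, then wait for redaction to finish
        redact with block
        repeat while performing redaction
            delay 0.1
        end repeat
        delay 1
        close with saving
    end tell
    quit
end tell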
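
Since each copy of the script needs its own search text (warnings 2 and 3 above), it can help to list the variations of a number before setting up the Hazel rule. The following is only a rough sketch, not part of Greg's script: the handler name ssnVariations is my own invention, and it assumes a nine-digit number written in the 3-2-4 grouping used for social security numbers. It doesn't touch PDFpen at all; it just builds the three forms mentioned above.

```applescript
-- Hypothetical helper (not part of PDFpen or Hazel): returns the common
-- written forms of a nine-digit number, assuming a 3-2-4 grouping.
on ssnVariations(theDigits)
	if (count of theDigits) is not 9 then error "Expected exactly nine digits"
	set partOne to text 1 thru 3 of theDigits
	set partTwo to text 4 thru 5 of theDigits
	set partThree to text 6 thru 9 of theDigits
	-- Plain, dashed, and space-separated forms
	return {theDigits, partOne & "-" & partTwo & "-" & partThree, partOne & " " & partTwo & " " & partThree}
end ssnVariations

ssnVariations("123456789")
-- result: {"123456789", "123-45-6789", "123 45 6789"}
```

Each string in the result would then replace "ENTER TEXT HERE" in its own copy of the script within the Hazel rule. Other kinds of data (account numbers, phone numbers) will need their own groupings.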