Convert a tree of Word Documents to PDF using a splash of Python and LibreOffice

Background: I was given a whole bunch of nasty Word documents spread across multiple subdirectories. To make things worse, each subdirectory holds several Word docs that are supposed to be treated as a single document. I have to generate one PDF file for each subdirectory, as quick and dirty as possible, because time is limited.

Linux, Python and LibreOffice to the rescue!

Disclaimer… This is nasty code… I basically broke it down into a linear set of steps and typed away with reckless abandon. Full sideways pistol style coding… no exception catching whatsoever… use at your own risk!

The pseudocode goes like this:

  1.   Get a list of subdirs based on path supplied as 2nd arg to this script
  2.   Loop through each subdir and run ‘libreoffice --invisible --convert-to pdf’ on all the Word docs in it
  3.   Get a list of files in the same dir
  4.   Loop through and get each of the PDFs we just created and put into an array
  5.   Convert the array into a workable command line argument
  6.   Construct a command line using the Linux tool pdftk which will stitch the files together
  7.   Fire up pdftk and generate an output pdf named using the foldernames for uniqueness
  8.   TODO: cleanup
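
Before the real thing, here is a minimal sketch of steps 4 to 7 on their own: pick the PDFs out of a directory listing and build the pdftk command that stitches them together. This is not the script itself, just an illustration. The function name `build_pdftk_command`, the `out_dir` parameter and the sorted merge order are my own assumptions for the example; the only pdftk syntax relied on is the standard `pdftk in1.pdf in2.pdf cat output out.pdf` form.

```python
import os


def build_pdftk_command(subdir, filenames, out_dir):
    """Build a pdftk argument list that merges the PDFs found in `subdir`.

    Hypothetical helper for illustration only; it builds the command
    but does not run it.
    """
    # Step 4: keep only the PDFs, sorted so the merge order is predictable
    # (sorting is an assumption; the order could be anything you prefer).
    pdfs = sorted(f for f in filenames if f.lower().endswith(".pdf"))

    # Step 5: turn the list into full paths usable as command line arguments.
    inputs = [os.path.join(subdir, f) for f in pdfs]

    # Step 7: name the output after the folder so each result is unique.
    folder = os.path.basename(os.path.normpath(subdir))
    output = os.path.join(out_dir, folder + ".pdf")

    # Step 6: pdftk <inputs...> cat output <file> concatenates the inputs.
    return ["pdftk"] + inputs + ["cat", "output", output]
```

The returned list can be handed straight to `subprocess.call(...)`, which avoids quoting trouble with spaces in file names. In practice the `filenames` argument would come from something like `os.listdir(subdir)`.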

Here’s the code…