lukestein’s Twitter Archive—№ 11,246

    1. It has come to my attention that not everyone knows about @TabulaPDF, a great, free, open-source, cross-platform tool for extracting tabular data from (some) PDF files github.com/tabulapdf/tabula
  1. …in reply to @lukestein
    No, it does not do OCR. But Tabula handles multi-page tables with consistent formatting (including consistent formatting across multiple PDF files), and can separate columns with either lines or white space. Has a GUI, but also scriptable via CLI. Has saved me tons of time.
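
For the scripted route, here is a minimal sketch in Python. It assumes the command-line tool is the separate tabula-java JAR and that Java is on the PATH; the JAR filename, the folder name, and the helper names `tabula_command` and `extract_folder` are hypothetical. The `--lattice` and `--stream` flags correspond to separating columns by ruling lines or by white space.

```python
import subprocess
from pathlib import Path


def tabula_command(pdf, out_csv, jar="tabula-java.jar", lattice=False):
    """Build the argument list for one tabula-java run (does not execute it).

    The jar path is an assumption: point it at wherever the tabula-java
    JAR actually lives on your machine.
    """
    # --lattice: columns separated by ruling lines
    # --stream:  columns separated by white space
    mode = "--lattice" if lattice else "--stream"
    return [
        "java", "-jar", str(jar),
        "--pages", "all",        # tables may span several pages
        mode,
        "--format", "CSV",
        "--outfile", str(out_csv),
        str(pdf),
    ]


def extract_folder(pdf_dir, jar="tabula-java.jar", lattice=False):
    """Run tabula-java on every PDF in pdf_dir, writing a CSV next to each."""
    outputs = []
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        out_csv = pdf.with_suffix(".csv")
        subprocess.run(tabula_command(pdf, out_csv, jar, lattice), check=True)
        outputs.append(out_csv)
    return outputs


if __name__ == "__main__":
    # Hypothetical folder of identically formatted reports.
    for csv_path in extract_folder("reports"):
        print(csv_path)
```

Building the command separately from running it makes it easy to inspect the exact invocation before looping over a whole folder of same-format PDFs.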