PDF Joiner — What breaks, what can’t be preserved, how to prep files

What most often breaks during merging

  Office fonts: substituted, so line breaks shift
  Complex Excel: pivot tables, slicers, Power Query
  PPT animations: PDF doesn't support them
  EXIF orientation: photo on its side without jhead
  PNG alpha: lost via the JPEG path
  PDF bookmarks: qpdf doesn't transfer them
  Signatures: always invalidated
  AcroForm forms: field name collisions
  Encryption: refused without password

The pipeline has hard limits. Some come from the formats: PDF cannot animate, signatures cannot survive a byte change. Others come from the server side: LibreOffice without the right fonts, no OCR, no MS Office. Knowing which is which lets you set the right expectations and prepare files so the combiner does its best work.

Things that often break

Office fonts

A docx that uses a font the server doesn’t have gets a substitution. The text reads, but glyph shapes change and line breaks shift.

Symptom: pages end in the wrong places, a “uniform” document looks uneven.

Fix: stick to fonts the server is likely to have (Arial, Times New Roman, Calibri-via-Carlito), or generate the PDF from Word yourself and upload that.
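To see which fonts a document asks for before uploading, a docx can be opened as a zip archive and its font table read. This is a minimal sketch, not a definitive check: the font table lists declared fonts, which may include ones the text never uses, and SAFE_FONTS is a hypothetical allow-list you would replace with what your server really has.

```python
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical allow-list; replace it with the fonts your server really has.
SAFE_FONTS = {"Arial", "Times New Roman", "Calibri", "Carlito"}


def declared_fonts(docx_file) -> set[str]:
    """Return the font names declared in word/fontTable.xml of a .docx."""
    with zipfile.ZipFile(docx_file) as zf:
        if "word/fontTable.xml" not in zf.namelist():
            return set()
        root = ET.fromstring(zf.read("word/fontTable.xml"))
    name_attr = f"{{{W_NS}}}name"
    return {
        font.get(name_attr)
        for font in root.iter(f"{{{W_NS}}}font")
        if font.get(name_attr)
    }


def risky_fonts(docx_file, safe=SAFE_FONTS) -> set[str]:
    """Declared fonts that are not on the allow-list."""
    return declared_fonts(docx_file) - set(safe)
```

Anything risky_fonts returns is a candidate for substitution on the server, so it is worth either changing the font or exporting the PDF from Word.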

Complex Excel

Pivot tables, slicers, conditional formatting with formulas, named ranges, Power Query: all flatten or disappear under LibreOffice. The PDF gets a static snapshot with whatever filter was last applied.

Fix: simplify the Excel to plain values before uploading, or print to PDF from Excel directly.
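A workbook can be screened for some of these features before upload, since an xlsx is also a zip archive. The sketch below assumes the part names Excel conventionally writes (xl/pivotTables/ and xl/slicers/); other producers may name parts differently, and Power Query and conditional formatting are not covered here.

```python
import zipfile

# Part-name prefixes Excel conventionally uses; treated as an assumption.
FEATURE_PREFIXES = {
    "pivot tables": "xl/pivotTables/",
    "slicers": "xl/slicers/",
}


def fragile_features(xlsx_file) -> list[str]:
    """Return labels of features that are likely to flatten on conversion."""
    with zipfile.ZipFile(xlsx_file) as zf:
        names = zf.namelist()
    return sorted(
        label
        for label, prefix in FEATURE_PREFIXES.items()
        if any(name.startswith(prefix) for name in names)
    )
```

A non-empty result is a hint to print to PDF from Excel rather than rely on the server-side conversion.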

PowerPoint animations

Animations, transitions, and speaker notes are gone. PDF cannot represent them.

Fix: nothing.

EXIF orientation

A converter that ignores EXIF Orientation lays portrait iPhone photos on their side.

Symptom: portraits rotated 90 degrees in the PDF.

Fix: rotate in Photos.app or Preview before uploading, which bakes the orientation into pixels and clears the flag. Or trust that the combiner runs jhead -autorot.
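What a converter has to do with the flag can be sketched as a lookup. The table below covers only the four non-mirrored EXIF Orientation values, which are the ones phone cameras normally write; the function name is made up for this illustration, and reading the tag from the file is left out.

```python
# Clockwise rotation (degrees) needed to display each non-mirrored
# EXIF Orientation value upright. Mirrored values (2, 4, 5, 7) are omitted.
CLOCKWISE_DEGREES = {1: 0, 3: 180, 6: 90, 8: 270}


def upright_size(width: int, height: int, orientation: int):
    """Return (degrees_clockwise, (width, height)) after applying the flag."""
    if orientation not in CLOCKWISE_DEGREES:
        raise ValueError(f"unsupported orientation: {orientation}")
    degrees = CLOCKWISE_DEGREES[orientation]
    if degrees in (90, 270):
        # A quarter turn swaps the stored width and height.
        return degrees, (height, width)
    return degrees, (width, height)
```

A portrait iPhone shot is typically stored as a landscape pixel grid with Orientation 6, so upright_size(4032, 3024, 6) asks for a 90 degree clockwise turn and a 3024×4032 page. A converter that skips this step places the landscape grid as-is.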

PNG alpha sent through JPEG

A pipeline that doesn’t distinguish PNG with alpha from PNG without ends up routing transparent logos through JPEG, which has no alpha. The transparent regions become white or black.

Symptom: a logo that was supposed to float on a page background sits on a white box.

Fix: use a combiner that branches on hasAlpha. If you don’t trust it, flatten the PNG against a chosen background color before uploading.
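The alpha check itself is cheap. Here is a minimal sketch using only the standard library: it reads the color type from the IHDR chunk and looks for a tRNS chunk. The function name png_has_alpha is an illustration, not the name any particular combiner uses, and the sketch does not validate chunk checksums.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_has_alpha(data: bytes) -> bool:
    """True if the PNG declares an alpha channel or a tRNS transparency chunk."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk: 4-byte length, 4-byte type, body, 4-byte CRC.
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if chunk_type == b"IHDR":
            color_type = body[9]  # after width (4), height (4), bit depth (1)
            if color_type in (4, 6):  # grey+alpha or RGBA
                return True
        elif chunk_type == b"tRNS":
            return True
        elif chunk_type == b"IDAT":
            break  # tRNS must appear before image data
        pos += 8 + length + 4
    return False
```

A pipeline could route files where this returns True through a lossless path and send only the rest to JPEG.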

Bookmarks and table of contents

qpdf --pages doesn’t carry the outline tree. A merged PDF comes out without bookmarks even when every input had them.

Symptom: empty Bookmarks panel.

Fix: nothing pre-emptive. If outlines are critical, rebuild them after the merge in Acrobat Pro or with a pikepdf script.
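A pikepdf script along these lines could do the merge and add one top-level bookmark per input file. It is a sketch under assumptions: it does not restore the original outline trees, the titles are whatever you pass in, and pikepdf has to be installed separately. The page arithmetic lives in its own function so it can be checked without pikepdf.

```python
from contextlib import ExitStack


def start_pages(page_counts):
    """Zero-based index of each input's first page in the merged file."""
    starts, total = [], 0
    for count in page_counts:
        starts.append(total)
        total += count
    return starts


def merge_with_outline(inputs, output):
    """Merge PDFs and bookmark each one. inputs is a list of (title, path)."""
    from pikepdf import OutlineItem, Pdf  # third-party: pip install pikepdf

    with ExitStack() as stack:
        merged = Pdf.new()
        counts = []
        for _title, path in inputs:
            # Keep every source open until the merged file has been saved.
            src = stack.enter_context(Pdf.open(path))
            counts.append(len(src.pages))
            merged.pages.extend(src.pages)
        with merged.open_outline() as outline:
            for (title, _path), start in zip(inputs, start_pages(counts)):
                outline.root.append(OutlineItem(title, start))
        merged.save(output)
```

With inputs of 3, 10, and 2 pages, the bookmarks point at pages 0, 3, and 13 of the merged file.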

Digital signatures

A signature on any input is invalid after merging. Always.

Symptom: “Signature is invalid” warning.

Fix: don’t merge legally signed PDFs. Sign the merged output instead.

AcroForm forms

Fillable fields can collide on field names across inputs. Static forms usually transfer; complex ones don’t.

Symptom: fields stop accepting input or display the wrong values.

Fix: fill the form and flatten it before merging, which turns the fields into static text. Tools like pdftk ... flatten automate this.
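If you would rather keep the fields live, it helps to know which names clash before merging. The sketch below assumes the field names per file have already been extracted by some other tool; the input shape and the function name are made up for this illustration.

```python
from collections import defaultdict


def colliding_fields(fields_by_file):
    """Map each field name used by more than one file to those files."""
    owners = defaultdict(list)
    for filename, names in fields_by_file.items():
        for name in set(names):  # ignore repeats inside one file
            owners[name].append(filename)
    return {
        name: sorted(files)
        for name, files in owners.items()
        if len(files) > 1
    }
```

For two forms that both define a field called name, the result is {"name": ["a.pdf", "b.pdf"]}, which tells you which field to rename or flatten.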

Encryption

A password-protected PDF blocks qpdf. The combiner either asks for the password or refuses.

Fix: remove the password before uploading.
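Removing the password locally can be scripted around qpdf. The sketch below builds the command rather than running it, and the encryption check is a rough heuristic (it looks for the /Encrypt key that encrypted PDFs carry in their trailer), so treat a True as "probably" and not as proof. Both function names are hypothetical.

```python
def looks_encrypted(pdf_bytes: bytes) -> bool:
    """Heuristic: encrypted PDFs carry an /Encrypt key in the trailer."""
    return b"/Encrypt" in pdf_bytes


def decrypt_command(src: str, dst: str, password: str) -> list[str]:
    """Argument list for qpdf that writes an unencrypted copy of src to dst."""
    return ["qpdf", f"--password={password}", "--decrypt", src, dst]
```

The list can be passed to subprocess.run(cmd, check=True) once qpdf is installed. You need to know the password; this does not bypass it.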

Size limits

Online combiners cap uploads (50, 100, or 200 MB depending on the service). Fifty HEIC photos at 3 MB each add up to 150 MB, which is over the two lower caps.

Fix: reduce resolution before uploading, or run the pipeline locally.
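The arithmetic is worth doing before you start an upload. A small sketch, with the caps taken from the figures above and the helper name invented for this example:

```python
MB = 1024 * 1024


def fits(sizes_bytes, cap_mb: int) -> bool:
    """True if the combined size of all files is within the upload cap."""
    return sum(sizes_bytes) <= cap_mb * MB


# Fifty photos at 3 MB each total 150 MB.
photos = [3 * MB] * 50
for cap in (50, 100, 200):
    print(cap, "MB cap:", "fits" if fits(photos, cap) else "too large")
```

In practice the sizes would come from os.path.getsize on each file in the folder.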

Things that can’t be done at all

Exact MS Office fidelity without MS Office. LibreOffice produces a different PDF than Word, Excel, or PowerPoint. Most documents won’t show the difference; complex ones will. Generate the PDF in MS Office yourself.

Recovering structure that wasn’t there. A scanned PDF without an OCR layer stays unsearchable. Adding text recognition is a separate (usually paid) service.

Animation, video, sound. Mainstream PDF readers don’t play them. Conversion produces a static document.

Missing fonts. A PDF that referenced a custom font without embedding it cannot be reconstructed faithfully. The combiner falls back.

Preparation that pays off

Convert office files yourself before combining. Word’s built-in PDF export matches Word exactly. LibreOffice doesn’t. If layout matters, do the docx-to-PDF step in Word and upload the PDF.

Use standard fonts. Calibri, Arial, Times New Roman, Verdana, and Tahoma are either installed on most servers or covered by a close substitute, such as Carlito for Calibri.

Photos as JPEG, graphics as PNG. HEIC, WebP, and AVIF work but get reduced to JPEG or PNG inside the pipeline regardless.

Rotate photos through your photo app, not via EXIF. The result is the same image with no orientation flag to misread.

Downsize photos before uploading if visual fidelity isn’t critical. 2000×1500 or 3000×2000 is plenty for most documents and shrinks the output PDF dramatically.
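The target size for a downsized photo can be computed without distorting it. This sketch fits an image inside a bounding box while keeping the aspect ratio and never upscaling; the function name is made up, and the resize itself is left to whatever image tool you use.

```python
def fit_within(width: int, height: int, max_w: int, max_h: int):
    """Largest (width, height) that fits the box and keeps the aspect ratio."""
    if width <= max_w and height <= max_h:
        return width, height  # already small enough; never upscale
    if width * max_h >= height * max_w:
        # Width is the limiting side.
        new_w = max_w
        new_h = max(1, (height * max_w + width // 2) // width)
    else:
        # Height is the limiting side.
        new_h = max_h
        new_w = max(1, (width * max_h + height // 2) // height)
    return new_w, new_h
```

A 4032×3024 photo fitted into 2000×1500 comes out at exactly 2000×1500. For portrait shots you may want to swap the box to 1500×2000 first, otherwise the height limit shrinks them further than needed.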

Remove blank separator pages from input PDFs. The combiner won’t guess they were intentional.

Sequence:

  1. Gather everything in one folder.
  2. Convert docx, xlsx, pptx to PDF in their native applications.
  3. Downsize photos if appropriate.
  4. Strip unwanted pages from any source PDFs.
  5. Upload only the prepared PDFs into the combiner.
  6. Optionally compress the result.

Alternatives

Adobe Acrobat Pro. The reference implementation. Manual control of the merge, accurate office insertion, bookmark and signature preservation through dedicated workflows. $15-25 per month; overkill for a one-off.

Desktop applications. PDFsam Basic and PDF24 are free; PDFsam Enhanced is paid. Full control, no upload limits, files stay on your machine.

Command line. On Linux or macOS, install qpdf, ImageMagick, and headless LibreOffice from the package manager. Maximum flexibility, zero cost.
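A local pipeline with those tools can be as small as two commands per batch. The sketch below only builds the argument lists, on the assumption that soffice and qpdf are on the PATH; the function names are hypothetical and ImageMagick is left out.

```python
import subprocess


def office_to_pdf_command(src: str, outdir: str) -> list[str]:
    """Headless LibreOffice conversion of one office file to PDF."""
    return ["soffice", "--headless", "--convert-to", "pdf", "--outdir", outdir, src]


def merge_command(pdfs: list[str], output: str) -> list[str]:
    """qpdf merge of the given PDFs, in order, into one output file."""
    return ["qpdf", "--empty", "--pages", *pdfs, "--", output]


def run(cmd: list[str]) -> None:
    """Run a command and raise if it exits with an error."""
    subprocess.run(cmd, check=True)
```

For example, run(merge_command(["cover.pdf", "report.pdf"], "combined.pdf")) produces the merged file. Because the merge starts from --empty, the output has no bookmarks, as described earlier.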

Online services. Convenient for one-offs; uploading files to a third party is the trade-off.

Niche formats most combiners refuse

DjVu (old scanned books): convert to PDF locally with ddjvu -format=pdf input.djvu output.pdf from the djvulibre-bin package, then upload the PDF.

EPUB / MOBI / FB2 / AZW3 (ebooks): Calibre’s ebook-convert input.epub output.pdf handles them with options for page size and font.

XPS / OXPS (Microsoft’s PDF alternative; the XPS Viewer stopped being installed by default in Windows 10 1803): LibreOffice or gxps -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -sOutputFile=output.pdf input.xps.

CAD formats (DWG, DXF, STEP): specialized tools (LibreCAD, commercial alternatives). General-purpose combiners won’t touch them.

RAW photos (DNG, CR2, NEF, ARW): export to JPEG or TIFF in Lightroom or Capture One, then combine.
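The command-line conversions above can be collected into one dispatcher. This is a sketch that only builds the commands quoted in this section; the function name is invented, and CAD and RAW files are rejected on purpose because they need the dedicated tools mentioned.

```python
from pathlib import Path

EBOOK_EXTENSIONS = {".epub", ".mobi", ".fb2", ".azw3"}


def niche_to_pdf_command(src: str, dst: str) -> list[str]:
    """Pick the converter command for a niche format by file extension."""
    ext = Path(src).suffix.lower()
    if ext == ".djvu":
        return ["ddjvu", "-format=pdf", src, dst]
    if ext in EBOOK_EXTENSIONS:
        return ["ebook-convert", src, dst]
    if ext in {".xps", ".oxps"}:
        return ["gxps", "-sDEVICE=pdfwrite", "-dNOPAUSE", "-dBATCH",
                f"-sOutputFile={dst}", src]
    raise ValueError(f"no command-line converter listed for {ext or src}")
```

Each returned list can be handed to subprocess.run(cmd, check=True) once the matching tool is installed.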

Combining is two stages, and most loss happens in stage one (per-format conversion), not in stage two (the merge). Prepare each file so it arrives at the combiner as a clean PDF, and the merge step does almost nothing visible.