By dumping the metadata of some pdf files that have bookmarks with pdftk, I can see the format of bookmarks in pdf metadata. So I am guessing that it is possible to add and edit the bookmarks of a pdf file with pdftk.
Here are three steps that I follow:
- First, I get the metadata (including bookmarks, if any) from a pdf file into a text file by
pdftk in.pdf dump_data > in.info
- Next, I add some bookmarks to the metadata text file in.info, changing it from
InfoKey: Creator
InfoValue: PScript5.dll Version 5.2
InfoKey: Title
InfoValue: SSReader Print.
InfoKey: Producer
InfoValue: Acrobat Distiller 7.0 (Windows)
InfoKey: Author
InfoValue: Administrator
InfoKey: ModDate
InfoValue: D:20050605073244+08'00'
InfoKey: CreationDate
InfoValue: D:20050605073244+08'00'
PdfID0: 591a87c91dc76881fdf2ccf3811e72a5
PdfID1: 6b6ab11de8824e438e4f5eb1d85ec72
NumberOfPages: 400
PageLabelNewIndex: 1
PageLabelStart: 1
PageLabelNumStyle: DecimalArabicNumerals
to
InfoKey: Creator
InfoValue: PScript5.dll Version 5.2
InfoKey: Title
InfoValue: SSReader Print.
InfoKey: Producer
InfoValue: Acrobat Distiller 7.0 (Windows)
InfoKey: Author
InfoValue: Administrator
InfoKey: ModDate
InfoValue: D:20050605073244+08'00'
InfoKey: CreationDate
InfoValue: D:20050605073244+08'00'
PdfID0: 591a87c91dc76881fdf2ccf3811e72a5
PdfID1: 6b6ab11de8824e438e4f5eb1d85ec72
NumberOfPages: 400
BookmarkBegin
BookmarkTitle: Front cover
BookmarkLevel: 1
BookmarkPageNumber: 1
BookmarkBegin
BookmarkTitle: About the Author
BookmarkLevel: 1
BookmarkPageNumber: 5
BookmarkBegin
BookmarkTitle: Title page
BookmarkLevel: 1
BookmarkPageNumber: 6
BookmarkBegin
BookmarkTitle: Copyright page
BookmarkLevel: 1
BookmarkPageNumber: 7
BookmarkBegin
BookmarkTitle: Foreword
BookmarkLevel: 1
PageLabelNewIndex: 1
PageLabelStart: 1
PageLabelNumStyle: DecimalArabicNumerals
- Then I try to put the edited metadata back into the pdf file by
pdftk in.pdf update_info in.info output out.pdf
But when I open the new pdf file out.pdf in evince or Adobe Reader, there are no bookmarks in it.
If I get the metadata from the new pdf file out.pdf by pdftk out.pdf dump_data > out.info, there will be no bookmarks in out.info. It looks like the bookmarks were not added successfully.
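A quick way to check whether a dump contains any bookmarks is to parse the text rather than read it by eye. The following is only a minimal sketch: it assumes each bookmark record starts with a BookmarkBegin line followed by "BookmarkKey: value" lines, as in the listing above, and the function name parse_bookmarks is made up for this example.

```python
def parse_bookmarks(dump_text):
    """Return a list of dicts, one per bookmark record in pdftk dump_data text.

    Assumption: every record starts with a 'BookmarkBegin' line and is
    followed by lines such as 'BookmarkTitle: ...'.
    """
    bookmarks = []
    current = None
    for raw_line in dump_text.splitlines():
        line = raw_line.strip()
        if line == "BookmarkBegin":
            # Start a new record.
            current = {}
            bookmarks.append(current)
        elif line.startswith("Bookmark") and current is not None and ":" in line:
            # Split at the first colon only, so titles may contain colons.
            key, _, value = line.partition(":")
            current[key[len("Bookmark"):]] = value.strip()
        elif not line.startswith("Bookmark"):
            # Any other line ends the current record.
            current = None
    return bookmarks
```

Reading out.info into a string and calling parse_bookmarks on it should return an empty list when no bookmarks made it into the file.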
Did I make a mistake somewhere? How can I add and edit the bookmarks of a pdf file, not necessarily with pdftk?
Answers:
Method 1
It looks to me like from version 1.45 (2012-12-06), pdftk does allow modification of bookmarks via the update_info command:
You can now add or change a PDF’s bookmarks using update_info.
via:
http://www.pdflabs.com/docs/pdftk-version-history/
That means you can now update bookmarks by running the same command you tried originally:
pdftk in.pdf update_info in.bookmarks output out.pdf
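If you want to run this from a script, something like the following might do. It is only a sketch: it assumes pdftk (1.45 or later) is on your PATH, and the helper names update_info_command and apply_bookmarks are my own, not part of pdftk.

```python
import subprocess


def update_info_command(in_pdf, info_file, out_pdf):
    """Build the pdftk argument list for writing metadata back into a pdf."""
    return ["pdftk", in_pdf, "update_info", info_file, "output", out_pdf]


def apply_bookmarks(in_pdf, info_file, out_pdf):
    """Run pdftk; raises CalledProcessError if pdftk exits with an error.

    Assumption: a pdftk binary is installed and reachable on PATH.
    """
    subprocess.run(update_info_command(in_pdf, info_file, out_pdf), check=True)
```

Calling apply_bookmarks("in.pdf", "in.bookmarks", "out.pdf") then runs the same command as above.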
Method 2
The pdftk bookmark format is a little tedious to write. Instead, I created my own script using bash, sed, pdftk and python3. Check it out at this repo: https://github.com/SiddharthPant/booky
So now I can create a text file (bkmrks.txt) like this, which takes just 5 minutes to write even for a 1000-page pdf.
{
Title1, 1
Title2, 2
{
Subtitle1, 3
Subtitle2, 4
{
SubSubtitle1, 5
...
}
}
}
and then use my script
./booky.sh pdf_file.pdf bkmrks.txt
This automatically creates a pdf (pdf_file_new.pdf) that has my bookmarks in it.
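To give an idea of what such a conversion involves, here is a rough sketch in Python. It is not the code from the booky repo, just an illustration under some assumptions: each "{" goes one level deeper, each "}" goes one level back up, every other line is "title, page", and anything else (such as the "..." above) is skipped. The function name outline_to_pdftk is made up.

```python
def outline_to_pdftk(outline_text):
    """Convert a brace-nested 'title, page' outline into pdftk bookmark records."""
    level = 0
    records = []
    for raw_line in outline_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line == "{":
            level += 1
        elif line == "}":
            level -= 1
        else:
            # Split at the last comma so that titles may contain commas.
            title, sep, page = line.rpartition(",")
            if not sep or not page.strip().isdigit():
                continue  # not a "title, page" line, skip it
            records += [
                "BookmarkBegin",
                f"BookmarkTitle: {title.strip()}",
                f"BookmarkLevel: {level}",
                f"BookmarkPageNumber: {int(page)}",
            ]
    return "\n".join(records)
```

The returned text can be appended to a dump_data file before it is handed back to pdftk.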
Method 3
If you still want to stick with those unix scripts, then
- extract the bookmark data with pdftk
- write one extra script to convert the dumped bookmark data to the pdfmarks format, which the ghostscript command gs accepts
- use gs to merge the pdfs together with the pdfmarks
Take a look at http://blog.tremily.us/posts/PDF_bookmarks_with_Ghostscript/; the script pdf-merge.py does exactly what you (or I) want.
pdf-merge.py --output=merged.pdf input1.pdf input2.pdf
Some minor improvements could be made to his script:
- unicode handling
- writing out the bookmark file, so people can adjust it as well
Anyway, it should work
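For the conversion step, a sketch along these lines might be a starting point. It is not taken from pdf-merge.py. It assumes the bookmarks have already been read into (title, level, page) tuples, it only handles ASCII titles (so it does not solve the unicode point above), and the names escape_ps and to_pdfmarks are my own. In pdfmark syntax, /Count gives the number of direct children of an entry.

```python
def escape_ps(text):
    """Escape backslashes and parentheses for a PostScript string literal."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def to_pdfmarks(bookmarks):
    """Turn (title, level, page) tuples into pdfmark /OUT lines."""
    lines = []
    for i, (title, level, page) in enumerate(bookmarks):
        # Count direct children: entries one level deeper that appear
        # before the next entry at the same or a shallower level.
        children = 0
        for _, other_level, _ in bookmarks[i + 1:]:
            if other_level <= level:
                break
            if other_level == level + 1:
                children += 1
        count = f"/Count {children} " if children else ""
        lines.append(f"[{count}/Title ({escape_ps(title)}) /Page {page} /OUT pdfmark")
    return "\n".join(lines)
```

The resulting text would be saved to a file and passed to gs together with the input pdf.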
Method 4
pdftk is definitely the right tool (with the right syntax):
BookmarkBegin
BookmarkTitle: Chapter 1
BookmarkLevel: 1
BookmarkPageNumber: 1
BookmarkBegin
BookmarkTitle: Paragraph 1.1
BookmarkLevel: 2
BookmarkPageNumber: 1
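Since a hand-edited file is easy to get wrong, a small check before running update_info may help. This is only a sketch: it assumes every record should carry the same three fields as the example above (I have not verified what pdftk itself requires), and the function name find_incomplete_records is made up.

```python
REQUIRED = ("BookmarkTitle", "BookmarkLevel", "BookmarkPageNumber")


def find_incomplete_records(info_text):
    """Return (record_number, missing_keys) for each incomplete bookmark record.

    Assumption: a record is 'BookmarkBegin' followed by 'BookmarkKey: value' lines.
    """
    records = []
    for raw_line in info_text.splitlines():
        line = raw_line.strip()
        if line == "BookmarkBegin":
            records.append(set())
        elif records and line.startswith("Bookmark"):
            # Remember which keys this record has.
            records[-1].add(line.partition(":")[0])
    problems = []
    for number, keys in enumerate(records, start=1):
        missing = [key for key in REQUIRED if key not in keys]
        if missing:
            problems.append((number, missing))
    return problems
```

An empty list means every record has a title, a level and a page number.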
Method 5
jPDFtweak (Java, so runs on Unix/Linux) can alter bookmarks, but I don’t know if you can script anything with it.
For scripting, I’d guess your only native Unix/Linux option would be pdflatex with the pdfpages package. But that’s a learning curve if you’re not already a LaTeX user.
EDIT: Actually it may be possible with ghostscript: See here or here or here
Method 6
Caleb,
As far as I understand, bookmarks as per the PDF spec can’t be injected via a diagnostic tool like pdftk. Updating the metadata to refer to chapters and bookmark landing anchors that don’t exist will definitely not work; it may even make your PDF inconsistent or unopenable.
I ended up using java+iText libraries to do what you’re suggesting as per this tutorial, but we’re dealing with hundreds of pdfs daily, so it needed to be automated. If you’re doing a one-off, Adobe Acrobat should be able to do this.
Method 7
To summarise all these good answers:
There is a bmconverter project on GitHub that can convert between various PDF bookmark formats. It can convert pdftk output to the jpdftweak format, although if you use jpdftweak you won’t need to convert pdftk output to csv, as you can do all the work from within jpdftweak.
Unfortunately pdfmarks is not supported by the project, but fortunately someone posted a script in the bmconverter issues that can convert pdftk output to pdfmarks. So the ghostscript batch route is also an option.
Method 8
Just add BookmarkBegin before each bookmark entry, as in
BookmarkBegin
BookmarkTitle: Front cover
BookmarkLevel: 1
BookmarkPageNumber: 1
BookmarkBegin
⋮
Method 9
The redirect that you used in step 1 will cause update_info to not work correctly. You need to specify the output file to pdftk instead:
pdftk in.pdf dump_data output in.info
See this answer from a similar question: https://stackoverflow.com/a/30308964/3158933
Files that I created using a redirect have a slightly larger file size and cause pdftk to issue a “Warning: unexpected case 1 in LoadDataFile(); continuing” message when running the update_info command.
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0.