Though it looks pretty simple, generating a PDF document can lead to a number of issues. Let’s learn how to do it right with Ruby on Rails.
Generating PDF documents is something almost every SaaS built with Ruby on Rails has to deal with. On one of our projects we ran into an issue with generating a PDF document from a template: at some point the data placeholders stopped being parsed. So our team decided to dig into the gem's source files to find out what the problem was. It took us some time to figure out the issue, so we decided to share what we learned and save some nerve-racking hours for anybody who encounters the same type of bug. Spoiler alert: we recommend the ODF report gem if you need to generate a large number of documents from templates.
On one of our projects we needed to generate circa 30 types of PDF documents for every entity. Each document could contain up to 20 pages, and some of them had dynamic data that depended on the entity type. Besides, the admin needed to be able to edit the templates. That nipped in the bud the idea of laying out each document by hand in code, which is how the majority of PDF generators work. We needed a gem that would take a document template, parse it and then put the prepared data in the right places. That is how we came across the ODF report gem. It takes an .odt file uploaded by the customer, containing dynamic data placeholders like [PASTE_HERE], parses the placeholders and creates a new .odt document with all the necessary data. We then converted that document to .pdf using the Docsplit gem. The placeholder values are defined during generation as follows:
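```ruby
require 'odf-report'

# expand_path resolves "~", which file APIs do not do on their own
report = ODFReport::Report.new(File.expand_path('~/projects/my_template.odt')) do |r|
  r.add_field 'PASTE_HERE', @some_data # or r.add_field(:paste_here, @some_data)
end
report.generate('my_document.odt') # write the filled-in .odt
```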
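A toy, stdlib-only model makes the substitution step concrete. It is not the gem's actual code. It assumes that a field name is upper-cased and wrapped in square brackets, so `:paste_here` and `'PASTE_HERE'` both target `[PASTE_HERE]`, as in the two equivalent `add_field` calls above. It also assumes that this exact string is replaced in the document XML. `fill_placeholders` is a hypothetical helper.

```ruby
# Toy model of template filling; fill_placeholders is a hypothetical helper,
# not the odf-report API. Assumption: a field name is upper-cased, wrapped in
# square brackets, and that exact string is replaced in the document XML.
XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

def fill_placeholders(xml, fields)
  fields.reduce(xml) do |doc, (name, value)|
    placeholder = "[#{name.to_s.upcase}]"
    # Escape the value so the filled XML stays well-formed; the block form of
    # gsub keeps backslashes in the value from being read as back-references.
    safe_value = value.to_s.gsub(/[&<>"]/, XML_ESCAPES)
    doc.gsub(placeholder) { safe_value }
  end
end

template = '<text:p>Dear [PASTE_HERE],</text:p>'
puts fill_placeholders(template, paste_here: 'Jane Doe')
# prints <text:p>Dear Jane Doe,</text:p>
```

Note the key limitation of this model: the lookup only succeeds when the whole `[PASTE_HERE]` string appears contiguously in the XML.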
But as in any good comic book story, at some point something went wrong: the generated document contained the placeholder instead of the required data. Getting a document that shows a placeholder where your name should be is not great for business, is it?
Then, for no apparent reason, the placeholders started being parsed again (we were actively working with the documents at that point, creating new ones and updating old ones). We couldn't understand what was going on. The templates were provided by the customer, and whenever a placeholder didn't parse, you could delete it, type it again, and then it worked just fine. Of course, nobody considered this a normal situation, so we decided to look into the issue. I dug deeper and deeper into the gem and noticed that when Nokogiri parses a faulty document, it returns the placeholder as several separate blocks:
Whereas the correct variant should look like this:
So when we try to find the required placeholder in the parsed text, we can't find it. Stay tuned, since it gets even more interesting from here. We couldn't understand why the data sometimes came as a single block and sometimes was split. It was the right time for the brute-force method. I changed the document a few times and checked it manually each time, trying to understand what caused the bug. Finally, we found the answer.
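A simplified model reproduces the failure. The XML below is hypothetical ODF-like markup, and the exact markup the editor writes may differ. The sketch assumes that the placeholder is looked up by a plain string search over the XML. A placeholder split across elements is invisible to that search, even though its visible text still reads [PASTE_HERE].

```ruby
# Illustrative only: hypothetical, simplified ODF-like markup.
# Assumption: the placeholder is looked up by a plain string search over the XML.
PLACEHOLDER = '[PASTE_HERE]'

# Typed in one go: the whole placeholder sits in a single text node.
intact = '<text:p><text:span>[PASTE_HERE]</text:span></text:p>'
# Brackets typed first, name typed in between: three separate pieces.
split = '<text:p><text:span>[</text:span>PASTE_HERE<text:span>]</text:span></text:p>'

# Character data only, i.e. roughly what the author sees in the editor.
def visible_text(xml)
  xml.gsub(/<[^>]+>/, '')
end

{ 'intact' => intact, 'split' => split }.each do |label, xml|
  puts "#{label}: in XML #{xml.include?(PLACEHOLDER)}, " \
       "in visible text #{visible_text(xml).include?(PLACEHOLDER)}"
end
# prints:
# intact: in XML true, in visible text true
# split: in XML false, in visible text true
```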
The problem was that text in an ODT document ends up in a single node only if it was typed consecutively, in one go and in the same way. Suppose you first type the brackets [ ], then press the left arrow key and type the placeholder name between them. Or, to save some time, you copy-paste one placeholder across the document and then change the text inside the brackets. In either case the placeholder will be split into 3 parts.
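Besides retyping broken placeholders by hand, one could try to repair them automatically before filling the template. Below is a minimal sketch of such a pre-processing step. It assumes that placeholders look like [UPPER_CASE_NAME] and that the split pieces are separated only by `<text:span>` tags. `merge_split_placeholders` is a hypothetical helper, not part of the ODF report gem. It would have to run on the template's content.xml; extracting that file from the .odt archive is not shown.

```ruby
# Hypothetical pre-processing step (not part of odf-report): glue placeholders
# that the editor split into several <text:span> elements back into one run.
# Assumptions: placeholders look like [UPPER_CASE_NAME], square brackets do not
# occur inside tag attributes, and only <text:span> tags are safe to drop.
CANDIDATE   = /\[[^\[\]]*\]/ # from "[" to the nearest "]"
ANY_TAG     = /<[^>]*>/
SPAN_TAG    = %r{\A</?text:span(?:\s[^>]*)?>\z}
PLACEHOLDER = /\A\[[A-Z0-9_]+\]\z/

def merge_split_placeholders(xml)
  xml.gsub(CANDIDATE) do |candidate|
    tags   = candidate.scan(ANY_TAG)
    text   = candidate.gsub(ANY_TAG, '')
    opens  = tags.count { |t| !t.start_with?('</') && !t.end_with?('/>') }
    closes = tags.count { |t| t.start_with?('</') }

    # Dropping only span tags, with as many closing tags as opening ones,
    # keeps the surrounding spans balanced, so the XML stays well-formed.
    if tags.all? { |t| t.match?(SPAN_TAG) } && opens == closes && text.match?(PLACEHOLDER)
      text
    else
      candidate
    end
  end
end

broken = '<text:p><text:span text:style-name="T1">[</text:span>PASTE_HERE' \
         '<text:span text:style-name="T2">]</text:span></text:p>'
puts merge_split_placeholders(broken)
# prints <text:p><text:span text:style-name="T1">[PASTE_HERE]</text:span></text:p>
```

The merged placeholder keeps the formatting of the span that held the opening bracket. Candidates containing any other tag are left untouched, because dropping arbitrary tags could unbalance the XML. Such placeholders would still have to be retyped.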
This way we understood what the issue was and could carry on creating working templates without wasting time or hoping for a magic recovery. We hope our experience will be helpful to anybody who faces the same issues.
Feel free to drop us a line if you have any other issues with PDF documents. We will be glad to brainstorm a solution together!
5 Comments
Why not just use the Prawn gem (http://prawnpdf.org/api-docs/2.0/)? Sure, there is always a choice. You can write the whole template yourself and have more flexibility over what goes where. Or you can choose a third-party library, framework, or whatever you name it, and be limited to what it offers. Moreover, compared to the gems described above, Prawn seems to be more up to date (just look at the last commit).
Hi Serguei! Thanks for your reply.
Yes, we know about this gem, and as I wrote in the article, the specifics of this feature were that:
1) the customer wanted to be able to change the documents himself, so if the templates were created manually, he wouldn't be able to change them without us;
2) we had to generate huge documents with a lot of tables (about 20 pages each). To be confident that everything would render correctly (e.g. no table spilling over onto another page), we would have had to spend a lot of time writing and validating a template for every document manually.
So given our needs, it was obvious that we needed templates the customer could create himself.
Somehow this post lacks an MWE. Here it is:
Notes:
* http://sandrods.github.io/odf-report
* http://documentcloud.github.io/docsplit
Thanks for the information.
I am very happy to read this article. We are currently encountering similar problems: with the odf-report gem, some tags are never correctly recognized and filled.
Have you solved the problem so far?