Though it looks simple, generating a PDF document can lead to a number of issues. Let’s learn how to do it right with Ruby on Rails.
Generating PDF documents is something almost every Ruby on Rails SaaS has to deal with. On one of our projects we ran into an issue with generating PDF documents from a template: at some point, the placeholders where data should be inserted stopped being parsed. So our team decided to look into the gem’s source files to find out what the problem was. It took us some time to figure out the cause, so we decided to share what we learned and save some nerve-racking hours for anybody who encounters the same kind of bug. Spoiler alert: we recommend the ODF report gem if you need to generate a large number of documents from templates.
On that project we needed to generate about 30 types of PDF documents for every entity. Each document could contain up to 20 pages, and some of them contained dynamic data that depended on the entity type. Besides, the admin needed to be able to edit the templates, which ruled out building each document by hand in code, the approach most PDF generators take. We needed a gem that would take a document template, parse it and then put the prepared data in the required places. That is how we came across the ODF report gem. It takes an .odt file uploaded by the customer with dynamic data placeholders such as [PASTE_HERE], finds the placeholders, fills them in and creates a new .odt document with all the necessary data, which we then converted to .pdf with the Docsplit gem. The placeholder values are defined during generation like this:
report = ODFReport::Report.new(File.expand_path('~/projects/my_template.odt')) do |r|
  r.add_field 'PASTE_HERE', @some_data # fills the [PASTE_HERE] placeholder in the template
end
But, as in any comic book story, at some point something went wrong, and the generated document contained the placeholder instead of the required data. It’s hardly good business when a customer receives a document with a placeholder where their name should be, is it?
Then, for no apparent reason, the placeholders started being parsed again (we were actively working with the documents at that point: creating new ones and updating old ones). We couldn’t understand what was going on, since the templates were provided by the customer, and whenever a placeholder didn’t parse, you could delete it, create it again, and then it worked just fine. Of course, nobody considered this a normal situation, so we decided to look into the issue. I dug deeper and deeper into the gem and noticed that for the faulty documents, Nokogiri returned the placeholder split into several separate blocks.
In a correct document, by contrast, the whole placeholder comes back as a single block.
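To make the difference concrete, here is a minimal sketch built on two hypothetical, simplified paragraphs from an ODT content.xml (real files carry more attributes and automatic styles). The split fragment assumes the editor wrapped the brackets in their own text:span elements; the constants and the text_nodes helper are ours, not part of the ODF report gem. It uses Ruby's bundled REXML parser instead of Nokogiri so it runs without extra gems.

```ruby
require 'rexml/document'

# Standard ODF namespace URI bound to the text: prefix.
TEXT_NS = 'urn:oasis:names:tc:opendocument:xmlns:text:1.0'

# Hypothetical paragraph typed in one go: the placeholder sits in a single text node.
WHOLE = %(<text:p xmlns:text="#{TEXT_NS}">Dear [PASTE_HERE],</text:p>)

# Hypothetical paragraph edited in several steps: the brackets ended up in separate spans.
SPLIT = %(<text:p xmlns:text="#{TEXT_NS}">Dear <text:span text:style-name="T1">[</text:span>) +
        %(PASTE_HERE<text:span text:style-name="T1">]</text:span>,</text:p>)

# Returns the values of all text nodes under an element, in document order.
def text_nodes(element)
  element.children.flat_map do |child|
    case child
    when REXML::Text then [child.value]
    when REXML::Element then text_nodes(child)
    else []
    end
  end
end

{ 'whole' => WHOLE, 'split' => SPLIT }.each do |label, xml|
  nodes = text_nodes(REXML::Document.new(xml).root)
  puts "#{label}: #{nodes.inspect}"
  puts "  found in a single node: #{nodes.any? { |t| t.include?('[PASTE_HERE]') }}"
  puts "  found in joined text:   #{nodes.join.include?('[PASTE_HERE]')}"
end
```

For the whole paragraph this prints one node, "Dear [PASTE_HERE],". For the split one it prints "Dear ", "[", "PASTE_HERE", "]" and ",": only the joined paragraph text still contains [PASTE_HERE], so a search that looks at one text node at a time misses it.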
As a result, when we searched the parsed text for the required placeholder, we couldn’t find it. Stay tuned, because it gets even more interesting from here. We couldn’t understand why the data sometimes came back as a single block and sometimes was split. It was the right time for brute force: I changed a document several times, checking the result manually each time, to understand what caused the bug. Finally, we found the answer.
The problem was that the text of an ODT document ends up in a single node only if it is typed consecutively, in one go. This means that if you first type the brackets [ ], then press the left arrow key and type the placeholder name between them, or if, to save time, you copy-paste one placeholder across the document and then change the text inside the brackets, the placeholder will be split into three parts.
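Because a placeholder broken up like this looks perfectly normal in the editor, it can help to check templates before using them. Below is a minimal sketch of such a check; it is not part of the ODF report gem. It takes the text of a template's content.xml (an .odt file is a ZIP archive; extracting content.xml, for example with the rubyzip gem, is left out) and reports placeholders that appear in a paragraph's or heading's visible text but are split across several text nodes. The names split_placeholders, walk_elements and PLACEHOLDER_RE, the upper-case [NAME] placeholder convention and the reliance on the standard text: prefix are our assumptions.

```ruby
require 'rexml/document'

# Assumed placeholder convention: upper-case names in square brackets, e.g. [PASTE_HERE].
PLACEHOLDER_RE = /\[[A-Z0-9_]+\]/

# Calls the block for every element below `node`, depth-first.
def walk_elements(node, &block)
  node.children.each do |child|
    next unless child.is_a?(REXML::Element)
    block.call(child)
    walk_elements(child, &block)
  end
end

# Returns the values of all text nodes under an element, in document order.
def text_nodes(element)
  element.children.flat_map do |child|
    case child
    when REXML::Text then [child.value]
    when REXML::Element then text_nodes(child)
    else []
    end
  end
end

# Lists placeholders that occur in a paragraph's (or heading's) joined text more often
# than inside single text nodes, i.e. occurrences split across several nodes.
# Assumes paragraphs and headings use the standard ODF "text:" prefix.
def split_placeholders(content_xml)
  broken = []
  walk_elements(REXML::Document.new(content_xml)) do |el|
    next unless %w[text:p text:h].include?(el.expanded_name)

    nodes = text_nodes(el)
    joined = nodes.join
    joined.scan(PLACEHOLDER_RE).uniq.each do |ph|
      whole = nodes.sum { |t| t.scan(ph).size }
      broken << ph if whole < joined.scan(ph).size
    end
  end
  broken.uniq
end

# Hypothetical, simplified content.xml: [NAME] was typed in one go, [AMOUNT] was edited later.
sample = <<~XML
  <office:document-content xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
                           xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
    <office:body><office:text>
      <text:p>Dear [NAME],</text:p>
      <text:p>Total: <text:span text:style-name="T1">[</text:span>AMOUNT<text:span text:style-name="T1">]</text:span></text:p>
    </office:text></office:body>
  </office:document-content>
XML

p split_placeholders(sample) # prints ["[AMOUNT]"]
```

Counting occurrences, rather than just checking whether any node contains the placeholder, also catches a paragraph where the same placeholder appears once whole and once split. Any placeholder the check reports can be fixed by deleting it and typing it again in one go.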
This way we understood what the issue was and could carry on creating working templates without wasting time or hoping for a magic recovery. We hope our experience will be helpful to anybody who runs into the same issue.
Feel free to drop us a line if you have any other issues with PDF documents, and we will be glad to brainstorm a solution together!