Ruby on Rails: How to Keep Report Generation Logic Simple and Clear

Most systems and applications nowadays provide ways to generate reports. Ask any programmer, and they will likely have experience with reporting logic: aggregating statements in Excel or saving data snapshots to a CSV file. However, the code we come across (or even write ourselves) is often hard to read, breaks SOLID/DRY principles, or follows no design patterns at all.

Getting reporting logic under control from the very start goes a long way toward making a programmer's life easier. The most common headache stems from poor design assumptions, even though from the beginning of a project we usually know whether the system or app will need to generate one or more reports.

At this stage, we should already be planning how the code will be structured, so that it can handle many reports in the future, not just the first few.

In this article, I want to present a universal approach to organizing reporting logic in Ruby, suitable both for projects at the very beginning of their run and for existing applications. We will create a layer of abstraction to keep our current and future code legible, clear and easy to maintain.

A real-world example

When it comes to report generation logic, programmers frequently throw together a “fast” solution: they lump all the logic into a single class that a job/worker calls asynchronously. Little do they know that they have created technical debt from the very beginning!

A recent project I worked on provides a great case study for the benefits of refactoring.

The application had been under development for nine years, mainly as a hobby project (and by interns). The reports were very important to the client, but generating them often failed (mainly due to missing data), and changing their content meant many hours of analyzing old, unreadable code. To help the client, we decided to refactor the reports and fix the bugs instead of adding more lines to the unreadable code.

At first glance, every report followed the same design, but they differed in complexity and level of nesting. A typical one looked like this:

class SomeCSVWorker
  include Sidekiq::Worker
  sidekiq_options queue: 'package_import'

  def perform(id)
    # previously implemented class for storing reports
    export = Export.find(id)
    require 'csv'

    csv_file = ''

    # CSV generation algorithm
    csv_file << CSV.generate_line(%w[ID Some Attributes])

    # sample data
    SomeResource.where('created_at >= ?', Time.now - 1.month).each do |res|
      csv_file << CSV.generate_line([res.id, res.some, res.attributes])
    end

    # temporary file generation
    file_name = Rails.root.join('tmp', 'resource-name_timestamp.csv')

    File.open(file_name, 'wb') do |file|
      file.puts csv_file
    end

    # S3 file upload (aws-sdk-s3 v3 resource API)
    s3 = Aws::S3::Resource.new
    key = File.basename(file_name)
    s3.bucket('bucket-name').object("csvs/#{key}").upload_file(file_name, acl: 'public-read')

    # storing S3 file path to AR instance
    export.update(file_path: key)
  end
end

Originally, the report generation logic lived in the Sidekiq worker itself. This has several consequences:

  • The DRY principle is violated several times. The code that generates the report, uploads it to the S3 bucket and updates the ActiveRecord instance is repeated in every report worker.
  • The single responsibility principle is also broken – a worker should only be responsible for performing a given task asynchronously, not for defining the logic of that task.
  • The open-closed principle is also violated – instead of extending shared behavior, every change to it means editing the code of each existing report.

The first problem is that changing a single step of the algorithm causes sweeping changes across all files. For example, implementing another way to generate a CSV file (such as switching to another library) requires the programmer to change the same lines in every report file.

Even more importantly, suppose we change the Export table (e.g. the file_path column). Again, the change must touch a large number of files, which raises the chance of breaking something along the way.

For now, a few code fragments can be moved to a higher layer of abstraction, and the report logic can be extracted into a separate class (a service object would be beneficial here).
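To make the DRY point concrete, here is a minimal plain-Ruby sketch (not the project's code) in which two hypothetical reports share a single CSV-serialization helper. The names CsvSerialization, UsersReport and OrdersReport are illustrative assumptions; the point is that switching the CSV library would touch only one method.

```ruby
require 'csv'

# Hypothetical shared helper: the single place that turns rows into CSV text.
# Swapping the CSV library would only require changing this method.
module CsvSerialization
  def self.to_csv(header, rows)
    CSV.generate do |csv|
      csv << header
      rows.each { |row| csv << row }
    end
  end
end

# Two hypothetical reports that differ only in the data they select.
class UsersReport
  def initialize(users)
    @users = users
  end

  def to_csv
    CsvSerialization.to_csv(%w[ID Name], @users.map { |u| [u[:id], u[:name]] })
  end
end

class OrdersReport
  def initialize(orders)
    @orders = orders
  end

  def to_csv
    CsvSerialization.to_csv(%w[ID Total], @orders.map { |o| [o[:id], o[:total]] })
  end
end

print UsersReport.new([{ id: 1, name: 'Ann' }]).to_csv
# ID,Name
# 1,Ann
```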

Planning the refactoring

First off, the general design of the service responsible for generating reports should be taken into consideration. To simplify, only the CSV reports will be analysed:

# frozen_string_literal: true

class CsvReportGeneratorService
  def initialize(export:)
    @export = export
  end

  def generate
    write_to_csv_file
    upload_report_to_s3
    update_export_path
  end

  private

  attr_accessor :export

  def write_to_csv_file; end

  def upload_report_to_s3; end

  def update_export_path; end
end

Next, a suitable test can be prepared:

describe CsvReportGeneratorService do
  # We use VCR in our projects to connect with S3
  describe '#generate', :vcr do
    let(:export) { create :export }
    let(:csv_generator) { CsvReportGeneratorService.new(export: export) }
    let(:client) { Aws::S3::Client.new }
    let(:report_file) { client.get_object(bucket: 'bucket-name', key: export.file_path).body.read }
    let(:csv_file_data) { CSV.parse report_file }
    let(:csv_expected_data) { [] } # Here we may define expected data

    before { csv_generator.generate }

    it 'generates report' do
      expect(report_file).not_to be_nil
      expect(csv_file_data).to eq csv_expected_data
    end
  end
end

Furthermore, the Sidekiq worker class can be reduced to:

class CsvReportGeneratorWorker
  include Sidekiq::Worker
  sidekiq_options queue: 'reports'

  def perform(id)
    export = Export.find(id)
    csv_generator = CsvReportGeneratorService.new(export: export)
    csv_generator.generate
  end
end

With the corresponding test:

describe CsvReportGeneratorWorker do
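  # assumes require 'sidekiq/testing' (fake mode), which provides .jobs
  describe 'worker queueing' do
    let(:export) { create :export }
    let(:report_generator_worker) { CsvReportGeneratorWorker.perform_async(export.id) }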

    it 'enqueues the job' do
      expect { report_generator_worker }.to change(CsvReportGeneratorWorker.jobs, :size).by 1
    end
  end
end

We may now create a block diagram to plan the design of the CsvReportGeneratorService class:

[Figure: block diagram of the CsvReportGeneratorService design]

Uploading a file to S3 and updating the Export record will look the same for every report, so the logic behind those two steps can be shared. The generated data, however, naturally varies between reports, and this is the main topic of the article. Below are three design patterns that can help keep the code responsible for generating report data clear.
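Before turning to the patterns, here is a rough sketch of that split, runnable without S3 or a database. FakeExport, InMemoryUploader and ReportPipeline are hypothetical stand-ins (not the project's classes): the upload and update steps are fixed, and only the data generator passed in varies.

```ruby
require 'csv'

# Hypothetical stand-in for the Export ActiveRecord model.
FakeExport = Struct.new(:key, :file_path, keyword_init: true) do
  def update(attrs)
    attrs.each { |name, value| self[name] = value }
    true
  end
end

# Hypothetical stand-in for an S3 uploader: stores bodies in memory, returns the key.
class InMemoryUploader
  attr_reader :uploads

  def initialize
    @uploads = {}
  end

  def upload(body, key:)
    @uploads[key] = body
    key
  end
end

# Shared steps (upload, update) live here; only the data generator varies.
class ReportPipeline
  def initialize(export:, data_generator:, uploader:)
    @export = export
    @data_generator = data_generator
    @uploader = uploader
  end

  def generate
    body = CSV.generate { |csv| @data_generator.call(csv) }
    path = @uploader.upload(body, key: "csvs/#{@export.key}.csv")
    @export.update(file_path: path)
  end
end

export = FakeExport.new(key: 'some_report')
uploader = InMemoryUploader.new
generator = lambda do |csv|
  csv << ['headers']
  csv << ['report_body']
end
ReportPipeline.new(export: export, data_generator: generator, uploader: uploader).generate
puts export.file_path # csvs/some_report.csv
```

Swapping the lambda for another one changes the report contents without touching the upload or update code.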

Template Method

The Template Method pattern is very common in Ruby and lets us encapsulate parts of the code in line with the DRY principle. It provides a skeleton of the algorithm (the parent class) and leaves the implementation of individual steps to subclasses.

In terms of report refactoring, this pattern will allow us to encapsulate the part responsible for generating the data written to the CSV file.

Factory Pattern

The Factory pattern is a creational pattern that lets us create objects of different classes through a common interface without revealing the logic behind their creation. In other words, starting from one base class, we can create individual classes that each implement a given step of the algorithm; in this case, the code responsible for generating data.

Strategy Pattern

If you're programming in Ruby on Rails, you may have used the Pundit gem to manage permissions in your application. Pundit's policy classes, each encapsulating the permission rules for one model, follow the idea of the Strategy Pattern, also known as the Policy Pattern. The pattern is based on composition: it encapsulates an algorithm in separate classes, called strategies, and a context decides which strategy to invoke.

Each report type can be implemented as a separate strategy that generates its own report data.

Patterns in action – implementing the solution

Now it is time to turn theory into practice. For each pattern, we will present a data generation implementation for a sample report class.

Template Method

To follow the TDD convention, let's start by writing a test case for our future report class, which should turn green at the end of the implementation:

describe SomeReport do
  describe '#generate_report' do
    let(:csv_data) { [] } # we can pass CSV data as an empty array
    # We expect below data in our report
    let(:expected_data) { [['headers'], ['report_body']] }
    let(:report_class) { SomeReport.new(csv_data) }

    before { report_class.generate_report }

    it 'generates correct data' do
      expect(csv_data).to eq expected_data
    end
  end
end

We expect the SomeReport class to append the rows defined in expected_data to the array it receives, so that they can later be saved to the report. Let's define the template:

class TemplateClassCsv
  def initialize(csv)
    @csv = csv
  end

  def generate_report
    add_headers
    add_report_rows
  end

  private

  attr_accessor :csv

  def add_headers
    raise NotImplementedError
  end

  def add_report_rows
    raise NotImplementedError
  end
end

According to the Template Method pattern, each class inheriting from TemplateClassCsv implements its own headers and report content (and may also override generate_report if it needs to). For our sample report we get:

class SomeReport < TemplateClassCsv
  private

  def add_headers
    csv << ['headers']
  end

  def add_report_rows
    csv << ['report_body']
  end
end
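Because the template only relies on <<, the same report class works with a plain array (as in the test) and with a real CSV object. The self-contained sketch below repeats the two classes so it runs on its own; the error messages and the IncompleteReport subclass are hypothetical additions that show what happens when a subclass skips a step.

```ruby
require 'csv'

class TemplateClassCsv
  def initialize(csv)
    @csv = csv
  end

  # The template: a fixed order of steps, filled in by subclasses.
  def generate_report
    add_headers
    add_report_rows
  end

  private

  attr_accessor :csv

  def add_headers
    raise NotImplementedError, "#{self.class} must implement add_headers"
  end

  def add_report_rows
    raise NotImplementedError, "#{self.class} must implement add_report_rows"
  end
end

class SomeReport < TemplateClassCsv
  private

  def add_headers
    csv << ['headers']
  end

  def add_report_rows
    csv << ['report_body']
  end
end

# Hypothetical subclass that forgets to implement one step.
class IncompleteReport < TemplateClassCsv
  private

  def add_headers
    csv << ['headers']
  end
end

# A real CSV object instead of the array used in the test.
puts CSV.generate { |out| SomeReport.new(out).generate_report }
# headers
# report_body

begin
  IncompleteReport.new([]).generate_report
rescue NotImplementedError => e
  puts e.message # IncompleteReport must implement add_report_rows
end
```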

At this point, our test is green, and we can add every subsequent report in the same way. It is worth noting that we have two options for using such a class in our service:

  1. Move the logic responsible for creating the report and uploading it to the S3 bucket into TemplateClassCsv, and then call the report through the subclass, which will perform the steps of the algorithm correctly. However, this violates the single responsibility principle.
  2. Add a mapper that, based on the report key, calls the appropriate subclass. The same can be achieved, more readably, with the Factory pattern.
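Option 2 can be sketched as an explicit hash from report keys to classes. This is a minimal illustration, not the project's code: ReportMapper, UnknownReportKey and OtherReport are hypothetical, and the report classes are simplified stand-ins that respond to generate_report.

```ruby
# Simplified stand-ins for report classes; each fills the csv-like object it receives.
class SomeReport
  def initialize(csv)
    @csv = csv
  end

  def generate_report
    @csv << ['headers']
    @csv << ['report_body']
  end
end

class OtherReport
  def initialize(csv)
    @csv = csv
  end

  def generate_report
    @csv << ['other_headers']
  end
end

# Hypothetical mapper: an explicit whitelist of report keys.
module ReportMapper
  class UnknownReportKey < StandardError; end

  REPORTS = {
    some_report: SomeReport,
    other_report: OtherReport
  }.freeze

  def self.for(key)
    REPORTS.fetch(key.to_sym) { raise UnknownReportKey, "unknown report key: #{key.inspect}" }
  end
end

rows = []
ReportMapper.for('some_report').new(rows).generate_report
p rows # [["headers"], ["report_body"]]
```

Every new report needs a new hash entry, but only keys listed in the hash can ever be resolved to a class.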

Factory Pattern

Again, we start by writing a test. The Factory pattern requires us to begin the implementation by creating a factory responsible for returning specific report classes. To keep the code legible, let's agree that the method returning a class based on the report key will be called for (as in CsvReportFactory.for(key)):

describe CsvReportFactory do
  describe '.for' do
    context 'some_report' do
      let(:report_key) { :some_report }
      let(:expected_report_class) { SomeReport }

      it 'returns correct class' do
        expect(CsvReportFactory.for(report_key)).to eq expected_report_class
      end
    end
  end
end

The report itself and its data generation should also be tested; however, we can reuse the previously written test, because our factory returns the SomeReport class, which generates the report with the same generate_report method. The implementation of the factory is very simple:

class CsvReportFactory
  # we can eventually move below code to the other class, like CsvReportErrors
  # and use something like CsvReportErrors < StandardError and
  # NoReportKeyProvided < CsvReportErrors
  class NoReportKeyProvided < StandardError; end

  def self.for(key)
    raise NoReportKeyProvided if key.blank?

    # to_s accepts a Symbol or a String; camelize (unlike classify) does not singularize the key
    key.to_s.camelize.constantize
  end
end
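For readers outside Rails, the factory's key-to-class lookup can be approximated with core Ruby only. This is a hypothetical sketch (PlainCsvReportFactory is not part of the project), and the split/capitalize step is a simplification of ActiveSupport's camelize, which handles more cases. Like constantize, Object.const_get resolves any constant name, so the explicit mapper shown earlier is the safer choice when keys come from user input.

```ruby
# Simplified stand-in for a report class.
class SomeReport
  def initialize(csv)
    @csv = csv
  end

  def generate_report
    @csv << ['headers']
    @csv << ['report_body']
  end
end

class PlainCsvReportFactory
  class NoReportKeyProvided < StandardError; end

  # Approximates key.to_s.camelize.constantize with core Ruby:
  # "some_report" -> "SomeReport" -> the SomeReport constant.
  def self.for(key)
    raise NoReportKeyProvided if key.nil? || key.to_s.strip.empty?

    class_name = key.to_s.split('_').map(&:capitalize).join
    Object.const_get(class_name) # raises NameError for unknown reports
  end
end

p PlainCsvReportFactory.for(:some_report) # SomeReport
```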

The SomeReport class can now be called in the service with:

class CsvReportGeneratorService
  def initialize(export:)
    @export = export
  end

  def generate
    write_to_csv_file
    upload_report_to_s3
    update_export_path
  end

  private

  attr_accessor :export

  def write_to_csv_file
    CSV.generate do |csv|
      report_generator_class = CsvReportFactory.for(export.key).new(csv)
      report_generator_class.generate_report
    end
  end

  def upload_report_to_s3; end

  def update_export_path; end
end

Strategy Pattern

While refactoring the code, I also considered the previously mentioned Strategy Pattern, which can be used to handle reports as well. We need a context that selects the appropriate strategy, and a specific strategy that generates our report data. As before, we stick to the TDD approach and start with another test case:

describe CsvReportContext do
  describe '#determine_strategy' do
    context 'some_report' do
      let(:report_key) { :some_report }
      let(:csv_report_context) { CsvReportContext.new(report_key) }
      let(:strategy_class) { csv_report_context.determine_strategy }

      it 'returns correct strategy class' do
        expect(strategy_class).to eq SomeReportStrategy
      end
    end
  end
end

describe SomeReportStrategy do
  describe '.generate_report' do
    let(:csv_data) { [] }
    let(:expected_data) { [['headers'], ['report_body']] }

    before { SomeReportStrategy.generate_report(csv_data) }

    it 'generates correct data' do
      expect(csv_data).to eq expected_data
    end
  end
end

We expect the context to return, based on the report key, an appropriate strategy implementing the generate_report method, which takes an array as an argument and appends data to it. Starting with the context:

class CsvReportContext
  class NoReportKeyProvided < StandardError; end
  class InvalidReportKey < StandardError; end

  def initialize(report_key)
    @report_key = report_key
  end

  def determine_strategy
    report_key_based_strategy
  end

  private

  attr_accessor :report_key

  def report_key_based_strategy
    raise NoReportKeyProvided if report_key.blank?

    case report_key.to_sym # export.key may be stored as a String
    when :some_report then SomeReportStrategy
    when :other_report then OtherReportStrategy
    else raise InvalidReportKey
    end
  end
end

We can move on to implementing the strategy:

class SomeReportStrategy
  def self.generate_report(csv)
    add_headers(csv)
    add_report_rows(csv)
  end

  def self.add_headers(csv)
    csv << ['headers']
  end

  def self.add_report_rows(csv)
    csv << ['report_body']
  end
end

The strategy would then be used in the service class as follows:

class CsvReportGeneratorService
  def initialize(export:)
    @export = export
  end

  def generate
    write_to_csv_file
    upload_report_to_s3
    update_export_path
  end

  private

  attr_accessor :export

  def write_to_csv_file
    CSV.generate do |csv|
      report_strategy = CsvReportContext.new(export.key).determine_strategy
      report_strategy.generate_report(csv)
    end
  end

  def upload_report_to_s3; end

  def update_export_path; end
end
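The version above selects a strategy class and calls it from the service. In the textbook form of the pattern, the context itself holds the chosen strategy and delegates to it, which is where the composition comes in. A self-contained sketch under that assumption; ComposedCsvReportContext, its generate method and OtherReportStrategy are hypothetical, not part of the project:

```ruby
require 'csv'

class SomeReportStrategy
  def self.generate_report(csv)
    csv << ['headers']
    csv << ['report_body']
  end
end

# Hypothetical second strategy, included only to show interchangeability.
class OtherReportStrategy
  def self.generate_report(csv)
    csv << %w[id total]
  end
end

class ComposedCsvReportContext
  class NoReportKeyProvided < StandardError; end
  class InvalidReportKey < StandardError; end

  STRATEGIES = {
    some_report: SomeReportStrategy,
    other_report: OtherReportStrategy
  }.freeze

  attr_reader :strategy

  # The strategy is chosen once and held by the context (composition).
  def initialize(report_key)
    raise NoReportKeyProvided if report_key.nil? || report_key.to_s.empty?

    @strategy = STRATEGIES.fetch(report_key.to_sym) { raise InvalidReportKey, report_key.to_s }
  end

  # The context delegates the varying step to whichever strategy it holds.
  def generate
    CSV.generate { |csv| strategy.generate_report(csv) }
  end
end

print ComposedCsvReportContext.new('other_report').generate # id,total
```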

The chosen solution

In the end, we went with the Factory variant in our project. Our choice was motivated by the target architecture planned for the future (reports with various file extensions handled in one service). At this point, we had a clear service for handling CSV reports:

class CsvReportGeneratorService
  def initialize(export:)
    @export = export
  end

  def generate
    write_to_csv_file
    upload_report_to_s3
    update_export_path
  end

  private

  attr_accessor :export, :path, :csv

  def write_to_csv_file
    # CSV.generate returns the generated string; keep it for the upload step
    @csv = CSV.generate do |csv|
      report_generator_class = CsvReportFactory.for(export.key).new(csv)
      report_generator_class.generate_report
    end
  end

  def upload_report_to_s3
    # uploader was also refactored, upload_file_to_s3
    # returns path to the file on S3 bucket
    s3_uploader = CsvReportGeneratorService::S3Uploader.new(csv)
    @path = s3_uploader.upload_file_to_s3
  end

  def update_export_path
    export.update(status: :finished, file_path: path)
  end
end

The code has become universal and legible, and the number of workers has been reduced to one: the worker created at the very beginning of our refactoring. The corresponding test turns green, and the refactoring is complete.

Summary

At the end of the refactoring process, we were able to achieve the following:

  • Universal application – the above logic can be used in different projects and at different stages of their life cycle
  • Readability and good practices – our service has about 50 lines of code, kept in one file
  • Future-proofing – with a step-by-step approach, we know how to refactor report code at different stages of a project and for codebases of different sizes.
  • A simple queuing solution – whether we use Resque or Sidekiq, our main worker and inherited workers (for each report) can look like this:
class ReportGeneratorWorker
  include Sidekiq::Worker

  def perform(id)
    export = Export.find(id)
    csv_generator = CsvReportGeneratorService.new(export: export)
    csv_generator.generate
  end
end
  • Preserved open-closed principle – each class can be modified freely (e.g. to use a different library for a given report format) without affecting the implementation and behavior of other classes
  • Ability to use SQL views (often forgotten), which can speed up report generation (in our case, indexed views helped reduce generation time by about 20% for large reports)
  • Low entry barrier and, consequently, lower costs for the client – once familiar with the implementation, a programmer adding a new report only has to prepare an appropriate test and then add the class responsible for generating the data.
