This article documents two related issues encountered by Sidekiq workers during my recent code deployments to the production environment. Both issues highlight the importance of understanding Sidekiq’s behavior when modifying worker classes or their arguments.
The first issue involved the removal of a scheduled Sidekiq worker class, while the second issue pertained to modifying the arguments accepted by a worker class. In both cases, errors occurred due to Sidekiq’s handling of existing jobs in the queue.
The report aims to analyze the root causes of these issues, outline the remediation steps taken, and propose preventive measures to avoid similar occurrences in the future. Additionally, it summarizes the key lessons learned to improve the team’s development practices and ensure a more robust Sidekiq implementation.
As mentioned in the introduction, this report covers two different cases that share the same underlying behavior.
In the first case, the engineer split the worker into several classes so that each one processes a smaller chunk of data, grouped by subscription type. The new classes are ResetFreeDocQuotaWorker and ResetPaidDocQuotaWorker, so the old ResetDocQuotaWorker class was removed from schedule.yml.
After the changes were deployed to production, the new workers appeared in the sidekiq-cron scheduler list. However, the old ResetDocQuotaWorker job still existed and kept running as usual, which caused errors in our monitoring report.
Both issues described in the problem statement share the same behavior: Sidekiq kept processing jobs according to the behavior from before the changes. The root cause is that Sidekiq persists jobs in Redis and does not automatically update or remove these persisted jobs when you change your worker classes or the schedule file. Detailed explanations of both problems follow.
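To make the root cause concrete, here is a simplified, self-contained model of it. It assumes a persisted job is just a JSON payload holding a class name and an argument list, which mirrors the shape Sidekiq stores in Redis; `QUEUE`, `enqueue` and `drain` are hypothetical stand-ins, not Sidekiq's API. The point it shows is that jobs enqueued before a deploy are replayed against whatever code is running after it:

```ruby
require 'json'

# Hypothetical in-memory stand-in for a Redis-backed Sidekiq queue.
QUEUE = []

# Persist a job as JSON: the class name plus its argument list.
def enqueue(class_name, *args)
  QUEUE << JSON.generate('class' => class_name, 'args' => args)
end

# Replay every persisted job against the code loaded *now*, returning
# :ok or the error class for each job, then empty the queue.
def drain
  results = QUEUE.map do |payload|
    job = JSON.parse(payload)
    begin
      Object.const_get(job['class']).new.perform(*job['args'])
      :ok
    rescue ArgumentError, NameError => e
      e.class
    end
  end
  QUEUE.clear
  results
end

# Code as deployed before the change: perform takes four arguments.
class SignWorker
  def perform(user_id, envelope_id, initial, signature); end
end

enqueue('SignWorker', 1, 2, 'AB', 'sig-1') # still in Redis at deploy time
enqueue('ResetDocQuotaWorker')             # class that the deploy deletes (never defined here)

# "Deploy": the argument list shrinks and ResetDocQuotaWorker no longer exists.
class SignWorker
  def perform(user_id, envelope_id, initial); end
end

p drain # => [ArgumentError, NameError]
```

Neither failure shows up at deploy time, because `drain` only sees the stored payload, not the code that enqueued it.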
This section explains how to handle changes to Sidekiq workers safely.
If you need to remove an existing sidekiq-cron scheduled worker class, follow these steps:
First, instead of removing the job from schedule.yml, disable it by setting its status to disabled. This prevents the scheduled job from running after the changes are deployed to the server.
{
  # MANDATORY
  'name' => 'name_of_job', # must be unique!
  'cron' => '1 * * * *',   # execute at 1 minute of every hour, ex: 12:01, 13:01, 14:01
  ...
  'class' => 'MyClass',
  # OPTIONAL
  ...
  'status' => 'disabled'
}
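Then, once the disabled job has been deployed, remove the persisted job from Redis:
job = Sidekiq::Cron::Job.find('name_of_job')
# example: Sidekiq::Cron::Job.find('reset_doc_quota_worker')
# Remove the job from Redis; find returns nil when no such job exists
job.destroy if job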
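Before destroying anything, it can help to list which cron jobs persisted in Redis no longer appear in schedule.yml. The sketch below is a pure-Ruby helper for that comparison. It assumes, as sidekiq-cron does when loading a schedule from a hash, that each top-level key is the job name; `orphaned_cron_jobs`, the sample schedule and the hard-coded name list are hypothetical.

```ruby
require 'yaml'

# Hypothetical helper: job names persisted in Redis but absent from schedule.yml.
# Assumes each top-level YAML key is a cron job name.
def orphaned_cron_jobs(persisted_names, schedule_yaml)
  scheduled_names = (YAML.safe_load(schedule_yaml) || {}).keys
  persisted_names - scheduled_names
end

# Sample schedule after the split (cron expressions omitted on purpose).
SCHEDULE_YAML = <<~YAML
  reset_free_doc_quota_worker:
    class: "ResetFreeDocQuotaWorker"
  reset_paid_doc_quota_worker:
    class: "ResetPaidDocQuotaWorker"
YAML

# In an app, the names could come from something like
# Sidekiq::Cron::Job.all.map(&:name) (assumption); here they are hard-coded.
persisted = %w[reset_doc_quota_worker reset_free_doc_quota_worker reset_paid_doc_quota_worker]

p orphaned_cron_jobs(persisted, SCHEDULE_YAML) # => ["reset_doc_quota_worker"]
```

Each name it reports is a candidate for the find-and-destroy step above.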
Modifying a worker's arguments is trickier, but the rule of thumb is that Sidekiq worker changes must be backward compatible: the latest version of a worker must still be able to process jobs enqueued by the previous version, and vice versa.
module BulkSigns
  class SignGlobalWorker
    include Sidekiq::Worker

    def perform(user_id, envelope_id, initial, signature)
      # do something
    end
  end
end
As the previous snippet shows, SignGlobalWorker#perform accepts four arguments. If you need to change or add arguments, follow these rules:
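❌ Reordering arguments: jobs already in Redis still pass values in the old order, so they silently land in the wrong parameters.
def perform(user_id, signature, envelope_id, initial)
  # do something
end
❌ Removing an argument: jobs already in Redis still pass four values, so they fail with ArgumentError.
def perform(user_id, envelope_id, initial)
  # do something
end
✅ Appending a new argument with a default value: jobs enqueued with four values still fit.
def perform(user_id, envelope_id, initial, signature, signature_id = nil)
  # do something
end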
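Argument-count compatibility can also be checked mechanically, for example in a spec. The helper below is a hypothetical sketch built on Ruby's `UnboundMethod#parameters`; `OldSignWorker`, `NewSignWorker` and `BrokenSignWorker` are illustrative stand-ins for the original signature and the ✅ and second ❌ variants. It only compares counts, so it cannot catch the reordering case, which still needs a careful review.

```ruby
# Hypothetical check: can klass#perform be called with `count` positional args?
def accepts_arg_count?(klass, count)
  params   = klass.instance_method(:perform).parameters
  required = params.count { |type, _| type == :req }
  optional = params.count { |type, _| type == :opt }
  splat    = params.any?  { |type, _| type == :rest }
  count >= required && (splat || count <= required + optional)
end

class OldSignWorker
  def perform(user_id, envelope_id, initial, signature); end
end

class NewSignWorker
  def perform(user_id, envelope_id, initial, signature, signature_id = nil); end
end

class BrokenSignWorker
  def perform(user_id, envelope_id, initial); end
end

p accepts_arg_count?(NewSignWorker, 4)    # => true  (jobs enqueued before the deploy)
p accepts_arg_count?(NewSignWorker, 5)    # => true  (jobs enqueued after the deploy)
p accepts_arg_count?(BrokenSignWorker, 4) # => false (old jobs raise ArgumentError)
p accepts_arg_count?(OldSignWorker, 5)    # => false (new jobs reaching old code mid-deploy)
```

The last line illustrates the "vice versa" direction: while old and new processes run side by side, a five-argument job picked up by the old code still fails.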
If a breaking refactor is unavoidable, consider creating a new worker class instead of modifying the existing one. This prevents ArgumentError from being raised: the old worker keeps processing the jobs already persisted in Redis and eventually stops running, because no new jobs are dispatched to it.
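module Fruit
  module V2
    class BananaWorker
      include Sidekiq::Worker

      # params is a single Hash argument; after JSON serialization it
      # arrives with String keys, e.g. params['user_id'].
      def perform(params)
        # do something
      end
    end
  end
end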
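# Use String keys: job arguments are serialized to JSON, so Symbol keys
# would come back as Strings inside perform anyway.
Fruit::V2::BananaWorker.perform_async({
  'user_id' => USER_ID,
  'envelope_id' => ENVELOPE_ID,
  'initial' => INITIAL,
  'signature' => SIGNATURE,
  'signature_id' => SIGNATURE_ID
})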
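One detail matters for a worker that takes a single Hash: Sidekiq serializes job arguments as JSON, so a Hash enqueued with Symbol keys is read back with String keys. Depending on version and configuration, Sidekiq may also warn about or reject non-JSON-native arguments such as Symbols. The self-contained sketch below models that round trip with the standard `json` library; `round_trip` and `BananaWorkerSketch` are hypothetical names, not part of Sidekiq.

```ruby
require 'json'

# Model how job args survive a trip through Redis: JSON out, JSON back in.
def round_trip(args)
  JSON.parse(JSON.generate(args))
end

# Hypothetical stand-in for Fruit::V2::BananaWorker#perform.
class BananaWorkerSketch
  def perform(params)
    # Read String keys: that is what arrives after JSON decoding.
    user_id = params.fetch('user_id')
    # Optional keys can be added later without breaking already-queued jobs:
    # older payloads simply lack the key and yield nil here.
    signature_id = params['signature_id']
    [user_id, signature_id]
  end
end

args = round_trip([{ user_id: 1, envelope_id: 2 }])
p args.first.keys                       # => ["user_id", "envelope_id"]
p BananaWorkerSketch.new.perform(*args) # => [1, nil]
```

Reading optional keys with a nil fallback is what lets a hash-argument worker grow new fields while jobs in the old shape are still queued.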
The issues documented in this report highlight the importance of understanding Sidekiq’s behavior and taking necessary precautions when modifying worker classes or their arguments. Failure to do so can lead to errors, job failures, and potential data inconsistencies or loss.
The root cause analysis revealed that Sidekiq persists jobs in Redis and does not automatically update or remove these persisted jobs when changes are made to worker classes or the schedule file. This behavior can cause issues when removing scheduled worker classes or modifying the arguments of existing worker classes.
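Adhering to these principles (disabling a cron job before removing it, keeping worker arguments backward compatible, and introducing a new worker class for breaking changes) helps maintain a robust and reliable Sidekiq implementation for efficient background job processing.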