Running “unique” tasks with celery

I use celery to update RSS feeds in my news aggregation site. I use one @task for each feed, and things seem to work nicely.

There’s a detail that I’m not sure to handle well though: all feeds are updated once every minute with a @periodic_task, but what if a feed is still updating from the last periodic task when a new one is started ? (for example if the feed is really slow, or offline and the task is held in a retry loop)

Currently I store tasks results and check their status like this:

import socket
from datetime import timedelta
from celery.decorators import task, periodic_task
from aggregator.models import Feed


_results = {}


@periodic_task(run_every=timedelta(minutes=1))
def fetch_articles():
    for feed in Feed.objects.all():
        if feed.pk in _results:
            if not _results.ready():
                # The task is not finished yet
                continue
        _results = update_feed.delay(feed)


@task()
def update_feed(feed):
    try:
        feed.fetch_articles()
    except socket.error, exc:
        update_feed.retry(args=, exc=exc)

Maybe there is a more sophisticated/robust way of achieving the same result using some celery mechanism that I missed ?

Answers:

Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.

Method 1

Based on MattH’s answer, you could use a decorator like this:

def single_instance_task(timeout):
    def task_exc(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            lock_id = "celery-single-instance-" + func.__name__
            acquire_lock = lambda: cache.add(lock_id, "true", timeout)
            release_lock = lambda: cache.delete(lock_id)
            if acquire_lock():
                try:
                    func(*args, **kwargs)
                finally:
                    release_lock()
        return wrapper
    return task_exc

then, use it like so…

@periodic_task(run_every=timedelta(minutes=1))
@single_instance_task(60*10)
def fetch_articles()
    yada yada...

Method 2

From the official documentation: Ensuring a task is only executed one at a time.

Method 3

Using https://pypi.python.org/pypi/celery_once seems to do the job really nice, including reporting errors and testing against some parameters for uniqueness.

You can do things like:

from celery_once import QueueOnce
from myapp.celery import app
from time import sleep

@app.task(base=QueueOnce, once=dict(keys=('customer_id',)))
def start_billing(customer_id, year, month):
    sleep(30)
    return "Done!"

which just needs the following settings in your project:

ONCE_REDIS_URL = 'redis://localhost:6379/0'
ONCE_DEFAULT_TIMEOUT = 60 * 60  # remove lock after 1 hour in case it was stale

Method 4

If you’re looking for an example that doesn’t use Django, then try this example (caveat: uses Redis instead, which I was already using).

The decorator code is as follows (full credit to the author of the article, go read it)

import redis

REDIS_CLIENT = redis.Redis()

def only_one(function=None, key="", timeout=None):
    """Enforce only one celery task at a time."""

    def _dec(run_func):
        """Decorator."""

        def _caller(*args, **kwargs):
            """Caller."""
            ret_value = None
            have_lock = False
            lock = REDIS_CLIENT.lock(key, timeout=timeout)
            try:
                have_lock = lock.acquire(blocking=False)
                if have_lock:
                    ret_value = run_func(*args, **kwargs)
            finally:
                if have_lock:
                    lock.release()

            return ret_value

        return _caller

    return _dec(function) if function is not None else _dec

Method 5

I was wondering why nobody mentioned using celery.app.control.inspect().active() to get the list of the currently running tasks. Is it not real time? Because otherwise it would be very easy to implement, for instance:

def unique_task(callback,  *decorator_args, **decorator_kwargs):
    """
    Decorator to ensure only one instance of the task is running at once.
    """
    @wraps(callback)
    def _wrapper(celery_task, *args, **kwargs):
        active_queues = task.app.control.inspect().active()
        if active_queues:
            for queue in active_queues:
                for running_task in active_queues[queue]:
                    # Discard the currently running task from the list.
                    if task.name == running_task['name'] and task.request.id != running_task['id']:
                        return f'Task "{callback.__name__}()" cancelled! already running...'

        return callback(celery_task, *args, **kwargs)

    return _wrapper

And then just applying the decorator to the corresponding tasks:

@celery.task(bind=True)
@unique_task
def my_task(self):
    # task executed once at a time.
    pass

Method 6

This solution for celery working at single host with concurency greater 1. Other kinds (without dependencies like redis) of locks difference file-based don’t work with concurrency greater 1.

class Lock(object):
    def __init__(self, filename):
        self.f = open(filename, 'w')

    def __enter__(self):
        try:
            flock(self.f.fileno(), LOCK_EX | LOCK_NB)
            return True
        except IOError:
            pass
        return False

    def __exit__(self, *args):
        self.f.close()


class SinglePeriodicTask(PeriodicTask):
    abstract = True
    run_every = timedelta(seconds=1)

    def __call__(self, *args, **kwargs):
        lock_filename = join('/tmp',
                             md5(self.name).hexdigest())
        with Lock(lock_filename) as is_locked:
            if is_locked:
                super(SinglePeriodicTask, self).__call__(*args, **kwargs)
            else:
                print 'already working'


class SearchTask(SinglePeriodicTask):
    restart_delay = timedelta(seconds=60)

    def run(self, *args, **kwargs):
        print self.name, 'start', datetime.now()
        sleep(5)
        print self.name, 'end', datetime.now()


All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

0 0 votes
Article Rating
Subscribe
Notify of
guest

0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
0
Would love your thoughts, please comment.x
()
x