2017-01-31

Cache your properties with a decorator

This is a wonderful Python idea I've found quite useful and fun to implement.

The need comes from an analytics project I'm working on.

I have to read data from several sources (think of a file, a database or a remote query) that change at predefined periods (let's say every minute).

I need to access the data continuously for my calculations. But, for performance reasons, I don't want to read from disk or query the database too often... just at certain predefined intervals tied to the refresh rates of my data.

In this example I have a class for getting Euro foreign exchange reference rates:

In [1]:
import urllib2
import xmltodict

class ECB(object):
    def __init__(self, url='http://www.ecb.int/stats/eurofxref/eurofxref-daily.xml'):
        self.__url = url
    @property
    def data(self):
        response = urllib2.urlopen(self.__url)
        cubes = xmltodict.parse(response.read())["gesmes:Envelope"]["Cube"]["Cube"]["Cube"]
        return dict((x["@currency"], x["@rate"]) for x in cubes)

So I can instantiate an ECB object and get the desired data from a simple property:

In [2]:
bank = ECB()
In [3]:
%%time
print bank.data
{u'USD': u'1.0755', u'IDR': u'14363.56', u'BGN': u'1.9558', u'ILS': u'4.0575', u'GBP': u'0.86105', u'DKK': u'7.4373', u'CAD': u'1.4056', u'MXN': u'22.2855', u'HUF': u'310.64', u'RON': u'4.5030', u'MYR': u'4.7608', u'SEK': u'9.4505', u'SGD': u'1.5201', u'HKD': u'8.3436', u'AUD': u'1.4198', u'CHF': u'1.0668', u'KRW': u'1244.76', u'CNY': u'7.3970', u'TRY': u'4.0632', u'HRK': u'7.4790', u'NZD': u'1.4709', u'THB': u'37.793', u'NOK': u'8.8880', u'RUB': u'64.4302', u'INR': u'72.8005', u'JPY': u'121.94', u'CZK': u'27.021', u'BRL': u'3.3535', u'PLN': u'4.3239', u'PHP': u'53.489', u'ZAR': u'14.4440'}
CPU times: user 4 ms, sys: 0 ns, total: 4 ms
Wall time: 178 ms
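To make the chain of lookups inside `data` easier to follow, here is a minimal offline sketch. The `parsed` dictionary is a hand-written stand-in for what I assume `xmltodict.parse` returns for this feed (attributes become `@`-prefixed keys and the repeated innermost `Cube` elements become a list). It holds only three of the rates printed above, and `extract_rates` is a hypothetical helper name, not part of the class.

```python
# Hand-written stand-in for the parsed feed (an assumption, not real xmltodict output)
parsed = {
    "gesmes:Envelope": {
        "Cube": {
            "Cube": {
                "Cube": [
                    {"@currency": "USD", "@rate": "1.0755"},
                    {"@currency": "GBP", "@rate": "0.86105"},
                    {"@currency": "JPY", "@rate": "121.94"},
                ]
            }
        }
    }
}


def extract_rates(document):
    """Map each currency code to its rate, following the same keys as `data`."""
    entries = document["gesmes:Envelope"]["Cube"]["Cube"]["Cube"]
    return {entry["@currency"]: entry["@rate"] for entry in entries}


rates = extract_rates(parsed)
```

Note that the rates come back as strings, so they need a `float(...)` conversion before any arithmetic.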

This data changes daily, so there's no point in continuously requesting it from the remote server. But I can't treat it as static data either, since my project will run for months and use this data continuously.

I need to somehow tell my property to keep a cached copy of the data and reuse it.

I'm working with different data sources, so I want something simple, readable and flexible enough. The most elegant way I can imagine to accomplish this is through a decorator.

Let's start by creating a decorator that stores the data in a dictionary.

In [4]:
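import functools

def cached(func):
    @functools.wraps(func)  # keep the name and docstring of the wrapped method
    def func_wrapper(self):
        # Create the per-instance cache on first use
        if not hasattr(self, '__cache__'):
            self.__cache__ = {}
        # Call the real getter only if nothing is stored under its name yet
        if func.__name__ not in self.__cache__:
            self.__cache__[func.__name__] = func(self)
        return self.__cache__[func.__name__]
    return func_wrapper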

Basically, my new decorator looks for the name of my property among the keys of a dictionary stored on my object as an attribute called __cache__. If it doesn't exist, it gets the real data and stores it there.

Now I can just decorate my property:
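A quick way to check this behaviour without touching the network is to count how many times the real getter runs. This is only a sketch: `SlowSource` and its `reads` counter are hypothetical names made up for the illustration, and the decorator is restated in compact form so the block runs on its own.

```python
def cached(func):
    # Same idea as the decorator above: results live in a per-instance dict
    def func_wrapper(self):
        if not hasattr(self, '__cache__'):
            self.__cache__ = {}
        if func.__name__ not in self.__cache__:
            self.__cache__[func.__name__] = func(self)
        return self.__cache__[func.__name__]
    return func_wrapper


class SlowSource(object):
    """Hypothetical stand-in for a slow data source; counts real reads."""

    def __init__(self):
        self.reads = 0

    @property
    @cached
    def data(self):
        self.reads += 1  # only runs when the cache has no entry yet
        return {'USD': '1.0755'}


first = SlowSource()
second = SlowSource()
rates_a = first.data   # real read, result stored in first.__cache__
rates_b = first.data   # served from the cache, no new read
rates_c = second.data  # a different instance has its own cache
```

Because the dictionary lives on the instance, each object keeps its own cache, and the key is the property name, so several cached properties can share one `__cache__`.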

In [5]:
class ECB(object):
    def __init__(self, url='http://www.ecb.int/stats/eurofxref/eurofxref-daily.xml'):
        self.__url = url
    @property
    @cached
    def data(self):
        response = urllib2.urlopen(self.__url)
        cubes = xmltodict.parse(response.read())["gesmes:Envelope"]["Cube"]["Cube"]["Cube"]
        return dict((x["@currency"], x["@rate"]) for x in cubes)

Instantiate the class

In [6]:
bank2=ECB()

and get the data for the first time:

In [7]:
%%time
print bank2.data
{u'USD': u'1.0755', u'IDR': u'14363.56', u'BGN': u'1.9558', u'ILS': u'4.0575', u'GBP': u'0.86105', u'DKK': u'7.4373', u'CAD': u'1.4056', u'MXN': u'22.2855', u'HUF': u'310.64', u'RON': u'4.5030', u'MYR': u'4.7608', u'SEK': u'9.4505', u'SGD': u'1.5201', u'HKD': u'8.3436', u'AUD': u'1.4198', u'CHF': u'1.0668', u'KRW': u'1244.76', u'CNY': u'7.3970', u'TRY': u'4.0632', u'HRK': u'7.4790', u'NZD': u'1.4709', u'THB': u'37.793', u'NOK': u'8.8880', u'RUB': u'64.4302', u'INR': u'72.8005', u'JPY': u'121.94', u'CZK': u'27.021', u'BRL': u'3.3535', u'PLN': u'4.3239', u'PHP': u'53.489', u'ZAR': u'14.4440'}
CPU times: user 0 ns, sys: 4 ms, total: 4 ms
Wall time: 173 ms

Note it takes around $200 ms$ to get the data.

The next time I access the property, the data is already stored in memory, so getting it takes far less time (around $200 \mu s$):

In [8]:
%%time
print bank2.data
{u'USD': u'1.0755', u'IDR': u'14363.56', u'BGN': u'1.9558', u'ILS': u'4.0575', u'GBP': u'0.86105', u'DKK': u'7.4373', u'CAD': u'1.4056', u'MXN': u'22.2855', u'HUF': u'310.64', u'RON': u'4.5030', u'MYR': u'4.7608', u'SEK': u'9.4505', u'SGD': u'1.5201', u'HKD': u'8.3436', u'AUD': u'1.4198', u'CHF': u'1.0668', u'KRW': u'1244.76', u'CNY': u'7.3970', u'TRY': u'4.0632', u'HRK': u'7.4790', u'NZD': u'1.4709', u'THB': u'37.793', u'NOK': u'8.8880', u'RUB': u'64.4302', u'INR': u'72.8005', u'JPY': u'121.94', u'CZK': u'27.021', u'BRL': u'3.3535', u'PLN': u'4.3239', u'PHP': u'53.489', u'ZAR': u'14.4440'}
CPU times: user 4 ms, sys: 0 ns, total: 4 ms
Wall time: 191 µs

With the current implementation, the cached data is static. We want it to refresh after a certain time period.

In our __cache__ dictionary we'll store not only the data but also the time when it was updated, so each time we ask for its value the property will check whether the data is still valid or outdated and, if outdated, refresh it.

In [9]:
import functools
import time

def cached_till(time_delta=1):
    def cached(func):
        @functools.wraps(func)  # keep the name and docstring of the wrapped method
        def func_wrapper(self):
            now = time.time()
            # Create the per-instance cache on first use
            if not hasattr(self, '__cache__'):
                self.__cache__ = {}
            entry = self.__cache__.get(func.__name__)
            # Fetch when nothing is stored yet or the entry is older than time_delta seconds
            if entry is None or now - entry['time'] > time_delta:
                entry = {'time': now, 'data': func(self)}
                self.__cache__[func.__name__] = entry
            return entry['data']
        return func_wrapper
    return cached

Now we can pass our new decorator an argument with the period, in seconds, after which the data should be refreshed.

In [10]:
class ECB(object):
    def __init__(self, url='http://www.ecb.int/stats/eurofxref/eurofxref-daily.xml'):
        self.__url = url
    @property
    @cached_till(time_delta=1)
    def data(self):
        response = urllib2.urlopen(self.__url)
        cubes = xmltodict.parse(response.read())["gesmes:Envelope"]["Cube"]["Cube"]["Cube"]
        return dict((x["@currency"], x["@rate"]) for x in cubes)

So, instantiating it again and querying for the first time:

In [11]:
bank3=ECB()
In [12]:
%%time
print bank3.data
{u'USD': u'1.0755', u'IDR': u'14363.56', u'BGN': u'1.9558', u'ILS': u'4.0575', u'GBP': u'0.86105', u'DKK': u'7.4373', u'CAD': u'1.4056', u'MXN': u'22.2855', u'HUF': u'310.64', u'RON': u'4.5030', u'MYR': u'4.7608', u'SEK': u'9.4505', u'SGD': u'1.5201', u'HKD': u'8.3436', u'AUD': u'1.4198', u'CHF': u'1.0668', u'KRW': u'1244.76', u'CNY': u'7.3970', u'TRY': u'4.0632', u'HRK': u'7.4790', u'NZD': u'1.4709', u'THB': u'37.793', u'NOK': u'8.8880', u'RUB': u'64.4302', u'INR': u'72.8005', u'JPY': u'121.94', u'CZK': u'27.021', u'BRL': u'3.3535', u'PLN': u'4.3239', u'PHP': u'53.489', u'ZAR': u'14.4440'}
CPU times: user 4 ms, sys: 0 ns, total: 4 ms
Wall time: 193 ms

The second time it'll use cached data and again take much less time:

In [13]:
%%time
print bank3.data
{u'USD': u'1.0755', u'IDR': u'14363.56', u'BGN': u'1.9558', u'ILS': u'4.0575', u'GBP': u'0.86105', u'DKK': u'7.4373', u'CAD': u'1.4056', u'MXN': u'22.2855', u'HUF': u'310.64', u'RON': u'4.5030', u'MYR': u'4.7608', u'SEK': u'9.4505', u'SGD': u'1.5201', u'HKD': u'8.3436', u'AUD': u'1.4198', u'CHF': u'1.0668', u'KRW': u'1244.76', u'CNY': u'7.3970', u'TRY': u'4.0632', u'HRK': u'7.4790', u'NZD': u'1.4709', u'THB': u'37.793', u'NOK': u'8.8880', u'RUB': u'64.4302', u'INR': u'72.8005', u'JPY': u'121.94', u'CZK': u'27.021', u'BRL': u'3.3535', u'PLN': u'4.3239', u'PHP': u'53.489', u'ZAR': u'14.4440'}
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 189 µs

But after enough time the query will update the data, as we can see from the time it takes to get it:

In [14]:
time.sleep(1)
In [15]:
%%time
print bank3.data
{u'USD': u'1.0755', u'IDR': u'14363.56', u'BGN': u'1.9558', u'ILS': u'4.0575', u'GBP': u'0.86105', u'DKK': u'7.4373', u'CAD': u'1.4056', u'MXN': u'22.2855', u'HUF': u'310.64', u'RON': u'4.5030', u'MYR': u'4.7608', u'SEK': u'9.4505', u'SGD': u'1.5201', u'HKD': u'8.3436', u'AUD': u'1.4198', u'CHF': u'1.0668', u'KRW': u'1244.76', u'CNY': u'7.3970', u'TRY': u'4.0632', u'HRK': u'7.4790', u'NZD': u'1.4709', u'THB': u'37.793', u'NOK': u'8.8880', u'RUB': u'64.4302', u'INR': u'72.8005', u'JPY': u'121.94', u'CZK': u'27.021', u'BRL': u'3.3535', u'PLN': u'4.3239', u'PHP': u'53.489', u'ZAR': u'14.4440'}
CPU times: user 0 ns, sys: 4 ms, total: 4 ms
Wall time: 174 ms
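The same refresh can be checked without the network and without waiting. In this sketch, `SlowSource` and its `reads` counter are hypothetical names used only for the illustration, the decorator is restated compactly so the block runs on its own, and instead of sleeping I backdate the stored timestamp to simulate an old entry.

```python
import time


def cached_till(time_delta=1):
    # Compact restatement of the time-aware decorator, so the sketch is self-contained
    def cached(func):
        def func_wrapper(self):
            now = time.time()
            if not hasattr(self, '__cache__'):
                self.__cache__ = {}
            entry = self.__cache__.get(func.__name__)
            if entry is None or now - entry['time'] > time_delta:
                entry = {'time': now, 'data': func(self)}
                self.__cache__[func.__name__] = entry
            return entry['data']
        return func_wrapper
    return cached


class SlowSource(object):
    """Hypothetical stand-in for a slow data source; counts real reads."""

    def __init__(self):
        self.reads = 0

    @property
    @cached_till(time_delta=60)
    def data(self):
        self.reads += 1
        return self.reads


source = SlowSource()
fresh = [source.data, source.data]      # second access is served from the cache
source.__cache__['data']['time'] -= 61  # pretend the entry is 61 seconds old
refreshed = source.data                 # stale entry, so the source is read again
```

For a feed that changes daily, like the ECB one, a `time_delta` of some hours would probably be a more sensible choice than the one second used in the demonstration.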

As so often happens, a Google search showed me I had reinvented the wheel: a library called `cached-property` already does exactly this (and surely better than I did), but... nobody can take away the fun I had building it.
