Cache your properties with a decorator
This is a wonderful Python idea I've found quite useful and fun to implement.
The need comes from an analytics project I'm working on.
I have to read data from several sources (think of a file, a database or a remote query) that change at predefined intervals (say, every minute).
I need to access the data continuously for my calculations. But, for performance reasons, I don't want to read from disk or query the database too often... just at predefined intervals that match the refresh rates of my data.
In this example I have a class for getting Euro foreign exchange reference rates:
import urllib2
import xmltodict

class ECB(object):
    def __init__(self, url='http://www.ecb.int/stats/eurofxref/eurofxref-daily.xml'):
        self.__url = url

    @property
    def data(self):
        response = urllib2.urlopen(self.__url)
        return dict(
            map(
                lambda x: [x["@currency"], x["@rate"]],
                xmltodict.parse(response.read())["gesmes:Envelope"]["Cube"]["Cube"]["Cube"]
            )
        )
So I can instantiate an ECB object and get the desired data from a simple property
bank = ECB()
%%time
print bank.data
This data changes daily, so there's no point in continuously requesting it from the remote server. But I can't treat it as static data either, since my project will be running for months and using this data the whole time.
I need to somehow tell my property to keep a cached copy of the data and use it.
I'm working with different data sources, so I want something simple, legible and flexible enough. The most elegant way I can think of to accomplish this is a decorator.
Let's start by creating a decorator that stores the data in a dictionary.
def cached(func):
    def func_wrapper(self):
        if '__cache__' not in dir(self):
            self.__cache__={}
        if func.__name__ not in self.__cache__.keys():
            self.__cache__[func.__name__] = func(self)
        return self.__cache__[func.__name__]
    return func_wrapper
Basically, my new decorator looks for the name of my property among the keys of a dictionary stored on my object as an attribute called `__cache__`. If the name isn't there, the decorator gets the real data and stores it.
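To see what the decorator leaves behind on the object, here is a minimal, network-free sketch. The `Source` class, its `calls` counter and the returned value are hypothetical stand-ins for a slow data source, not part of the original project. This variant of the decorator also uses `functools.wraps`, which is an addition of mine to preserve the wrapped function's name.

```python
import functools


def cached(func):
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def func_wrapper(self):
        # Create the per-instance cache the first time it is needed.
        if not hasattr(self, '__cache__'):
            self.__cache__ = {}
        # Run the real function only if its result is not stored yet.
        if func.__name__ not in self.__cache__:
            self.__cache__[func.__name__] = func(self)
        return self.__cache__[func.__name__]
    return func_wrapper


class Source(object):  # hypothetical stand-in for a slow data source
    def __init__(self):
        self.calls = 0  # counts how often the real work is done

    @property
    @cached
    def data(self):
        self.calls += 1
        return {'USD': '1.10'}  # made-up value, for illustration only


source = Source()
first = source.data   # runs the function body and fills the cache
second = source.data  # served from source.__cache__
```

After both accesses `source.calls` is 1 and `source.__cache__` holds a single `data` key, so the second access never reaches the function body. Since the dictionary lives on the instance, each object keeps its own cache.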
Now I can just decorate my property
class ECB(object):
    def __init__(self, url='http://www.ecb.int/stats/eurofxref/eurofxref-daily.xml'):
        self.__url = url

    @property
    @cached
    def data(self):
        response = urllib2.urlopen(self.__url)
        return dict(
            map(
                lambda x: [x["@currency"], x["@rate"]],
                xmltodict.parse(response.read())["gesmes:Envelope"]["Cube"]["Cube"]["Cube"]
            )
        )
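The order of the two decorators matters: they are applied bottom-up, so `cached` wraps the function first and `property` then wraps the result. The sketch below spells out that equivalence on a hypothetical `Demo` class (the class, its attribute names and its values are made up for illustration).

```python
def cached(func):
    def func_wrapper(self):
        if not hasattr(self, '__cache__'):
            self.__cache__ = {}
        if func.__name__ not in self.__cache__:
            self.__cache__[func.__name__] = func(self)
        return self.__cache__[func.__name__]
    return func_wrapper


class Demo(object):  # hypothetical class, for illustration only
    @property
    @cached
    def stacked(self):
        return [1, 2, 3]

    # The same thing written out by hand: cached() first, then property().
    def _manual(self):
        return [1, 2, 3]
    manual = property(cached(_manual))


demo = Demo()
# Both attributes hand back the very same list object on every access,
# which shows that the function body ran only once for each of them.
same_stacked = demo.stacked is demo.stacked
same_manual = demo.manual is demo.manual
```

With the decorators swapped, `data` would end up as an ordinary method instead of a property, and calling it would fail because `cached` expects a plain function.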
Instantiate the class
bank2=ECB()
and get the data for the first time:
%%time
print bank2.data
Note that it takes around $200\ \mathrm{ms}$ to get the data.
The next time I access the property, the data is already stored in memory, so getting it takes far less time (around $200\ \mathrm{\mu s}$):
%%time
print bank2.data
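`%%time` is an IPython cell magic, so it is only available in a notebook or an IPython session. In a plain script, a rough equivalent is to take timestamps by hand. The sketch below does that with a hypothetical `SlowSource` whose `time.sleep` call stands in for the remote request; the delay and the returned value are made up.

```python
import time


def cached(func):
    def func_wrapper(self):
        if not hasattr(self, '__cache__'):
            self.__cache__ = {}
        if func.__name__ not in self.__cache__:
            self.__cache__[func.__name__] = func(self)
        return self.__cache__[func.__name__]
    return func_wrapper


class SlowSource(object):  # hypothetical stand-in for the remote server
    @property
    @cached
    def data(self):
        time.sleep(0.05)        # pretend this is the network round trip
        return {'USD': '1.10'}  # made-up value


def timed_access(obj):
    """Return (value, seconds elapsed) for one access to obj.data."""
    start = time.time()
    value = obj.data
    return value, time.time() - start


source = SlowSource()
_, first_elapsed = timed_access(source)   # runs the slow function
_, second_elapsed = timed_access(source)  # served from the cache
```

The first access pays for the simulated delay, while the second one only costs a dictionary lookup.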
With the current implementation, the cached data is static. We want it to refresh after a certain period of time.
In our `__cache__` dictionary we'll store not only the data but also the time when it was updated, so each time we ask for its value our property will check whether the data is still valid or outdated and, if it is outdated, refresh it.
import time

def cached_till(time_delta=1):
    def cached(func):
        def func_wrapper(self):
            now = time.time()
            if '__cache__' not in dir(self):
                self.__cache__={}
            if func.__name__ not in self.__cache__.keys():
                self.__cache__[func.__name__] = {
                    'time': now,
                    'data': func(self)
                }
            if now - self.__cache__[func.__name__]['time'] > time_delta:
                self.__cache__[func.__name__] = {
                    'time': now,
                    'data': func(self)
                }
            return self.__cache__[func.__name__]['data']
        return func_wrapper
    return cached
Now we can pass our new decorator an argument with the period, in seconds, after which we want the data to be refreshed.
class ECB(object):
    def __init__(self, url='http://www.ecb.int/stats/eurofxref/eurofxref-daily.xml'):
        self.__url = url

    @property
    @cached_till(time_delta=1)
    def data(self):
        response = urllib2.urlopen(self.__url)
        return dict(
            map(
                lambda x: [x["@currency"], x["@rate"]],
                xmltodict.parse(response.read())["gesmes:Envelope"]["Cube"]["Cube"]["Cube"]
            )
        )
So, instantiating it again and querying for a first time:
bank3=ECB()
%%time
print bank3.data
The second time it'll use cached data and again take much less time:
%%time
print bank3.data
But after enough time the query will update the data as we can see from the time it takes to get it:
time.sleep(1)
%%time
print bank3.data
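The refresh logic can also be checked without a remote server. The following is only a sketch under a few assumptions: `Ticker` and its counter are hypothetical, `time_delta` is shortened to a fraction of a second so the example runs quickly, and `cached_till` is written in a condensed form that folds the two checks into a single condition.

```python
import time


def cached_till(time_delta=1):
    def cached(func):
        def func_wrapper(self):
            now = time.time()
            if not hasattr(self, '__cache__'):
                self.__cache__ = {}
            entry = self.__cache__.get(func.__name__)
            # Fetch when nothing is stored yet or the stored value is too old.
            if entry is None or now - entry['time'] > time_delta:
                entry = {'time': now, 'data': func(self)}
                self.__cache__[func.__name__] = entry
            return entry['data']
        return func_wrapper
    return cached


class Ticker(object):  # hypothetical stand-in for a data source
    def __init__(self):
        self.calls = 0

    @property
    @cached_till(time_delta=0.2)
    def data(self):
        self.calls += 1
        return self.calls  # each real fetch returns a new number


ticker = Ticker()
a = ticker.data   # first access: real fetch
b = ticker.data   # immediately afterwards: cached value
time.sleep(0.4)   # wait longer than time_delta
c = ticker.data   # stale entry: fetched again
```

Here `a` and `b` come from the same fetch, while `c` comes from a second one, so `ticker.calls` ends up at 2.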
As so often happens, after a Google search I realised I had reinvented the wheel: there was already a library called `cached-property` that does exactly this (and surely better than I do), but... nobody can take away the fun I've already had.