Dive Into Python: Python from novice to pro
Handling Last-Modified and ETag
Now that you know how to add custom HTTP headers to your web service requests, let's look at adding support for Last-Modified and ETag headers.
These examples show the output with debugging turned off. If you still have it turned on from the previous section, you can turn it off by setting httplib.HTTPConnection.debuglevel = 0. Or you can just leave debugging on, if that helps you.
>>> import urllib2
>>> request = urllib2.Request('http://diveintomark.org/xml/atom.xml')
>>> opener = urllib2.build_opener()
>>> firstdatastream = opener.open(request)
>>> firstdatastream.headers.dict
{'date': 'Thu, 15 Apr 2004 20:42:41 GMT',
 'server': 'Apache/2.0.49 (Debian GNU/Linux)',
 'content-type': 'application/atom+xml',
 'last-modified': 'Thu, 15 Apr 2004 19:45:21 GMT',
 'etag': '"e842a-3e53-55d97640"',
 'content-length': '15955',
 'accept-ranges': 'bytes',
 'connection': 'close'}
>>> request.add_header('If-Modified-Since',
...     firstdatastream.headers.get('Last-Modified'))
>>> seconddatastream = opener.open(request)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "c:\python23\lib\urllib2.py", line 326, in open
    '_open', req)
  File "c:\python23\lib\urllib2.py", line 306, in _call_chain
    result = func(*args)
  File "c:\python23\lib\urllib2.py", line 901, in http_open
    return self.do_open(httplib.HTTP, req)
  File "c:\python23\lib\urllib2.py", line 895, in do_open
    return self.parent.error('http', req, fp, code, msg, hdrs)
  File "c:\python23\lib\urllib2.py", line 352, in error
    return self._call_chain(*args)
  File "c:\python23\lib\urllib2.py", line 306, in _call_chain
    result = func(*args)
  File "c:\python23\lib\urllib2.py", line 412, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 304: Not Modified
Remember all those HTTP headers you saw printed out when you turned on debugging? This is how you can get access to them programmatically: firstdatastream.headers is an object that acts like a dictionary and allows you to get any of the individual headers returned from the HTTP server.

On the second request, you add the If-Modified-Since header with the last-modified date from the first request. If the data hasn't changed, the server should return a 304 status code.

Sure enough, the data hasn't changed. You can see from the traceback that urllib2 throws a special exception, HTTPError, in response to the 304 status code. This is a little unusual, and not entirely helpful. After all, it's not an error; you specifically asked the server not to send you any data if it hadn't changed, and the data didn't change, so the server told you it wasn't sending you any data. That's not an error; that's exactly what you were hoping for.
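To make the point concrete, here is a rough sketch of what a caller has to do while the 304 still arrives as an exception: catch HTTPError, check its code, and let everything else propagate. The helper name open_unless_unchanged is hypothetical, the "except ... as" syntax needs Python 2.6 or later, and the fallback import of urllib.request is an assumption for readers running Python 3.

```python
try:
    import urllib2                      # Python 2, as used in this chapter
except ImportError:
    import urllib.request as urllib2    # assumption: Python 3 equivalent


def open_unless_unchanged(opener, request):
    """Return the response, or None if the server answered 304.

    Hypothetical helper: `opener` is anything with an open(request) method,
    such as the object returned by urllib2.build_opener().
    """
    try:
        return opener.open(request)
    except urllib2.HTTPError as e:
        if e.code == 304:       # Not Modified: expected, not a failure
            return None
        raise                   # 404, 500 and friends are still real errors
```

This works, but every caller ends up wrapping its requests in the same try block just to read a status code.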
urllib2 also raises an HTTPError exception for conditions that you would think of as errors, such as 404 (page not found). In fact, it will raise HTTPError for any status code other than 200 (OK), 301 (permanent redirect), or 302 (temporary redirect). It would be more helpful for your purposes to capture the status code and simply return it, without throwing an exception. To do that, you'll need to define a custom URL handler.
This custom URL handler is part of openanything.py.
import urllib2

class DefaultErrorHandler(urllib2.HTTPDefaultErrorHandler):
    def http_error_default(self, req, fp, code, msg, headers):
        # Build the HTTPError object, but return it instead of raising it
        result = urllib2.HTTPError(
            req.get_full_url(), code, msg, headers, fp)
        result.status = code
        return result
urllib2 is designed around URL handlers. Each handler is just a class that can define any number of methods. When something happens -- like an HTTP error, or even a 304 code -- urllib2 introspects into the list of defined handlers for a method that can handle it. You used a similar introspection in Chapter 9, XML Processing to define handlers for different node types, but urllib2 is more flexible, and introspects over as many handlers as are defined for the current request.

urllib2 searches through the defined handlers and calls the http_error_default method when it encounters a 304 status code from the server. By defining a custom error handler, you can prevent urllib2 from raising an exception. Instead, you create the HTTPError object, but return it instead of raising it.

This is the key part: before returning, you save the status code returned by the HTTP server. This will allow you easy access to it from the calling program.
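If you want to watch the handler lookup happen without touching the network, one possible sketch is to call the opener's error method yourself, with the same arguments the traceback earlier shows urllib2 passing internally (self.parent.error('http', req, fp, code, msg, hdrs)). Several things here are assumptions: the empty io.BytesIO body stands in for a real socket, the import fallback targets Python 3, and the status attribute is only set when missing, because some newer Python 3 versions already provide status on the response object.

```python
import io

try:
    import urllib2                      # Python 2, as used in this chapter
except ImportError:
    import urllib.request as urllib2    # assumption: Python 3 equivalent


class DefaultErrorHandler(urllib2.HTTPDefaultErrorHandler):
    def http_error_default(self, req, fp, code, msg, headers):
        result = urllib2.HTTPError(req.get_full_url(), code, msg, headers, fp)
        # Assumption: only set .status when the object does not already
        # expose one (newer Python 3 responses do).
        if not hasattr(result, 'status'):
            result.status = code
        return result       # returned, not raised


request = urllib2.Request('http://diveintomark.org/xml/atom.xml')
opener = urllib2.build_opener(DefaultErrorHandler())

# Simulate the server answering 304 with an empty body. No request is sent;
# this only exercises the lookup that ends in http_error_default.
simulated = opener.error('http', request, io.BytesIO(b''),
                         304, 'Not Modified', {})
```

Because DefaultErrorHandler subclasses HTTPDefaultErrorHandler, build_opener uses it in place of the stock default handler, so the object comes back as a return value rather than as an exception.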
>>> request.headers
{'If-modified-since': 'Thu, 15 Apr 2004 19:45:21 GMT'}
>>> import openanything
>>> opener = urllib2.build_opener(
...     openanything.DefaultErrorHandler())
>>> seconddatastream = opener.open(request)
>>> seconddatastream.status
304
>>> seconddatastream.read()
''
Handling ETag works much the same way, but instead of checking for Last-Modified and sending If-Modified-Since, you check for ETag and send If-None-Match. Let's start with a fresh IDE session.
>>> import urllib2, openanything
>>> request = urllib2.Request('http://diveintomark.org/xml/atom.xml')
>>> opener = urllib2.build_opener(
...     openanything.DefaultErrorHandler())
>>> firstdatastream = opener.open(request)
>>> firstdatastream.headers.get('ETag')
'"e842a-3e53-55d97640"'
>>> firstdata = firstdatastream.read()
>>> print firstdata
<?xml version="1.0" encoding="iso-8859-1"?>
<feed version="0.3"
  xmlns="http://purl.org/atom/ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xml:lang="en">
  <title mode="escaped">dive into mark</title>
  <link rel="alternate" type="text/html" href="http://diveintomark.org/"/>
  <-- rest of feed omitted for brevity -->
>>> request.add_header('If-None-Match',
...     firstdatastream.headers.get('ETag'))
>>> seconddatastream = opener.open(request)
>>> seconddatastream.status
304
>>> seconddatastream.read()
''
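Since a 304 response comes back with an empty body, the calling program has to fall back on the copy it saved from the first request (firstdata in the session above). A minimal sketch of that decision follows; body_or_cached is a hypothetical helper, and treating a response with no status attribute as an ordinary successful one is an assumption.

```python
def body_or_cached(response, cached_body):
    """Return fresh data, or the saved copy when the server said 304.

    Hypothetical helper. `response` is expected to carry the .status
    attribute set by DefaultErrorHandler. Assumption: a response without
    .status is a normal successful one, so it is treated as 200.
    """
    status = getattr(response, 'status', 200)
    if status == 304:
        return cached_body      # nothing changed; reuse what you have
    return response.read()      # new data from the server
```

With the session above, body_or_cached(seconddatastream, firstdata) would hand back firstdata. Other status codes, such as 404, are also returned as objects by the custom handler and would need their own branch.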
In these examples, the HTTP server has supported both Last-Modified and ETag headers, but not all servers do. As a web services client, you should be prepared to support both, but you must code defensively in case a server only supports one or the other, or neither.
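One way to code defensively, sketched here under a few assumptions, is to remember whichever validators the server gave you and send back only those. The names conditional_request and remember_validators are hypothetical, and the fallback import of urllib.request is an assumption for readers on Python 3.

```python
try:
    import urllib2                      # Python 2, as used in this chapter
except ImportError:
    import urllib.request as urllib2    # assumption: Python 3 equivalent


def remember_validators(headers):
    """Pull ETag and Last-Modified out of response headers.

    Either value may be None if the server did not send that header.
    """
    return headers.get('ETag'), headers.get('Last-Modified')


def conditional_request(url, etag=None, last_modified=None):
    """Build a Request carrying only the validators you actually have."""
    request = urllib2.Request(url)
    if etag:
        request.add_header('If-None-Match', etag)
    if last_modified:
        request.add_header('If-Modified-Since', last_modified)
    return request
```

If the server sent neither header, the result is a plain unconditional request, and you simply download the data again.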