(This time in English, to make this article referenceable to a broader audience)
Some of you may be aware of the technique of banner grabbing to find out what software is running on a remote machine. It works by reading out transmitted data that contains a product name and/or version, such as HTTP's "Server:" response header. Like virtually all content sent by a server, its value is entirely up to the server, i.e. it can easily be faked by an HTTP server's admin, and sometimes even is.
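For illustration, a minimal banner-grabbing sketch in Ruby (the host name is just a stand-in):

    require 'net/http'

    # Classic banner grabbing: request only the headers and read out
    # the Server field -- which is whatever the admin wants us to see.
    response = Net::HTTP.start('www.example.org') { |http| http.head('/') }
    puts response['Server']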
Of course, faking the Server header won't protect you from instant pwnage by automated malware if your HTTP server software is vulnerable. Nevertheless, there are some people out there who think that masking the Server header would prevent attacks (and they even try to make money with that assumption).
Banner grabbing is definitely the easiest way of trying to fingerprint an HTTP server, and also the easiest one to defeat. In this article I will describe some simple techniques that I came up with that make HTTP fingerprinting much more reliable and don't depend on the value of any header. (BTW, I am aware of httprint and their pretty good paper on the same subject, but I only became aware of it after I already had a simple and working prototype of my ideas.)
Fingerprinting in real life (as seen in CSI et al.) usually concentrates on identifying unique features of a fingerprint and matching them against other fingerprint samples. So, what can be used as the fingerprint of an HTTP server? Whatever it returns to an HTTP client. Not everything, of course: the content of a requested resource is totally irrelevant for our case. So, what else is there? The response code (you know, HTTP/1.1 200 OK) and the headers.
When we have a look at the response code, it is pretty obvious that one HTTP server won't be easily distinguishable from another through the answer to a regular GET request for an existing file (except for software such as gatling, which returns "200 Coming Up" instead of the usual "200 OK"). But what we can do is try out irregular or unusual HTTP requests. I tried out three different kinds:
- Unusual request method (DELETE / HTTP/1.1)
- Illegal protocol name (HEAD / ABCD/1.1)
- Illegal protocol version (HEAD / HTTP/3.0)
I tried out how different servers react to such requests, and the result was really astonishing: it was really hard to find two different HTTP servers that behaved the same way for all 3 requests! (I now have a file with server signatures, and the 27 fingerprints that I have collected so far contain 23 unique response code triplets.)
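To make this concrete, here is a rough Ruby sketch of how such probes could be sent. Raw sockets are needed because regular HTTP libraries won't emit malformed request lines; the host name and the probe helper are made up for illustration:

    require 'socket'

    # The three probes from above, each sent on a fresh connection;
    # only the status line of the response is of interest.
    PROBES = [
      "DELETE / HTTP/1.1\r\nHost: %s\r\n\r\n",  # unusual request method
      "HEAD / ABCD/1.1\r\nHost: %s\r\n\r\n",    # illegal protocol name
      "HEAD / HTTP/3.0\r\nHost: %s\r\n\r\n"     # illegal protocol version
    ]

    def probe(host, port = 80)
      PROBES.map do |template|
        sock = TCPSocket.new(host, port)
        sock.write(template % host)
        status = sock.gets            # e.g. "HTTP/1.1 405 Method Not Allowed"
        sock.close
        status ? status.strip : '(connection closed)'
      end
    end

    puts probe('www.example.org')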
Such a quick achievement was really beyond my expectations, but I soon found out that there is some software that behaves almost identically. Two such pairs were thttpd and publicfile, and Apache-Coyote (the HTTP/1.1 server stack of Apache Tomcat) and lighttpd 1.5.0. So I was in need of a second feature. I had already used the response code, so what was left were the headers. But I wasn't really interested in the headers themselves, but rather in the order of the headers. I quickly checked a few HTTP servers, and it soon showed that this would be promising.
So I added the order of the header names to the fingerprint of an HTTP server. As qualified headers I used virtually all headers, except for "Location", "Set-Cookie" and all "X-" headers: all of them are related to local configuration and/or applications, and have nothing to do with the HTTP server per se. Of course, not every server will return exactly the same set of headers in exactly the same order, I knew that there would be deviations. So I came up with a simple algorithm to determine the
similarity of two fingerprints:
Let's say I have an existing fingerprint of a known HTTP server, and a freshly retrieved fingerprint of a yet-to-be-identified HTTP server. My algorithm takes the first of the expected headers and searches for it in the actual headers, starting at the first one. If the header can't be found, I search for the next expected header, starting from the same position. If the header can be found, I move on to the next actual header, and start searching from there for the next expected header. This continues, header by header, until I'm finished with either the expected or the actual headers. For every expected header that was found in the actual headers, I increment a counter. When I'm done, the similarity between expected and actual headers is
counter / number of expected headers, i.e. a real number in the interval [ 0.0, 1.0 ].
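In Ruby, the algorithm looks roughly like this (a sketch, not the actual code from my tool):

    # Header-order similarity: walk through the expected headers,
    # advancing through the actual headers whenever a match is found,
    # and count the hits.
    def header_similarity(expected, actual)
      return 0.0 if expected.empty?
      matches = 0
      pos = 0
      expected.each do |header|
        idx = actual.drop(pos).index(header)
        next unless idx
        matches += 1
        pos += idx + 1                # continue after the matched header
      end
      matches.to_f / expected.size
    end

    header_similarity(%w[Date Server Content-Type Connection],
                      %w[Date Content-Type Connection])    # => 0.75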
The next step was to combine this with the response-code-based fingerprinting technique developed earlier: a similarity factor between expected and actual response codes is computed by counting the number of response codes that match. As before, the similarity is counter / number of compared response codes (currently, the number of compared response codes is 3). This has the advantage that deviant response codes (e.g. through local configuration) still produce a reasonable similarity factor (e.g. if an HTTP server is configured to return a 302 redirect when a
DELETE / is encountered, but the other 2 response codes are the same, the similarity factor is still at
2/3).
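As a sketch (the status codes here are invented, apart from the 302 of the example above):

    # Response-code factor: element-wise comparison of the status
    # codes collected by the three probes.
    def code_similarity(expected, actual)
      hits = expected.zip(actual).count { |e, a| e == a }
      hits.to_f / expected.size
    end

    code_similarity(%w[405 400 505], %w[302 400 505])   # => 2/3, as above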
The final step to get an overall similarity factor is to compute the arithmetic mean of the two similarity factors (i.e. (a + b) / 2). This procedure is applied to all available fingerprints. The fingerprints are then sorted in descending order by their similarity factor, and all fingerprints above a certain threshold (0.8 gave good results in my experiments) are selected as likely matches. If no such fingerprint can be found, the threshold is lowered by 0.1 until a fingerprint is found.
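The selection step could be sketched like this (server names and scores are invented for illustration):

    # Final ranking: each known fingerprint gets the mean of its two
    # factors attached; select everything above the threshold, lowering
    # it in 0.1 steps until at least one match turns up.
    def select_matches(scored)          # e.g. { "Apache/2.2" => 0.92, ... }
      ranked = scored.sort_by { |_, score| -score }
      threshold = 0.8
      matches = []
      while matches.empty? && threshold > 0.0
        matches = ranked.select { |_, score| score >= threshold }
        threshold -= 0.1
      end
      matches
    end

    select_matches('Apache/2.2' => 0.92, 'lighttpd/1.4' => 0.85, 'thttpd' => 0.41)
    # => [["Apache/2.2", 0.92], ["lighttpd/1.4", 0.85]]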
If you're interested in a proof-of-concept implementation of the technique described here, just have a look at my HTTP fingerprinting tool with the working title "dactylizer" (a combination of dactyloscopy and analyzer), which even includes a pretty usable signature file with fingerprints for 27 different HTTP servers. Here's a sample session:
    # according to Server header: Apache
    # according to Server header: lighttpd/1.4.13
The find-match.rb script matches an HTTP server's fingerprint against the list of existing fingerprints, while the pbaf.rb ("powder, brush, adhesive foil") script prints out a string that can be cut and pasted into the signature file. As you can see in this session, the fingerprinting doesn't work perfectly (at least when checking fun.drno.de), but the correct guess is #2 of the 3 entries presented to the user.
A few interesting notes about stuff that I found while experimenting a bit with these tools during development: IBM HTTP Server sends an "epKe-Alive" header. WTF?! As of 2008-01-10, 0:33 am, googling for "epke-alive" yields exactly 0 search results. This looks so NUXIesque, but isn't. What also amused me: when I fingerprinted the opentracker BitTorrent tracker (at denis.stalker.h3q.com:6969), it returned "Fnord 1.10" with a similarity factor of 0.8. Fnord is an HTTP server written by Fefe, and opentracker gets its scalability from the libowfat library, also written by Fefe. It's as if my fingerprinting tool knew about the relationship between those two pieces of software.