
Rant


Agent vs. Agent

“A user agent acts on behalf of a user. Software agents include servers, proxies, spiders, browsers, and multimedia players.”

W3C: Architecture of the World Wide Web, Volume One. From the Introduction (December, 2004).

User Agents

The terms “user agent,” “user-agent,” “UA,” “browser,” “client,” “client application,” and “client software program” all pretty much refer to the same thing.

Or maybe not.

Many people quickly assume this term is the jargon equivalent of “browser” (there is a much better definition of user-agent on Wikipedia). And while a browser may not represent you in the same way as a human agent, it does perform an action on your behalf.

Or maybe not.

In other words, sometimes a user-agent can be malicious.

Agent in White

Most of the time you will initiate a request for a Web page, and in these cases a browser represents you in a very direct way: it fetches the resource for you and returns it so you can view it. Or perhaps so you can listen to it.

But what about a crawler/spider/robot? These are also user-agents, though you personally didn’t ask them to do anything (at least not when they made the crawl). But they do make it possible to find things on the Web. Consider that the next time you search for (and find) something on Google. So maybe not so much on your behalf, as on our behalf.

In an even broader sense, a Web server is also a user-agent. Let’s look at a typical client-server model event cycle: the browser (on your behalf) requests a page from a Web server, which (if it has what you asked for) then fetches the document (also on your behalf, but also on behalf of any advertisers or similar content that may be present on the page). It then returns the document to your browser (excuse me, user-agent). Or any other user-agent that requests the same resource. Assuming there are no restrictions on who, or what, can request it.
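To make that cycle concrete, here is a minimal sketch of what your browser actually sends over the wire on your behalf. The `build_request` helper and the User-Agent string are illustrative, not lifted from any particular browser:

```python
# Sketch of the request line and headers a browser (user-agent)
# sends on your behalf when it fetches a page. The helper name
# and the User-Agent value are made up for illustration.

def build_request(host, path="/", ua="Mozilla/5.0 (ExampleBrowser)"):
    """Assemble a minimal HTTP/1.1 GET request as raw text."""
    return (
        "GET %s HTTP/1.1\r\n"
        "Host: %s\r\n"
        "User-Agent: %s\r\n"
        "Connection: close\r\n"
        "\r\n" % (path, host, ua)
    )

print(build_request("example.com"))
```

The `User-Agent` header is how the client identifies itself to the server; every agent discussed here, white hat or black, fills in (or forges) that one line.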

Agent in Black

So how can a user-agent be malicious? In any of the same ways that people can be, naturally, since user-agents don’t write themselves. This isn’t a movie, folks. A spider can be malicious because it doesn’t follow the rules, or because it’s only looking for one thing and doesn’t care about adding to your Web experience. An email harvester is an example of this. All these little beasts do is search the Web looking for addresses to add to their owner’s databases, so they can spam their victims until they probably want to scream every time they open their email program (also a user-agent). So, a harvester is also an agent, only the “user” in this equation is the spammer.

A crawler can also ignore acceptable behavior when you, as a Web site owner, edit your robots.txt file to say: “okay, you’re allowed to look around here and there, but not here,” and the agent ignores this and pokes around anywhere it damn well pleases, often trying to look directly at the things it’s not supposed to look at. And this can lead to some pretty interesting ideas from folks trying to combat the guys in the black hats. Try this simple experiment sometime: edit your robots.txt file and add a rule that disallows access to an arbitrary directory; it’s not important whether it even exists. Now wait a few days and scan your log files looking for any agents that tried accessing that directory. Hmm...

User-agent: *
Disallow: /agent_black/
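The log-scan step of the experiment can be sketched in a few lines of Python. This assumes Apache-style “combined” log format and the /agent_black/ trap directory from the robots.txt rule above; the function name is mine:

```python
# Find user-agents that requested a directory robots.txt disallows.
# Assumes Apache "combined" log format; only agents that actually
# poked at the trap path are reported.
import re

# request path (group 1) ... status, size, referer, user-agent (group 2)
LINE = re.compile(r'"(?:GET|HEAD) (\S+) [^"]*" \d+ \S+ "[^"]*" "([^"]*)"')

def bad_agents(lines, trap="/agent_black/"):
    """Return the user-agent strings that requested the trap path."""
    agents = set()
    for line in lines:
        m = LINE.search(line)
        if m and m.group(1).startswith(trap):
            agents.add(m.group(2))
    return sorted(agents)
```

Any agent this turns up read your robots.txt and deliberately went where it was told not to. Instant blacklist candidates.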

You may also want to check out Project Honey Pot, a grassroots organization trying to at least slow the flood of this stuff—a problem that I personally think accounts for a measurable drain on the entire network. And for what ROI? I would love to look at the numbers: let’s say, for every 100,000 people they piss off, one, maybe one, clicks through to one of these dumb ads? And of those few that do, what percentage actually buy something? I suspect that spammers have to make a LOT of people very angry to get a scant handful of sales. Sigh.

The list of bad guys goes on. What about these downloadable browser toolbars? They can certainly enhance your online experience. But just as many can be deceptive, really interested in sniffing around on your personal computer looking for CC numbers, or popping up ads in your face when you probably don’t like ads popped up in your face. In this example, it’s the Web server that’s the bad guy. Or rather the people who configured it to deliver adware, spyware, or other such programs to your door. One in particular is not on that list. The Netcraft Toolbar has a number of useful features, and is also used by a community of alert members to help prevent fraud and phishing attacks.

And what about commercial browsers? Internet Explorer from Microsoft is a very popular user-agent, but does it wear a white hat? While it never set out to be a bad guy, I would certainly lump it in the category of not-so-nice user-agents. Why? Because the developers thumbed their noses at accepted standards, and even worse, left the thing wide open for exploitation by the guys wearing the really black hats.

Wear a White Hat

The Web is an amazing resource. It was built on openness and a free exchange of ideas and software, by a lot of very hard-working people you may never have heard of. Without Bill Joy, or Tim Berners-Lee, or Larry Wall, or any number of other people who didn’t get rich, or ever wanted to, we wouldn’t have the Web. Sadly, it’s also awash with rats and thieves. Often their techniques are so sophisticated I have to wonder why they don’t expend some of that energy on legitimate enterprises. Even worse, many are mere children who exchange little scripts they don’t even understand, and think it’s cool to try to bring down or deface someone’s Web site.

Not cool at all.

In closing, I leave you with your very own three-line Perl user-agent:

#!/usr/bin/perl
use LWP::Simple;
getprint shift;

Or, if you prefer Python:

#!/usr/bin/python
import sys, urllib
print urllib.urlopen(sys.argv[1]).read()

And please, buy yourself a white hat.

A humorous post on the history of this article is available on my blog. You are welcome to submit questions and comments there.

This article has been graciously republished by Evolt on Oct 11th, 2005.

—doug

Posted: Sunday, August 28th, 2005 @ 1:25 PM EST [2005-08-28T05:25:17Z]  

Tables are Evil

I’m just a little sick of these elitist Web designers who have decided that any use of a <table> in your markup is somehow evil. You can even get one of those cool little 80x15 pixel buttons like mine proclaiming to all that your site is 100% Table Free! Tables were never meant for page design! they SHOUT. Granted, nesting them 17 levels deep to lay out an overly graphic page, slicing and dicing the images and recombining them with text intermingled—is an RBI. I know, I’ve seen it a million times. I know because as a Web programmer I’ve had to deal with sloppy markup for years.

Yet these same <div> purists [divatics, divaddicts?] will happily mangle a <ul> into fancy mouseover menus. Bite me. I say use the tools you have to get the job done. Look around, I do. Mark my words, when XHTML 2.0 is in widespread use it will be, tsk, tsk, tsk from the podheads, bad show using a <ul> for navigation lists!

Assuming (cough) the browser vendors even implement the <nl> element correctly. I smell a whole new wave of hacks and assorted wizardry in our future.

Tabular Data

Tables should only be used for tabular data! is another anemic argument I hear all the time. How far back do we have to go before you couldn’t merge rows and/or columns in a spreadsheet? Lotus 123 for DOS v1? Or place text labels in cells? Or insert charts/graphs and other mixed content? If tables should only be used for boring lists of raw data, then throw out all your <dl> tags that omit definitions. Take a look at my about page: I use a <fieldset> to wrap around a list of very famous definitions, and not a <form> in sight. The fact of the matter is this—tables are fantastic tools for laying out complex forms. I’m going to hell for sure.
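For the record, here is the kind of form layout I mean—one table row per label/control pair, which lines everything up with zero CSS gymnastics. The field names are made up for illustration:

```html
<!-- A sketch of a table-based form layout: one row per
     label/control pair. Names are illustrative only. -->
<form action="/signup" method="post">
  <table>
    <tr>
      <th scope="row"><label for="name">Name</label></th>
      <td><input type="text" id="name" name="name"></td>
    </tr>
    <tr>
      <th scope="row"><label for="email">Email</label></th>
      <td><input type="text" id="email" name="email"></td>
    </tr>
    <tr>
      <td></td>
      <td><input type="submit" value="Sign Up"></td>
    </tr>
  </table>
</form>
```

The <th scope="row"> cells even carry some genuine semantics: each label is literally a row header for its control.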

If tabular data is your cup of tea, then I can’t think of a more appropriate expression than a table that presents another table. It might be another blue moon before you see a table that’s coded like that. Hint: view the source, Luke.

A <div> isa <div> isa <div>...

Or is it? Then there are people who use the display: table CSS rules. Yes, I know, they are primarily designed for use with XML, which has no predefined elements or semantics for displaying, well, anything. Which is where CSS of course comes into play, and why we have tools like XSL. But when I see someone using these techniques and then puffing about how they don’t use tables, when in fact they are mirroring the structure of a table exactly, it really frosts my ass.
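In case you haven’t seen the trick, this is roughly what it looks like—three divs styled to behave exactly like table, tr, and td. The class names are mine:

```css
/* A sketch of the display:table technique: divs that mirror
   the structure of a table element for element. */
.grid      { display: table; width: 100%; }
.grid-row  { display: table-row; }
.grid-cell { display: table-cell; padding: 0.5em; }
```

Pair it with nested <div class="grid"> / <div class="grid-row"> / <div class="grid-cell"> markup and you have, congratulations, a table. Just one spelled differently.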

FAQ: WTF is XSL?

Even More Digression

Frank Lloyd Wright used to call Walter Gropius and his cronies from the International Style the glass box boys. I always got a kick out of that because Frank was a genius and he knew it. The glass box boys were no-talent ass clowns and they knew it too.

Now no offense meant, there are plenty of very talented web designers out there. They can certainly blow me out of the water. But get off the table-free bandwagon, will ya?

Committees

Sigh. Then there are the standards committees. Don’t even get me started on that. Hell, I can’t resist. What better organization than the W3C? Who uses tables to lay out example <ruby> code because no browser supports it? Whose XHTML validator UA doesn’t even bother to inform your Web server that it accepts application/xhtml+xml when requesting the page?

I guess we’re supposed to accept recommendations from an organization that doesn’t even savvy the underlying HTTP protocol. Perhaps that’s the difference between a standard and a protocol.
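For anyone who hasn’t met content negotiation: the client lists the media types it accepts, and the server picks one before responding. A deliberately naive sketch (no q-value parsing, function name mine) of the server side:

```python
# A naive sketch of server-side content negotiation: check the
# client's Accept header before choosing a media type. Real
# negotiation also weighs q-values; this version does not.

def negotiate(accept_header):
    """Pick a response media type based on what the client accepts."""
    if "application/xhtml+xml" in accept_header:
        return "application/xhtml+xml"
    return "text/html"
```

A validator that never sends application/xhtml+xml in its Accept header will always be handed text/html by a server like this—so it ends up validating a document the server would never serve it as XHTML.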

Okay, I’m done. And I even seem to feel a little better. As always, your comments and rebuttals are welcome.

—doug

Posted: Friday, December 3rd, 2004 @ 8:41 AM EST [2004-12-03T08:41:28Z]  

Comments

Where can I order an ass clown in time for Christmas?

—steve

Posted: Friday, December 3rd, 2004 @ 1:02 PM EST [2004-12-03T13:02:11Z]  

Swing a dead cat and you’ll hit one.

—doug

Posted: Saturday, December 11th, 2004 @ 9:59 AM EST [2004-12-11T09:59:54Z]  

Ping Me

Hello? McFly? This is a ping

[sm/RIS]doug:/laz/www/vnav/brainpan> ping -c6 apple.com
PING apple.com (17.254.3.183): 56 data bytes
64 bytes from 17.254.3.183: icmp_seq=0 ttl=45 time=39.363 ms
64 bytes from 17.254.3.183: icmp_seq=1 ttl=45 time=39.460 ms
64 bytes from 17.254.3.183: icmp_seq=2 ttl=45 time=39.280 ms
64 bytes from 17.254.3.183: icmp_seq=3 ttl=45 time=39.598 ms
64 bytes from 17.254.3.183: icmp_seq=4 ttl=45 time=39.293 ms
64 bytes from 17.254.3.183: icmp_seq=5 ttl=45 time=39.608 ms
--- apple.com ping statistics ---
6 packets transmitted, 6 packets received, 0% packet loss
round-trip min/avg/max/stddev = 39.280/39.434/39.608/0.133 ms

Dorks.

Posted: Sunday, December 19th, 2004 @ 7:59 PM EST [2004-12-19T19:59:56Z]   Last updated: Sunday, November 30th, 2008 @ 1:00 AM EST [2008-11-30T06:00:57Z]

(c) 2008-2010, Douglas W. Clifton, loadaveragezero.com, all rights reserved.