Bots

A friend pointed me at a Scientific American article today, titled “Baffling the Bots”. It’s a fun read, I suppose. It sort of credits Yahoo with having pioneered this stuff in 2000. But we totally did this at Remarq in 1998/1999.

We had to do it because we had tons of images on our site, and people were sending bots to go find them all. This ate up a fair amount of bandwidth. So, we just required users to type in the number shown in a picture before they could view pictures, and they had to redo this every 50 pictures or so. Interestingly, we just displayed a simple three-digit number. The implementation was cheap: we pre-generated the numbers and actually only ever displayed about 50 different numbers. As far as we knew, though, it worked 🙂
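
For the curious, here is a rough Python sketch of that kind of gate. It is not the Remarq code. The names `build_pool` and `PictureGate` are made up for illustration, and the numbers (a pool of 50 three-digit values, a fresh challenge every 50 views) are just the figures quoted above.

```python
import random

# Figures taken from the post; everything else here is a hypothetical sketch.
POOL_SIZE = 50            # only about 50 different numbers were ever shown
VIEWS_PER_CHALLENGE = 50  # users had to redo the check every 50 pictures or so


def build_pool(rng, size=POOL_SIZE):
    """Pre-generate a fixed pool of distinct three-digit numbers."""
    return rng.sample(range(100, 1000), size)


class PictureGate:
    """Tracks one user's session: challenge, answer, then a budget of views."""

    def __init__(self, pool, rng):
        self.pool = pool
        self.rng = rng
        self.views_left = 0
        self.expected = None

    def challenge(self):
        # Pick one of the pre-generated numbers. A real site would serve the
        # matching pre-rendered image rather than the number itself.
        self.expected = self.rng.choice(self.pool)
        return self.expected

    def answer(self, typed):
        # Each challenge can be answered once; a wrong guess needs a new one.
        ok = self.expected is not None and typed.strip() == str(self.expected)
        self.expected = None
        if ok:
            self.views_left = VIEWS_PER_CHALLENGE
        return ok

    def view_picture(self):
        # Refuse to serve a picture once the budget is used up.
        if self.views_left <= 0:
            return False
        self.views_left -= 1
        return True
```

With only 50 possible answers a determined bot could learn them all, which is probably why "as far as we knew" is the right level of confidence.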

RSS Book

I just finished reading the O’Reilly RSS book by Ben Hammersley. It’s a decent book. Published in March of this year, so it’s pretty up to date. I have to admit it’s got me excited about RSS modules.

As I mentioned before, who wants to read this?

I do have an interesting application. I want to sell stuff. I want people to read my for-sale items via my RSS feed. Why? Because I want Google and all the other search engines to help me disseminate my goods. I don’t want to pay eBay 2% and then pay another 2.5% to PayPal (oh wait, that’s eBay too, isn’t it?). Why can’t we all just share this info via RSS? Indexers and robots of the world can parse it out and become our delivery vehicle.
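
To make that a bit more concrete, here is a minimal Python sketch of what publishing for-sale items in a feed might look like. The channel and item elements (`title`, `link`, `description`) are plain RSS 2.0. The `sale:price` element and its namespace URL are invented for this example and are not an existing RSS module, and `build_feed` is just a name I picked.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for a made-up "for sale" module (not a real standard).
SALE_NS = "http://example.com/ns/forsale"
ET.register_namespace("sale", SALE_NS)


def build_feed(site_title, site_link, items):
    """Return RSS 2.0 text with one <item> per thing being sold."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = site_title
    ET.SubElement(channel, "link").text = site_link
    ET.SubElement(channel, "description").text = "Items for sale"
    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["link"]
        ET.SubElement(node, "description").text = item["description"]
        # The price goes in the invented module element so a robot could
        # pick it out without scraping the description text.
        ET.SubElement(node, "{%s}price" % SALE_NS).text = item["price"]
    return ET.tostring(rss, encoding="unicode")
```

Whether any indexer would ever agree on such an element is exactly the open question.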

Are Weblogs the same as usenet?

Usenet never fully grabbed me. I was what they called a ‘lurker’. I would read some things, but never got too involved in any groups. That was the status of most of us out there. As I read more blogs and get to know them better, I realize that it’s the same application, with a new face. The question really is, what’s the difference between Usenet and Blogs? The technology is a little different, but the concept is very similar.

Right now, I’m writing to my own moderated newsgroup called alt.belshe.mike. I’ve set it up so that only I can create new threads, and anyone can follow up my threads. And, of course, every Joe in the universe may want to create his own alt.focker.joe newsgroup so he can control his own corner of the web too.

From the User’s perspective
From the user’s perspective, usenet and blogs are pretty similar. Blogs are a little more ‘open’ than usenet was, in that anyone can create a new group or start cross-linking to another group. Usenet was more closed in that there was a process for creating new groups. Weblogs are also a bit more “free” in that users can post whatever they want to. Usenet newbies in the ’90s were often scolded for putting that awful HTML stuff into usenet posts. There are no references to “Netiquette” when you start to write your own blog.

But these are a little different too. Usenet had the problem of many-to-many authorship. This meant that each group could have many authors. Sure, you can do this in blogs, but it isn’t really common practice. Blogs take the approach of each user being more of a broadcaster or publisher. I create the topics, you read them.

Technology
Usenet was a distributed system. Each server decided which newsgroups to pick up and disseminate. Each server could be an origin server for a post, so each server had the capability to create a unique ID for a post (i.e. the Message-ID header). Each server then pushed its content to its peer servers at some interval, as set up by the system administrator. Thus, each posting made it to many servers around the globe: distribution.
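
Here is a toy Python sketch of that push model, just to show why the unique ID matters. It is a simplification and not how any real news server is written: the `NewsServer` class and its methods are hypothetical, per-server newsgroup selection is left out, and a real feed runs over NNTP rather than direct method calls.

```python
import itertools


class NewsServer:
    """Toy server: stores articles by Message-ID and pushes them to peers."""

    def __init__(self, hostname):
        self.hostname = hostname
        self.peers = []       # servers this one feeds
        self.articles = {}    # Message-ID -> article body
        self._counter = itertools.count(1)

    def post(self, body):
        # The origin server mints the ID; including its own hostname keeps
        # IDs from different origin servers from colliding.
        message_id = "<%d@%s>" % (next(self._counter), self.hostname)
        self.articles[message_id] = body
        return message_id

    def offer(self, message_id, body):
        # A peer refuses anything it already holds, so loops in the peering
        # graph do not cause an article to circulate forever.
        if message_id in self.articles:
            return False
        self.articles[message_id] = body
        return True

    def push_to_peers(self):
        """Offer every article to every peer; return how many were accepted."""
        accepted = 0
        for peer in self.peers:
            for message_id, body in self.articles.items():
                if peer.offer(message_id, body):
                    accepted += 1
        return accepted
```

Run `push_to_peers` on each server at whatever interval the administrator likes, and a post eventually reaches every server connected to the origin.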

A Blog is not a distributed system by itself. It’s just a posting of my content, whatever I want, in whatever format I want it to be in. I can link to other blogs, and other blogs can link to mine, but it’s still not distributed. There is only one copy of my content anywhere. However, the interesting part is that aggregation and search technologies are starting to emerge which create the ‘distributed’ part of blogs. Imagine a world where everyone has their own blog crawler roaming around the net finding interesting blogs. In essence, you’ve created a polling-based distributed system.
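
A personal blog crawler of that sort could be sketched roughly like this in Python. It is only an illustration under some assumptions: each blog exposes an RSS 2.0 feed, items are identified by their `link`, and the `fetch` argument is a stand-in for whatever actually downloads a URL. `BlogPoller` is a made-up name.

```python
import xml.etree.ElementTree as ET


class BlogPoller:
    """Polls a list of feeds and keeps a local copy of every item it sees."""

    def __init__(self, fetch):
        self.fetch = fetch   # hypothetical callable: feed URL -> RSS text
        self.copies = {}     # item link -> item title (our local replica)

    def poll(self, feed_urls):
        """Fetch every feed once; return links not seen on earlier polls."""
        new_links = []
        for url in feed_urls:
            channel = ET.fromstring(self.fetch(url)).find("channel")
            if channel is None:
                continue  # not an RSS 2.0 document, skip it
            for item in channel.findall("item"):
                link = item.findtext("link")
                if link and link not in self.copies:
                    self.copies[link] = item.findtext("title", default="")
                    new_links.append(link)
        return new_links
```

Every poll re-downloads every feed whether or not anything changed, which is the cost of pulling instead of pushing.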

It’s interesting. If I were setting out to design a distributed system, a polling-based system is not one I would create. But is that where this stuff is headed? The web already works this way, more or less. You can post HTML content on the web, and then how is it found? Well, you get linked into a well-known place like dmoz and wait, or you manually submit your site to a bunch of robots. Then these robots come and crawl your site. In my own little world, I find that robots hit my site more than people do. All so that if someone might *want* to find my site, they will find it when they search for it.

But search engines still don’t really create a distributed system. Sure, Google may cache my content and allow searchers to read my pages without ever visiting my site. But that’s not really distributed, that’s a single replica. Should these crawlers running about be collecting copies of our blogs and regurgitating them in new formats? On one hand, it makes the overall system more robust. It creates copies for everyone to have. But on the other hand, did I just lose control over my content?

So, in this regard, the two systems are different.

If this is so much like usenet, where is the porn?
This is the real question of course. But the answer lies in the fact that blogs are more of a publishing system than a conversation/messaging system. With usenet, the servers were hard to administer and maintain, so only schools and companies could afford to run their own servers. As such, we schemers and scammers out there discovered we had a virtually “free” pipe to share our much needed porn. With weblogs, if I create a porn-of-the-day blog, it’s my own bandwidth that gets usurped by the hordes of one-handers out there that are looking for that stuff. And there is one other reason too. Blogs aren’t yet very discoverable. It was all too easy to discover alt.binaries.images.XXX on a news server. The mechanisms for finding a good blog about porn are still very limited.

Blog strengths and weaknesses
Some strengths:

  • Freedom of formatting/creative choice
  • Ease of topic-creation
  • Simple for neophytes
  • Integrates well with web technologies
  • I can control my content more tightly
  • For individual blog reading, no need for a complex server.

Some weaknesses:

  • No central repository for lookups or searches
  • Users broadcast rather than converse on topics
  • Client applications are weak – I just want to know what topics are new since yesterday. There is no inherent way to do this today.
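
On that last weakness: nothing in a blog itself answers "what is new since yesterday", but a client can approximate it when a feed carries dates. The Python sketch below assumes an RSS 2.0 feed whose items include the optional `pubDate` element. Items without a usable date are skipped, which is a choice I made for the example. `new_since` is a hypothetical helper name.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
import xml.etree.ElementTree as ET


def new_since(rss_text, cutoff):
    """Return titles of items whose pubDate is later than cutoff.

    cutoff must be a timezone-aware datetime.
    """
    titles = []
    for item in ET.fromstring(rss_text).iter("item"):
        stamp = item.findtext("pubDate")
        if not stamp:
            continue  # pubDate is optional, so some items cannot be judged
        try:
            published = parsedate_to_datetime(stamp)
        except (TypeError, ValueError):
            continue  # unparseable date, treat it like a missing one
        if published.tzinfo is None:
            # Assumption for this sketch: dates without a zone are UTC.
            published = published.replace(tzinfo=timezone.utc)
        if published > cutoff:
            titles.append(item.findtext("title", default=""))
    return titles
```

For "since yesterday", pass `datetime.now(timezone.utc) - timedelta(days=1)` as the cutoff (with `timedelta` imported from `datetime`).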

I ate at Stoddard’s today.

First Entry

This is my foray into the weblog world. I downloaded Movable Type a few weeks ago, and only now got it installed and running. I ran into a few Perl problems that had me frustrated. But it seems to be working. Then I edited the style sheet, hacked around a few other things, and voilà, it seems to work.

I used to work at Remarq, which was a company providing web access to usenet. (It was later acquired by Critical Path.) I must say, the whole blogging concept looks very similar to usenet. Sure, the technology is a little different, but the concepts are the same: people can publish topics, ideologies, etc., others can read them, and then you can get whole threads of conversation going. So, I find this interesting… It’s also interesting to read about people stumbling into many of the same issues that Usenet stumbled into.

Oh yeah, and I had chicken for dinner.