In article <email@example.com>, Greg Skinner wrote:
> By now, many if not most of you have probably read, or at least heard,
> about the lawsuit filed against several search engines, accusing them
> of conspiring to overcharge for advertisements. More information
> The WSJ recently featured a front-page story on the issue of click
> fraud as well.
> I have always been skeptical of the pay-per-click (PPC) method of
> charging for advertisements. I have always felt it was a poor
> business model, because of its susceptibility to fraud. I have never
> understood the search industry's fascination with PPC, especially
> since there are other methods of selling advertising, such as fixed
> fees, which provide no means (and thus, no incentive) to game the
> system by merely clicking on ads. Furthermore, the money that is
> spent both by the advertisers and publishers (including the search
> engines) implementing complex fraud detection systems can be put to
> more productive uses. Just about everyone I have spoken to with a
> technical background in Internet protocols and architecture seems to
> realize this, but the message doesn't get through to business people
> who feel that despite click fraud, PPC is a superior advertising model
> to any others. Perhaps there is something I have overlooked in my
> assessment of the risks vs. rewards of PPC advertising.
Yup. There is.
Q. "What is an ad worth?"
A. "Depends on how many people see it."
This is why a 30-second spot during the SuperBowl costs more than one on
that same network at 3:00AM on a Tuesday.
Advertisers buy space based on "cost per M" -- i.e. how many dollars
does it cost to get their ad in front of a thousand potential 'customers'.
What they will pay "per thousand impressions" depends on how likely
somebody is to _buy_, after having encountered the ad. Things/places
with higher rates of sales "conversions" are worth more 'per thousand'
than those with lower conversion ratios. Given similar conversion
ratios, the place delivering the larger number of impressions is worth
more than the site with the smaller number of impressions.
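
To put rough numbers on the "cost per M" reasoning, here is a minimal
Python sketch. The function names and the dollar figures are made up for
illustration; they are not taken from any real rate card.

```python
def cost_per_thousand(total_cost, impressions):
    """Cost per M: dollars paid per thousand impressions delivered."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return total_cost / impressions * 1000.0


def value_per_thousand(conversion_rate, profit_per_sale):
    """Expected profit from a thousand impressions.

    A buyer has no reason to pay more per M than this figure.
    """
    return 1000.0 * conversion_rate * profit_per_sale


# Hypothetical placement: $500 buys 100,000 impressions.
price = cost_per_thousand(500.0, 100_000)

# Hypothetical outcome: 2% of viewers buy, at $10 profit per sale.
worth = value_per_thousand(0.02, 10.0)

print(f"paid ${price:.2f} per M, worth about ${worth:.2f} per M")
```

Both figures are only as good as the impression count behind them,
whichever way the space is priced.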
Before an advertising buyer enters into _any_ contract for ad
placement one of their first questions is guaranteed to be "how many
people will see this ad?" It doesn't matter _how_ the space is priced
-- flat rate or $/M -- they want to know how many people it will
reach. Note: along this line, TV/radio ad contracts (especially for
"new" programming) usually specify a minimum viewership level -- if
the subsequent ratings show that that level wasn't reached, the
stations/networks are committed to run 'extra' ads to ensure the
required number of 'impressions'. There are parallels in print
advertising -- particularly when a new publication is starting up.
With web-pages, there is *no*way* to estimate how many people see any
particular ad, *OTHER* than to count how many times it was displayed.
And that is not a "reliable, accurate" number, by any means. What it
is, however, is the "best available" data for estimating.
"Smart" web-advertising buyers have been, for _years_, specifying
that they will pay only for "unique" clicks -- only one hit from a
given IP address within a specified time-frame (minimum hours,
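
A "unique click" rule of that kind could be sketched roughly as follows.
The function name, the tuple layout, and the one-hour window in the
example are assumptions for illustration; the window length is whatever
the contract specifies.

```python
def count_unique_clicks(clicks, window_seconds):
    """Count billable clicks: at most one per IP per time window.

    clicks is an iterable of (timestamp_seconds, ip) pairs in any order.
    A click is billable only if the same IP has had no billable click
    in the preceding window_seconds.
    """
    last_billed = {}  # ip -> timestamp of its most recent billable click
    billable = 0
    for ts, ip in sorted(clicks):
        prev = last_billed.get(ip)
        if prev is None or ts - prev >= window_seconds:
            billable += 1
            last_billed[ip] = ts
    return billable


# Hypothetical log, using documentation-range addresses.
log = [
    (0, "192.0.2.1"),
    (10, "192.0.2.1"),    # repeat inside the window: not billed
    (3600, "192.0.2.1"),  # a full window later: billed again
    (5, "198.51.100.9"),
]
print(count_unique_clicks(log, window_seconds=3600))
```

Note that this only filters repeat clicks from one address. It says
nothing about whether any of the clicks it does bill were legitimate.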
> It seems that PPC advertising is going to be a fixture in web
> advertising. Given that PPC makes click fraud easy, we can expect to
> see more of it in the future. This should be a serious concern to
> anyone who invests in search engines or other companies that do PPC
> advertising, or is a customer of such companies. At the very least,
> the companies need to disclose the criteria they use for determining
> that fraud has taken place, and the rights their customers
> (advertisers) have with regards to getting refunds for fraudulent
It's _always_ a "guessing game". You *cannot* know whether a given
click -- or pattern of clicks, even -- is legitimate or fraudulent.
The ROM in computers does *not* stand for "<R>ead <O>perator's <M>ind"
which is the necessary pre-requisite for making a 100% accurate
> I'd also like to know if there are any technical groups that are
> studying the issue and proposing solutions. From a standpoint of
> detecting fraud at its inception, I thought I might find some interest
> among the intrusion detection community, but I haven't yet. The types
> of intrusion detection done at the packet level don't seem to scale to
> the types of attacks I've witnessed, which suggests that the detection
> might be better done at the web server and/or web log processing
> level. I checked the Apache documentation to see if any work of that
> type had been done, and outside of some basic configuration options
> for blocking certain types of sites and requests, there wasn't any.
> Also, based on what I've read about some of the tools people are using
> to analyze web logs, they can detect certain types of fraud, but don't
> necessarily provide alerts of impending fraud, especially if the site
> receives a considerable amount of traffic. (This is especially the
> case for the largest search engines.)
Consider a "fleet" of 500,000 "zombie" PCs, scattered across three
Each machine, _once_a_day_, at a random time, connects to a given web-page,
without anybody in front of the machine.
Now, just _what_ are you going to detect?
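
To make that concrete, here is a small Python simulation. It is only a
sketch: the per-IP threshold check, the function names, and the machine
labels are invented for illustration, and the fleet is scaled down to
5,000 machines to keep it quick.

```python
import random
from collections import Counter

SECONDS_PER_DAY = 86400


def zombie_day(fleet_size, seed=0):
    """One day of traffic: every machine clicks once, at a random time."""
    rng = random.Random(seed)
    return [(rng.randrange(SECONDS_PER_DAY), f"zombie-{i}")
            for i in range(fleet_size)]


def flag_heavy_sources(clicks, max_per_day):
    """Naive detector: flag any source with more than max_per_day clicks."""
    counts = Counter(source for _, source in clicks)
    return {source for source, n in counts.items() if n > max_per_day}


clicks = zombie_day(5000)
flagged = flag_heavy_sources(clicks, max_per_day=1)
print(len(clicks), "fraudulent clicks,", len(flagged), "sources flagged")
```

Every source stays at or under the strictest per-source limit you could
reasonably set, so the detector flags none of them.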
Consider an ISP who gives all its customers RFC-1918 addresses, and
does NAT/PAT to a relative handful of 'public' IP addresses. And
something shows up on slashdot (or similar) that gets "everybody" on
that network going to look at the particular page (and ads). Suddenly
you see a sh*tload of queries coming from the _same_ IP addresses --
including multiple simultaneous connects from a single address, just
with different source ports.
How do you differentiate _that_ from a single box with a click-bot running?
How do you differentiate _that_ from a single box with a click-bot *behind*
the NAT/PAT device?
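
A toy illustration in Python of why the web server can't tell. The
record layout and the "origin" field are hypothetical; "origin" stands
in for exactly the information the NAT/PAT device hides.

```python
def server_view(events):
    """Keep only what the web server sees: time, public IP, source port."""
    return sorted((e["time"], e["public_ip"], e["src_port"]) for e in events)


PUBLIC_IP = "203.0.113.7"  # documentation-range address, made up

# Five different customers behind the NAT, all following the same link.
crowd = [{"time": t, "public_ip": PUBLIC_IP, "src_port": 40000 + t,
          "origin": f"customer-{t}"} for t in range(5)]

# One click-bot behind the same NAT, opening five connections.
bot = [{"time": t, "public_ip": PUBLIC_IP, "src_port": 40000 + t,
        "origin": "click-bot"} for t in range(5)]

print(server_view(crowd) == server_view(bot))
```

The two traffic sources differ only in a field that never reaches the
server, so whatever the server logs can be identical in both cases.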
An IP address is part of a DHCP pool. You get a number of clicks from
that address at widely separated times. Is that all one user, or is
each from a different user, who just happened to get that address for
their dial-up session?
How do you *know* whether that address is a DHCP pool address or not?