Geller - September 14, 2000
This is Part II of a two part article.
Part I: The Why's and What's of Auditing
Part II: The Audit Process
What's in an audit?
The audit is a process for verification of the numbers that you report to your advertisers. Audits can be performed in a number of different ways.
* Server-based audits examine data that is available at the server, most importantly traffic logs and web logs. An auditing organization will prowl through the logs to check for various kinds of impressions that should not be reported. This investigation will include an examination of the parameters you use to run your traffic analysis programs. Auditors may insert software in the web server that causes independent logs, totally under the auditor's control, to be created.
* Panel-based audits measure the surfing behavior of a sample panel of users, and attempt to project that behavior statistically to the entire Web population.
* Browser-based audits attempt to confirm actual ad displays. For example, an applet can be attached to an ad or to a page; the applet will report when the ad is actually displayed on some user's browser.
Larger consumer sites like Yahoo and Amazon.com, and their advertisers, use panel-based audits, and the numbers are sometimes front-page news. Smaller consumer sites and B2B sites generally don't have the volume for panel-based audits to be statistically significant, and rely mostly on server-based audits. Browser-based audits are a newer technique and are not heavily used.
How good are the different techniques? Jim Spaeth, President of the Advertising Research Foundation, tells of comparisons where on site X a server-based procedure showed 15% of the traffic shown by a panel-based audit, but on a second site Y the order of the methods was reversed, with the server-based procedure showing 300% of the traffic that the panel-based audit did. "This kind of result gives people chills down the back of the neck," Mr. Spaeth said. He also noted that different procedures of the same class also tend to produce different numbers.
Figure 1, from a comparison of three different measures of traffic on Yahoo in 1999, shows graphically how different techniques may differ; the Figure was originally published on TheStandard.com.
Who sets the standards?
When your CFO faces an audit, it's always perfectly clear what's required. If there are problems, the auditor can explain exactly what they are. Try this: Ask your CFO if it would be surprising to have an audit performed by two different highly respected firms on the same business at the same time and get wildly different results. You already know what the answer is: some variant of "that shouldn't happen." That's because accounting firms and standards bodies have agreed on rules for audits that cover almost any question that could be asked. Yet, despite the apparent simplicity of the data that need to be analyzed and the fact that the Web is all technology all the time, the standards just aren't there yet.
This may be partly because there is no single recognized standards body as there is for financial accounting (within a country). However, that situation is starting to change as voluntary or ad hoc organizations put in the work to develop standards. One such organization is FAST, which stands for Future of Advertising Stakeholders. FAST has developed a number of draft standards for how and what to measure, and some are being adopted voluntarily. However, there is no legal or even quasi-legal pressure to make sites, software manufacturers or auditing firms adhere to them.
One promising attempt to level the playing field is the planned September launch of Audit Central, a web site that will publish audit reports that have been made publicly available by the sites that were audited. The site is run by ABC Interactive, BPA International, and Engage I/Pro. These competing audit firms have a clear interest in improving the quality of audits and public recognition of their value. The site is scheduled to begin with approximately 600 reports, all from companies that have agreed to make their reports public.
How Many Visitors? A Sample Nightmare.
One of the simplest measures that any site wants to know is how many different individuals - "unique visitors" in industry parlance - visit the site. FAST's draft standard on Metrics and Methodology suggests "three acceptable methods for identifying unique users: unique registration, unique cookies and unique IP address with heuristic." Of these, it suggests that unique registration is the best: "Sites that register visits should have no problem determining the page requests that belong to the same visitor. A site must use 100% registration in order to use this method validly."
Next is the use of unique cookies. If a unique cookie is dropped on every browser, the user can be uniquely identified even without any personal information. The third method calls for the use of IP addresses. However, IP addresses are only an approximate match to actual users. As FAST states, "It must be noted that IP addresses can and often do represent more than one user, so this measure does not necessarily represent the number of people reached. It should also be noted that dynamically assigned IP addresses impact the accuracy of this methodology."
Few websites require registration before showing any pages at all to a user, so the most practical way to track individuals uniquely is with cookies. (True, some small percentage of users block cookies, but there are so few that they are largely irrelevant to the discussion.) The standard doesn't say that a site has to drop cookies, only that if it doesn't it must have another way to count visitors.
If the site doesn't drop unique cookies then visitor calculations have to be done by making educated guesses based on IP address. Such guesses could take into account the time period between two pages served to the same IP address, the click trail as revealed in the referrer field, and other items. In the latter category are cookies that may be dropped by the web server without the website taking explicit action; Microsoft's IIS in particular can end up dropping quite a few. Any particular traffic program can use any of these means to count visitors, but there is no one best way to do so.
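To make the idea of an educated guess concrete, here is a minimal sketch, in Python, of one such heuristic: requests from the same IP address are treated as the same visitor unless a long idle gap separates them. The 30-minute threshold and the pre-parsed input format are illustrative assumptions for the example, not part of any standard.

```python
# Sketch: estimate unique visitors from IP addresses plus a time-gap heuristic.
# Assumptions (not from the article): requests are already parsed into
# (unix_timestamp, ip_address) tuples, and a gap of more than 30 minutes
# between requests from the same IP is treated as a new visitor.

GAP_SECONDS = 30 * 60  # illustrative threshold, not a standard

def estimate_visitors(requests):
    """requests: iterable of (timestamp, ip), not necessarily sorted."""
    last_seen = {}   # ip -> timestamp of most recent request
    visitors = 0
    for ts, ip in sorted(requests):
        prev = last_seen.get(ip)
        if prev is None or ts - prev > GAP_SECONDS:
            visitors += 1        # first request, or a long gap: count a new visitor
        last_seen[ip] = ts
    return visitors

# Example: two bursts from the same IP, two hours apart, count as two visitors.
print(estimate_visitors([(0, "10.0.0.1"), (60, "10.0.0.1"), (7200, "10.0.0.1")]))  # -> 2
```

A real heuristic would also weigh the referrer trail and incidental cookies, but the shape of the guesswork is the same.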
To make things worse, consider caching. When a surfer clicks on a link there is no guarantee that your website will even see the request in its logs. The page may be cached somewhere between your server and the user's browser, or in the user's machine itself. Websites certainly want to claim views of cached pages or ads as part of their traffic, but by definition these can only be estimated. So, again, how can reliable numbers be generated?
How to prepare for an audit: Know Thy Traffic
Two things about traffic numbers are absolutely important to understand.
1. You won't get them 100% correct.
2. Politics matter.
The first point is probably obvious. What with caching, proxy servers, the ability of users to block cookies, and other factors, there's no way to be perfect. Nor is that a problem. Given the variations between different audit styles, consistency and traceability are your best bets. If the numbers you report are only 5% or even 15% off from your first audit you'll probably get a huge round of applause from the auditors.
Second, we want to make sure that you understand that traffic numbers are as much a political matter as a technical one. The classic example is the problem faced, early in the evolution of Web advertising, by salespeople at companies that wanted to be above-board in their use of traffic numbers. "We're reporting page impressions," one of those salespeople told TEC years ago, "but we're competing with people who still report hits. And the customer doesn't understand the difference." While no advertiser is going to get caught buying hits instead of impressions today, until your competitors' sites are audited there is no way to know how accurate the numbers they provide to advertisers are. So sales and marketing folk may have a different level of interest than the IT staff in pruning the numbers down to the absolute minimum.
Most sites use commercial traffic reporting products or services. While these are certainly appropriate for use on a regular basis, we recommend that sites expecting to be audited at some point come to an understanding of their traffic before relying too heavily and too long on such packages. The accuracy of the commercial offerings is limited by how well you can configure them to exclude page impressions that should not count for an audit. You can expect a package to automatically exclude images like .gif and .jpg files, and some come out-of-the-box ready to exclude the larger search engines, but they can't know the characteristics of your site.
In fact, from a traffic point of view you may not know the characteristics of your site until you create a small project to examine the logs in detail. It should take only a week or two of programmer time to write a program that counts impressions, visitors and visits. The careful inspection of the logs and the derivation of algorithms you'll need to do this will put you on a firm footing both to configure your commercial log analysis software and to be prepared for a traffic audit.
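As a rough illustration of what such a project might look like, here is a sketch in Python that parses Apache "combined" format log lines, excludes image requests, and counts impressions, visits and visitors. The excluded extensions, the IP-plus-user-agent visitor key, and the 30-minute visit timeout are all assumptions made for the example; your own logs and counting rules will differ.

```python
# Sketch: a small log-analysis program of the kind suggested above.
# Assumptions (illustrative, not from the article): logs are in the Apache
# "combined" format, images are excluded by file extension, the visitor key
# is IP address plus user-agent, and a visit ends after 30 idle minutes.
import re
from datetime import datetime

LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)
EXCLUDED_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js")
VISIT_TIMEOUT = 30 * 60  # seconds; an assumed, not standardized, threshold

def summarize(lines):
    impressions = 0
    last_seen = {}           # visitor key -> timestamp of last counted page
    visits = 0
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue
        path = m.group("path").split("?")[0].lower()
        if path.endswith(EXCLUDED_EXTENSIONS) or m.group("status") != "200":
            continue         # not a countable page impression
        ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z").timestamp()
        key = (m.group("ip"), m.group("agent"))
        impressions += 1
        prev = last_seen.get(key)
        if prev is None or ts - prev > VISIT_TIMEOUT:
            visits += 1      # first page, or a long idle gap: a new visit
        last_seen[key] = ts
    return {"impressions": impressions, "visits": visits, "visitors": len(last_seen)}
```

Even a crude script like this forces you to decide, explicitly, which requests count and which do not; those decisions are exactly what an auditor will ask about.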
Among the areas to pay attention to are:
* IP Addresses: Do you know which IP addresses people from your own company (or from partners) will be recorded as coming from? Can you estimate how many people actually use your site from behind such overloaded addresses as AOL's proxy servers?
* Cookies: TEC recommends the use of a unique cookie to identify visitors. However, you may discover that your server software drops cookies of its own. Microsoft's SiteServer, through its various features, can drop quite a few. Unless you understand these carefully, they may confound your traffic reporting software, probably leading to a significant over-counting of visitors.
* Usernames: If your site has registration, the usernames can appear in the traffic logs and be quite helpful in validating your numbers. But if you don't require people to register immediately, the same person may appear in the logs both with and without a username. This inflates your visitor count and deflates the average time spent on the site per user.
* Caching: Having your static pages cached by remote servers or browsers helps reduce the load on your own servers and on the network as a whole, but at the cost of reducing your traffic counts. You can develop estimates of the degree to which this occurs by inserting directives into the HTML code that invalidate versions stored in caches, or by changing the modification dates to make those pages look new. The former approach gives a better estimate, since in theory it causes every browser to reload the pages every time, while the latter merely causes caching servers to reload them once. Trials that mix both methods can lead to the best estimates of real traffic.
* Bots: There are lists of "known" robots and search engines published on the Internet, and some traffic packages routinely use these lists to remove unwanted impressions. However, the lists are not complete, and unknown robots regularly crawl your site, causing significant spikes and consistent over-estimates of your traffic. The only protection here is eternal vigilance. While most robots identify themselves in the User-agent field of the log, many do not. One way to find such impolite robots is to look for users who visit a large number of pages in a short period of time (a sketch of this check follows the list). If you know which specialized search engines visit your site, you can find out directly from them which IP addresses they use and set your software to ignore them.
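As an illustration of the rate-based check mentioned in the Bots item, here is a sketch that flags visitors who request an implausible number of pages within a short window. The window size and page threshold are invented for the example and would need tuning against your own traffic.

```python
# Sketch: flag likely robots by request rate.
# The thresholds (more than 60 pages within any 5-minute window) are
# illustrative assumptions; tune them against known-human behavior.
from collections import defaultdict, deque

WINDOW_SECONDS = 5 * 60
MAX_PAGES_IN_WINDOW = 60

def suspected_robots(page_requests):
    """page_requests: iterable of (timestamp, visitor_key) for page impressions."""
    windows = defaultdict(deque)   # visitor_key -> timestamps within the window
    flagged = set()
    for ts, key in sorted(page_requests):
        window = windows[key]
        window.append(ts)
        while window and ts - window[0] > WINDOW_SECONDS:
            window.popleft()       # drop requests that have fallen out of the window
        if len(window) > MAX_PAGES_IN_WINDOW:
            flagged.add(key)       # too many pages too quickly: probably a robot
    return flagged
```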
It may be that you end up with more special cases than your commercial software can deal with. This will mean that its reports over- or under-estimate what you believe the accurate numbers to be. In that case a few data points should establish the nature of this difference. You can then adjust the numbers from the package before reporting them - making sure to revalidate the relationship periodically.
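One simple way to capture that relationship is an average ratio between your own counts and the package's over a few periods, applied to later package reports and revalidated from time to time. The sketch below uses invented figures purely for illustration.

```python
# Sketch: derive a correction factor from a few paired data points
# (your own count vs. the package's count) and apply it to later reports.
# All figures below are invented for illustration.

def correction_factor(pairs):
    """pairs: list of (own_count, package_count) for the same periods."""
    return sum(own for own, _ in pairs) / sum(pkg for _, pkg in pairs)

calibration = [(95_000, 110_000), (102_000, 118_000), (88_000, 101_000)]
factor = correction_factor(calibration)          # roughly 0.87 for these figures
adjusted = round(factor * 125_000)               # adjust a later package-reported number
print(f"factor={factor:.2f}, adjusted impressions={adjusted}")
```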
Reporting on advertising is a different matter. It is conceptually similar, in that you can specify some kinds of impression as ineligible for counting. But the ad server logs typically contain less information about individuals than your traffic logs do, so many of the opportunities for removing bogus impressions are not easily available. Since everyone else is in the same boat, this should not be a problem when it comes to an audit. Your ad serving vendor or service should be able to show you that its overall procedures have been certified by an auditing agency, and should be able to advise you about any particular circumstances on your site. However, it will be up to you, through analysis of your traffic, to discover special situations that may need to be accounted for in ad reports.
Source: http://www.technologyevaluation.com/research/articles/traffic-audits-make-strange-bedfellows-part-ii-the-audit-process-16104/