Imperfect web analytics data: data imperfection and visitor imperfection

1. Technical imperfection

Each data collection method has its own technical advantages, but none of them can perfectly capture every action a visitor takes on a site. Every technology has limitations, and those limitations are why the data you see is not perfect. Take page stay time as an example. The following figure shows the timeline of one visit (the time shown on each icon is the time the visitor entered that page):


The stay time of a page is usually calculated as the difference between the entry time of the next page and the entry time of the current page. In this example, the page stay times are therefore:

Page A: 5 minutes

Page B: 1 minute

Page C: 4 minutes

Page D: ?

Why is there no stay time for page D? Because no collection method can capture an accurate stay time for it. The reason is simple: none of the data collection methods can detect the moment a visitor leaves, whether the visitor stays on the exit page for a long time without clicking or closes the browser. Different tool vendors therefore define the stay time of the exit page differently: some count it uniformly as 1 minute, and some simply treat it as 0 minutes.
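The calculation above can be sketched in Python. This is only an illustration, not any vendor's actual implementation: the function name `stay_times`, the `exit_policy` parameter, and the entry timestamps are all assumptions, with the timestamps chosen to reproduce the 5, 1, and 4 minute stay times of the example.

```python
from datetime import datetime, timedelta

def stay_times(visit, exit_policy=None):
    """Return (page, stay_time) pairs for one visit.

    visit is a list of (page, entry_time) tuples in the order visited.
    The exit page has no following request, so its stay time cannot be
    measured; it is set to exit_policy (None means "unknown").
    """
    result = []
    # Pair each page with the page that follows it.
    for (page, entered), (_, next_entered) in zip(visit, visit[1:]):
        result.append((page, next_entered - entered))
    if visit:
        result.append((visit[-1][0], exit_policy))
    return result

# Hypothetical entry times (assumed for illustration only).
visit = [
    ("A", datetime(2024, 1, 1, 10, 0)),
    ("B", datetime(2024, 1, 1, 10, 5)),
    ("C", datetime(2024, 1, 1, 10, 6)),
    ("D", datetime(2024, 1, 1, 10, 10)),
]

for page, stay in stay_times(visit):
    print(page, "unknown" if stay is None else stay)
```

Passing `exit_policy=timedelta(minutes=1)` or `exit_policy=timedelta(0)` mimics the two vendor conventions mentioned above. Either choice changes the average stay time reported for the visit, which is one reason the same traffic can look different in different tools.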

Currently, the following technologies either restrict data collection or distort the data that is collected.

1. Cache

Cache here does not mean a physical cache chip such as the one in a CPU. It means the browser cache and the proxy cache, which exist to save network resources and speed up web browsing. Put simply, both caches store the content of web pages (including images and cookie files) on the local computer or on a proxy server. When you request a page you have already read, its content can be served directly from the cache without being transmitted again from the web server.

Below are the files left in the local cache folder after visiting a website:


When a visitor views a page from the local cache, no request is sent to the web server, so naturally no log record of that view exists. In other words, data collected from web server logs will always miss this part of the traffic.
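The effect can be illustrated with a toy simulation. This is a minimal sketch under simplifying assumptions: the classes `WebServer` and `CachingBrowser` and the URLs are hypothetical, and the cache never expires or revalidates, which real browser and proxy caches do.

```python
class WebServer:
    """Toy web server that writes one log entry per request it receives."""

    def __init__(self):
        self.log = []

    def get(self, url):
        self.log.append(url)
        return f"content of {url}"


class CachingBrowser:
    """Toy browser that serves repeat views from a local cache."""

    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.views = 0  # page views the visitor actually made

    def view(self, url):
        self.views += 1
        if url not in self.cache:
            # Only a cache miss reaches the server and leaves a log entry.
            self.cache[url] = self.server.get(url)
        return self.cache[url]


server = WebServer()
browser = CachingBrowser(server)
for url in ["/home", "/products", "/home", "/home"]:
    browser.view(url)

print("views made by the visitor:", browser.views)    # 4
print("requests in the server log:", len(server.log))  # 2
```

The visitor looked at four pages, but the server log records only the two requests that missed the cache, so a log-based report undercounts the repeat views of `/home`.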

2. Crawler

Explaining the principles and algorithms of search engine crawlers would take more than a single section, and it is not the subject of this book, so they are not covered here.

First, here is a search engine crawler record from a web server log:
