1. Technical idealization
Each data collection method has its own technical advantages, but none of them can perfectly capture every action a visitor takes on the site. Every technique has limitations, so the data you see is never perfect. Take the calculation of page stay time as an example. The following figure shows the timeline of one visit (the times in the figure are the times at which each page was entered):
The stay time of a page is usually calculated as the difference between the entry time of the next page and the entry time of the current page. In this example, the page stay times are therefore:
Page A: 5 minutes
Page B: 1 minute
Page C: 4 minutes
Why is there no stay time for page D? No collection method can capture an accurate stay time for page D, and the reason is simple: none of them can detect the moment a visitor leaves (the visitor may stay on the exit page for a long time without clicking, or may close the browser). Tool vendors therefore define the stay time on the exit page differently: some uniformly count it as 1 minute, and others simply treat it as 0 minutes.
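The calculation above can be sketched in a few lines of Python. This is only an illustration: the function name `stay_times`, the `exit_default` parameter, and the entry timestamps are assumptions chosen to reproduce the 5, 1, and 4 minute figures, not values taken from the figure or from any particular analytics tool.

```python
from datetime import datetime, timedelta

def stay_times(visits, exit_default=None):
    """Compute stay time per page from ordered (page, entry_time) pairs.

    Stay time = entry time of the next page - entry time of this page.
    The exit page has no "next page", so its stay time cannot be measured;
    it is set to exit_default (hypothetical parameter), mimicking vendors
    that assign a fixed value such as 0 or 1 minute.
    """
    result = []
    # Pair every page with the page that follows it.
    for (page, entered), (_, next_entered) in zip(visits, visits[1:]):
        result.append((page, next_entered - entered))
    # The last page of the visit is the exit page.
    if visits:
        result.append((visits[-1][0], exit_default))
    return result

# Hypothetical entry times, chosen only to match the example durations.
visit = [
    ("A", datetime(2024, 1, 1, 10, 0)),
    ("B", datetime(2024, 1, 1, 10, 5)),
    ("C", datetime(2024, 1, 1, 10, 6)),
    ("D", datetime(2024, 1, 1, 10, 10)),
]

unknown_exit = stay_times(visit)
one_minute_exit = stay_times(visit, exit_default=timedelta(minutes=1))

for page, duration in unknown_exit:
    print(page, duration)
```

Passing `exit_default=timedelta(minutes=1)` or `timedelta(0)` imitates the two vendor conventions mentioned above; leaving it as `None` makes it explicit that the value is unknown rather than measured.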
At present, the following techniques restrict data collection or distort the data that is collected.
The cache referred to here is not the cache on physical chips such as the CPU, but the browser cache or proxy cache, which exist to save network resources and speed up Web browsing. Put simply, both kinds of cache store the content of web pages you have visited (including pictures and cookie files) on your computer or on a proxy server. When you request a page you have already read, the content can be served directly from the cache without being retransmitted from the web server.
Below are the files left in the local cache folder after visiting a web site:
When a visitor accesses a web site through the local cache, no request is sent to the web server, so naturally no log record of that access exists. In other words, data collected through web logs will inevitably miss this part of the traffic.
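The effect can be illustrated with a toy simulation. It is a minimal sketch, not a model of any real browser or server: the `WebServer` and `Browser` classes and the URLs are hypothetical, and real caches also honor expiry and validation rules that are ignored here.

```python
class WebServer:
    """Hypothetical server that writes one log line per request it receives."""

    def __init__(self):
        self.access_log = []

    def handle(self, url):
        self.access_log.append(url)  # only requests that reach the server are logged
        return f"<html>content of {url}</html>"


class Browser:
    """Hypothetical browser with a simple local cache keyed by URL."""

    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.pages_viewed = 0

    def visit(self, url):
        self.pages_viewed += 1
        if url in self.cache:
            # Served from the local cache: the server never sees this view.
            return self.cache[url]
        body = self.server.handle(url)
        self.cache[url] = body
        return body


server = WebServer()
browser = Browser(server)
for url in ["/home", "/products", "/home", "/home"]:
    browser.visit(url)

print("pages actually viewed:", browser.pages_viewed)
print("requests in server log:", len(server.access_log))
```

The visitor views four pages, but the server log contains only the two requests that missed the cache, which is the gap log-based collection cannot see.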
Explaining the principles and algorithms of search engine crawlers would take more than a single section, and it is not the subject of this book, so they are not covered here.
First, here is a record left by a search engine crawler in a web server log: