Visualizing Web Browsing Behavior

I am now brushing up on my Python programming for my upcoming TA position. I’ve been playing with some interesting external packages such as NumPy, SciPy, NetworkX, and Matplotlib.

Here is the first product of a few days spent studying Python visualization.

The dataset used is called the “MSNBC.com Anonymous Web Data Data Set”, a collection of web browsing logs from the MSNBC.com website in 1999. The pages within the site are classified into 17 categories: frontpage, news, tech, local, opinion, on-air, misc, weather, msn-news, health, living, business, msn-sports, sports, summary, bbs, travel.

The dataset holds about one million records, each tracking the pages one user visited. The rows look like this.

1 1
2
3 2 2 4 2 2 2 3 3
5
1
6
1 1
6
6 7 7 7 6 6 8 8 8 8
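
To make the row format concrete, here is a minimal sketch of how such rows could be read. It assumes that each integer is a 1-based index into the category list in the order given above, and that each row is one user's sequence of page views. `parse_sessions` is a name made up for this example, not something from my actual script.

```python
from io import StringIO

# Category names in the order listed in the post. Mapping the integers
# 1..17 onto this order is an assumption about the file format.
CATEGORIES = [
    "frontpage", "news", "tech", "local", "opinion", "on-air", "misc",
    "weather", "msn-news", "health", "living", "business", "msn-sports",
    "sports", "summary", "bbs", "travel",
]

def parse_sessions(lines):
    """Yield one list of category names per numeric row."""
    for line in lines:
        tokens = line.split()
        # Skip blank lines and any non-numeric header lines.
        if not tokens or not all(t.isdigit() for t in tokens):
            continue
        yield [CATEGORIES[int(t) - 1] for t in tokens]

# The sample rows shown above, fed in as if they were a file.
sample = StringIO(
    "1 1\n2\n3 2 2 4 2 2 2 3 3\n5\n1\n6\n1 1\n6\n6 7 7 7 6 6 8 8 8 8\n"
)
sessions = list(parse_sessions(sample))
print(sessions[2])
```

Under that assumption, the third sample row reads as a user moving between tech, news and local.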

The movie above is based on moving averages calculated as the program reads through the data file. Below is the visualization based on the entire dataset.

network_viz

(Never mind the "# Visitor: ~" in the bottom-right corner.)
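
The moving averages behind the movie could be kept up to date roughly as in the sketch below. This is only an illustration under my own assumptions: the window is a fixed number of most recent users, the quantity averaged is the number of page-to-page transitions per user, and both the function name `moving_transition_averages` and the window size are hypothetical.

```python
from collections import Counter, deque

def moving_transition_averages(sessions, window=1000):
    """After each session, yield the average number of times each
    (source, destination) transition occurs per session, taken over
    the most recent `window` sessions (window size is an assumption)."""
    recent = deque()    # per-session transition counts inside the window
    totals = Counter()  # summed transition counts over the window
    for session in sessions:
        transitions = Counter(zip(session, session[1:]))
        recent.append(transitions)
        totals.update(transitions)
        if len(recent) > window:
            # Forget the oldest session once the window is full.
            totals.subtract(recent.popleft())
        yield {edge: n / len(recent) for edge, n in totals.items() if n}

# Tiny demonstration with the first three sample rows and a window of 2.
rows = [[1, 1], [2], [3, 2, 2, 4, 2, 2, 2, 3, 3]]
averages = list(moving_transition_averages(rows, window=2))
print(averages[-1])
```

Each yielded dictionary could then drive one frame of an animation, with the averages mapped to edge widths.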

Quick-and-dirty findings are:

  • Frontpage rocks. (Of course it should.)
  • Overall, the network looks like a hub-and-spoke structure. Strong links exist between frontpage and each of news, business, on-air, sports, local, and misc.
  • A moderately strong link is seen between sports and msn-sports, but virtually no link is observed between news and msn-news.
  • Weather is isolated from the other pages, but it draws a fair number of clicks within itself.
  • Etc., etc.

For those who cannot see the legend in the movie, I will repeat it here: (edge width, node size, node color) represent (# of users passed, # of users visited, self-loop ratio), respectively.
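
The three quantities in the legend could be computed along the lines of the sketch below. The exact definitions are assumptions made for this example: an edge counts each user at most once, edges are treated as directed, and the self-loop ratio of a page is the share of transitions leaving that page that land on the same page again. `graph_metrics` is a hypothetical helper name.

```python
from collections import Counter

def graph_metrics(sessions):
    """Return (edge_users, node_users, self_loop_ratio) for a list of
    sessions, where each session is one user's sequence of pages."""
    edge_users = Counter()  # (src, dst) -> users who made that move
    node_users = Counter()  # page -> users who visited it
    out_moves = Counter()   # page -> all transitions starting at the page
    self_moves = Counter()  # page -> transitions that stay on the page
    for session in sessions:
        node_users.update(set(session))   # count each user once per page
        pairs = list(zip(session, session[1:]))
        edge_users.update(set(pairs))     # count each user once per edge
        for src, dst in pairs:
            out_moves[src] += 1
            if src == dst:
                self_moves[src] += 1
    self_loop_ratio = {
        page: self_moves[page] / out_moves[page] for page in out_moves
    }
    return edge_users, node_users, self_loop_ratio

# The sample rows from the dataset excerpt, as lists of category numbers.
rows = [
    [1, 1], [2], [3, 2, 2, 4, 2, 2, 2, 3, 3], [5], [1], [6], [1, 1], [6],
    [6, 7, 7, 7, 6, 6, 8, 8, 8, 8],
]
edge_users, node_users, self_loop_ratio = graph_metrics(rows)
print(node_users[1], edge_users[(1, 1)], self_loop_ratio[2])
```

These dictionaries map directly onto drawing attributes: `edge_users` to edge width, `node_users` to node size, and `self_loop_ratio` to node color.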

The code is too messy for me to want to upload it here, but if you would like to look at it, please let me know by leaving a comment. Thanks.


Posted in Pet.

4 Responses

  1. vey niceeeeee

  2. Hyunwoo Park said

    Thank you for visiting :)

  3. Lore said

    Hello!
    I found your website very interesting, and thank you for doing it in English.

  4. Hyunwoo Park said

    Thanks for visiting!
