dear wikileaks webhamsters, would you please stop inflating every single damn cable with all the navigational fluff? it's all empty carbs, you know, and makes mirroring you a FPITA.

if you downloaded the latest cablegate snapshot via bittorrent you might be surprised that it's only about 300 megabytes (for 150k cables) - but you WILL be unpleasantly surprised when you unpack that archive: it expands (like kudzu) to a whopping 30+ gigabytes.

the reason: every single cable page is infested with links to everywhere else and then some, so the actual content makes up only about 3% of it.

take this cable for example: after ripping out all the gunk between <div class='pane small'> (they might as well have called the css class 'pain, big') and the closing </div> after 'courage is contagious', the page shrinks from a whopping 168k to just 6k. the cleaned page thus consumes only 3.6% of the disk space and network bandwidth of the fugly original beast (and don't forget there are 150 thousand such pages right now).
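for the mirroring crowd, here's a minimal sketch of that cleanup in python. it assumes every cable page uses exactly the `<div class='pane small'>` opening tag quoted above (single quotes and all) and that the first `</div>` after the text 'courage is contagious' closes the pane; both are guesses about the markup, not something wikileaks promises. it also drops the two div tags themselves, and the names `strip_navigation` and `main` are made up for illustration. files are read as latin-1 so arbitrary bytes round-trip unchanged.

```python
import sys
from pathlib import Path

# assumed markers, taken from the markup described above
START = "<div class='pane small'>"  # opening tag of the navigation pane
MARKER = "courage is contagious"    # text just before the pane's closing tag
END = "</div>"

def strip_navigation(html: str) -> str:
    """Cut everything from START through the first END following MARKER.

    If any marker is missing, the page is returned untouched.
    """
    start = html.find(START)
    if start == -1:
        return html
    marker = html.find(MARKER, start)
    if marker == -1:
        return html
    end = html.find(END, marker)
    if end == -1:
        return html
    return html[:start] + html[end + len(END):]

def main(paths):
    for name in paths:
        path = Path(name)
        # latin-1 maps every byte to one character, so nothing gets mangled
        original = path.read_text(encoding="latin-1")
        cleaned = strip_navigation(original)
        if cleaned != original:
            path.write_text(cleaned, encoding="latin-1")
            print(f"{name}: {len(original)} -> {len(cleaned)} bytes")

if __name__ == "__main__":
    main(sys.argv[1:])
```

run it as `python strip_cables.py cable/*/*.html` (script name and path layout hypothetical); it rewrites the files in place, so try it on a copy first.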

nobody, dear leakers, is going to mirror 30+g of you strutting your fluff on a daily basis - but a decent, still navigable 7g would be a very different story.

[ published on Mon 29.08.2011 14:57 | filed in interests/anti ]
© Alexander Zangerl