Constant Crawl Design – Part 3

Suppose you were building a tool integrated with web browsers to anonymously capture the (public) websites that a user visited and store them to a P2P network shared by the users of this tool. What would the requirements be for this storage P2P network? [More...]

Constant Crawl Design – Part 2

Suppose you were building a tool to anonymously capture the (public) websites that a user visited. What would the UI requirements be? [More...]

Constant Crawl Design – Part 1

Do you remember Google Web Accelerator? The idea was that you downloaded all your pages through Google’s servers. For content that was static, Google could just load it once, then cache it and serve up the same page to every user. The advantage to the user was that they got the page faster, and more reliably; the advantage to Google was that they got to crawl the web “as the user sees it” instead of just what Googlebot gets… and that they got to see every single page you viewed, thus feeding even more into the giant maw of information that is Google.

Well, Google eventually dropped Google Web Accelerator (I wonder why?), but the idea is interesting. Suppose you wanted to build a similar tool that would capture the web viewing experience of thousands of users (or more). For users it could provide a reliable source for sites that go down or that get hit with the “slashdot” effect. For the Internet Archive or a smaller search engine like Duck Duck Go, it would provide a means of performing a massive web crawl. For someone like the EFF or human-rights groups it would provide a way to monitor whether some users (such as those in China) are being “secretly” served different content. But unlike Google Web Accelerator, a community-driven project would have to solve one very hard problem: how to do this while keeping the user’s browsing history secret, the exact opposite of what Google’s project did. [More...]

Host Error 2

Another posting on how to understand Profile errors. [More...]

Removing the “Macros” warning in PowerPoint

When you open any PowerPoint presentation made with my company’s default presentation format, you get a warning that it contains macros, asking whether the macros should be disabled. The macros are useless, but removing them is somewhat awkward and difficult to remember, so I’m writing down the instructions. [More...]

Using a Mix of Computers and Humans for Security

Suppose that your bank offers currency conversion as a service: give them a deposit or make a withdrawal in euros and they’ll adjust your balance in dollars. They don’t do this out of the goodness of their hearts: today’s conversion rate is around 1.28 $ / €, so they’d give you 0.75 € for every $ and 1.25 $ for every €, making a good 6.5% margin on the conversions. [More...]

Namespace for a valid SOAP message

A brief hint: if you see an error message like this:

InputStream does not represent a valid SOAP 1.1 Message

check the namespace of the SOAP envelope

SOAP 1.1: http://schemas.xmlsoap.org/soap/envelope/

SOAP 1.2: http://www.w3.org/2003/05/soap-envelope
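
If you'd rather check this programmatically than by eye, here is a minimal sketch using Python's standard xml.etree.ElementTree. The function name soap_version and the returned strings are my own inventions for illustration; the two namespace URIs are the standard envelope namespaces for SOAP 1.1 and SOAP 1.2.

```python
import xml.etree.ElementTree as ET

# Standard envelope namespaces for each SOAP version.
SOAP_NAMESPACES = {
    "http://schemas.xmlsoap.org/soap/envelope/": "SOAP 1.1",
    "http://www.w3.org/2003/05/soap-envelope": "SOAP 1.2",
}


def soap_version(message: str) -> str:
    """Report which SOAP version the root element's namespace indicates.

    Hypothetical helper for illustration; it only inspects the root tag.
    """
    root = ET.fromstring(message)
    # ElementTree renders a qualified tag as "{namespace}localname".
    if not root.tag.startswith("{"):
        return "no namespace"
    namespace, _, local_name = root.tag[1:].partition("}")
    if local_name != "Envelope":
        return "not an Envelope"
    return SOAP_NAMESPACES.get(namespace, "unknown namespace: " + namespace)


if __name__ == "__main__":
    sample = (
        '<soapenv:Envelope '
        'xmlns:soapenv="http://www.w3.org/2003/05/soap-envelope">'
        '<soapenv:Body/></soapenv:Envelope>'
    )
    print(soap_version(sample))
```

If the message carries the SOAP 1.2 namespace but the receiving stack expects SOAP 1.1 (or the namespace is mistyped), that mismatch is the likely cause of the error above.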

Binary Backward Compatibility

I saw this interesting article about a weakness in the Scala language. The weakness applies not just to Scala, but to pretty much any language: the community using the language cannot grow past a certain point until it somehow solves the problem of libraries depending on other libraries in a large (deep) tree. [More...]

Election Guide for Nov 8, 2011

My election guide for November 8, 2011. [More...]

Story Points

If you have complete and accurate requirements for your project which won’t change, and your development team is spot-on in estimating and highly consistent in their development pace, and there are no surprises, then you can produce highly accurate project timeline estimates up front. Such accurate estimates are (or, more accurately, would be) quite useful and well worth the effort it takes to produce them because of how nicely you can schedule everything. But how about the rest of us, for whom none of this is true? [More...]