Ubuntu and Privacy and how it really works now.
There have been quite a few entertaining discussions on the interwebs about Ubuntu and concerns around privacy. This topic comes and goes on a regular basis, today it has come up because Mozilla are planning on putting some fairly harmless adverts on the blank tiles of new tabs and this is being compared to the Dash search in Ubuntu. Whenever the topic is raised it tends to be a fairly heated discussion, mostly focussing on the Amazon search results in the dash, mostly calling that adverts or spyware. It is a discussion that is mostly overblown and underinformed, with so much time spent freaking out about “adverts” that the real problems have been completely missed. Lets go through a bit of history, and I will try and explain the difference between the real problems and the FUD.
Initially there was the Gnome 2 application launcher, kinda similar to the Windows start button, it is a way to run applications that you have on your computer. They are nicely categorised so you can find all the graphics related applications on your computer and see Inkscape alongside Gimp and choose what you want to run. This worked well and people were generally satisfied at this mechanism for running local applications. Then along came Unity, this introduced the launcher, a dock bar on the left that shows running applications and has the ability to pin applications so you can start them by clicking on them when they are not running. The launcher is the way to run applications that you have on your computer – but not all of them, and not categorised, just your favourite ones you have pinned to the launcher. Unity also introduced the dash. This has a different scope of functionality, I like to call it the OmniGlobalEverywhere search tool. You type stuff in and it searches in lots of places to find what it is you are looking for. This is not the same scope of functionality as the Gnome 2 application launcher, it could search for local files, videos on YouTube and other streaming services, music, photos, other things. It is an extensible search interface and you can plug in additional search things. I wrote an OpenERP plugin so I could type an invoice number and jump straight to that invoice in a browser for example. It was a pretty cool concept as a jack of all trades search interface – but it isn’t the master of the specialised job of viewing and running applications you have already got installed.
Everyone completely missed the fact that the magic privacy button for a long time did almost nothing – it was just an undocumented flag that some lenses looked at and turned themselves off. Others did not. This was a real big deal and nobody noticed because they were obsessed with calling Amazon search results adverts. Now we have all kinds of odd lenses and search queries possibly going to yelp, zotero, yahoo finance, songster, songkick, gallica, europeana, etsy, COLORlovers and other places. Have you even heard of every single one of these? Do you know they are not evil? Do you know they are financially stable enough not to close the doors and let the domain renewal lapse for someone evil to buy it? Amazon I know and trust to continue existing, I also trust them not to want searches for partial mostly irrelevant words for profiling data when they have my product purchase history. The utter junk that the dash sends is of no value to Amazon compared to everything else they have, but this doesn’t stop people banging on about that one specific, relatively harmless and pointless in equal measure lense.
Firstly the Amazon lens is nothing special, and it is perhaps the internet connected lens I am least worried about. I trust Amazon to do what I expect them to do, I am a customer so they know what I bought, sending them random strings like “calcul” and “gedi” and “eclip” does not give them valuable data. It is junk. I am much more concerned about stuff like the Europeana, jstor, grooveshark lenses which do exactly the same thing but I have no idea who those organisations are or what they do. Even things like openweathermap, sounds good, but are they really a trusted organisation?
So, back to how it works. Your query for “socks” goes to products.ubuntu.com. At that point canonical’s secret sauce server looks at your query and decides that most people who search for socks either want to know about products to buy, or applications to run. They don’t tend to click on the results from the medicines or recipes lenses when we try showing those lenses to the user. So, having decided that the shopping lens and the applications lens are reasonable ones to search in it sends the query to Amazon (being the only shop currently supported, but it is designed to support every online sock vendor in the world) and tells your computer that the applications lens is worth looking in. When it gets the results back from Amazon those go to your computer, as a bunch of json data that is very similar to the Amazon json API, Amazon at this point thinks that Canonical’s server has got cold toes and is in need of some nice warm socks. Amazon does not know you exist at this stage.
That bundle of sock related data goes to the shopping lens on your computer, which then displays the results. It does this by showing some text “stripy socks, only £5.30″ and a picture, which it used to retrieve from Amazons content distribution network – O.M.G.!!! a data privacy leak. Amazon could log hits to their CDN (which I doubt they do), consolidate them globally, and figure out that it was displaying a bunch of sock pictures requested by your IP address, shortly after Canonical’s server searched for socks, so they could theoretically tie this together and infer that the reason you are staring at sock pictures is because you searched for socks via the dash search tool. So this huge and seriously concerning data privacy breach was a problem, so they fixed it. Now when you search for socks, Amazon gets CDN requests for images from products.ubuntu.com. Your computer gets the images from products.ubuntu.com (over https rather than http), it is now basically a reverse proxy for Amazon images, so that amazon is now more convinced than ever that Canonical’s server has got cold toes. As it happens, there is nothing wrong with your toes and you actually wanted to configure a socks proxy all along, and the shopping thing was a pointless overhead because when you want new socks the dash isn’t where you dash to.
There is a conversation on the technical board mailing list here https://lists.ubuntu.com/archives/technical-board/2013-October/thread.html and here https://lists.ubuntu.com/archives/technical-board/2013-November/thread.html relating to the closedness of the server side app. Having written something a bit similar myself, mine was closed for a while because it contained the Amazon API oauth keys in the source code. There really isn’t much to it on the server side. My server code is here https://github.com/AlanBell/shopping-search-provider/blob/master/server/index.php