Ubuntu and Privacy and how it really works now.

There have been quite a few entertaining discussions on the interwebs about Ubuntu and concerns around privacy. This topic comes and goes on a regular basis. Today it has come up because Mozilla are planning on putting some fairly harmless adverts on the blank tiles of new tabs, and this is being compared to the Dash search in Ubuntu. Whenever the topic is raised it tends to be a fairly heated discussion, mostly focussing on the Amazon search results in the dash, and mostly calling them adverts or spyware. It is a discussion that is mostly overblown and underinformed, with so much time spent freaking out about “adverts” that the real problems have been completely missed. Let’s go through a bit of history, and I will try to explain the difference between the real problems and the FUD.

Initially there was the Gnome 2 application launcher. Kinda similar to the Windows start button, it is a way to run applications that you have on your computer. They are nicely categorised, so you can find all the graphics related applications on your computer, see Inkscape alongside Gimp, and choose what you want to run. This worked well and people were generally satisfied with this mechanism for running local applications. Then along came Unity, which introduced the launcher, a dock bar on the left that shows running applications and lets you pin applications so you can start them with a click when they are not running. The launcher is the way to run applications that you have on your computer – but not all of them, and not categorised, just the favourite ones you have pinned to the launcher.

Unity also introduced the dash. This has a different scope of functionality; I like to call it the OmniGlobalEverywhere search tool. You type stuff in and it searches in lots of places to find what you are looking for. This is not the same scope of functionality as the Gnome 2 application launcher: it could search for local files, videos on YouTube and other streaming services, music, photos and other things. It is an extensible search interface and you can plug in additional search things. I wrote an OpenERP plugin so I could type an invoice number and jump straight to that invoice in a browser, for example. It was a pretty cool concept as a jack of all trades search interface – but it isn’t the master of the specialised job of viewing and running applications you have already got installed.

Everyone completely missed the fact that the magic privacy button for a long time did almost nothing – it was just an undocumented flag that some lenses looked at and turned themselves off. Others did not. This was a real big deal and nobody noticed because they were obsessed with calling Amazon search results adverts. Now we have all kinds of odd lenses and search queries possibly going to yelp, zotero, yahoo finance, songster, songkick, gallica, europeana, etsy, COLORlovers and other places. Have you even heard of every single one of these? Do you know they are not evil? Do you know they are financially stable enough not to close the doors and let the domain renewal lapse for someone evil to buy it? Amazon I know and trust to continue existing. I also trust them not to want searches for partial, mostly irrelevant words as profiling data when they have my product purchase history. The utter junk that the dash sends is of no value to Amazon compared to everything else they have, but this doesn’t stop people banging on about that one specific lens, which is relatively harmless and pointless in equal measure.
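The weakness of a flag that each lens has to check for itself can be shown with a small Python sketch. This is not the real Unity code; the `Lens` class, the `honours_flag` attribute and `dash_search` are all names I have made up purely to illustrate the point.

```python
class Lens:
    """Hypothetical lens; honours_flag says whether it bothers to check the flag."""

    def __init__(self, name, honours_flag):
        self.name = name
        self.honours_flag = honours_flag

    def search(self, query, privacy_on):
        # Each lens decides for itself whether to look at the privacy flag.
        if self.honours_flag and privacy_on:
            return None  # this lens switched itself off
        return f"{self.name} sent {query!r} over the network"


def dash_search(lenses, query, privacy_on):
    # The dash hands the query to every lens regardless of the flag,
    # so privacy depends entirely on each lens behaving well.
    results = [lens.search(query, privacy_on) for lens in lenses]
    return [r for r in results if r is not None]


lenses = [Lens("polite", honours_flag=True), Lens("oblivious", honours_flag=False)]
leaks = dash_search(lenses, "gedi", privacy_on=True)
```

With the flag on, `leaks` still contains an entry for the lens that never looked at it, which is the whole problem.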

Firstly the Amazon lens is nothing special, and it is perhaps the internet connected lens I am least worried about. I trust Amazon to do what I expect them to do. I am a customer so they know what I bought, and sending them random strings like “calcul” and “gedi” and “eclip” does not give them valuable data. It is junk. I am much more concerned about stuff like the Europeana, jstor and grooveshark lenses, which do exactly the same thing, but I have no idea who those organisations are or what they do. Even with things like openweathermap: it sounds good, but are they really a trusted organisation?

So, back to how it works. Your query for “socks” goes to products.ubuntu.com. At that point Canonical’s secret sauce server looks at your query and decides that most people who search for socks either want to know about products to buy, or applications to run. They don’t tend to click on the results from the medicines or recipes lenses when those lenses are shown to them. So, having decided that the shopping lens and the applications lens are reasonable ones to search in, it sends the query to Amazon (being the only shop currently supported, but it is designed to support every online sock vendor in the world) and tells your computer that the applications lens is worth looking in. When it gets the results back from Amazon, those go to your computer as a bunch of JSON data that is very similar to the Amazon JSON API. Amazon at this point thinks that Canonical’s server has got cold toes and is in need of some nice warm socks. Amazon does not know you exist at this stage.
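To make that flow a bit more concrete, here is a minimal Python sketch of it. This is not Canonical’s code (their server side is closed); the click-rate table, the threshold, the response layout and the function names are all invented for illustration, and the shop search is a stub rather than a real Amazon call.

```python
# Hypothetical aggregate data: how often people click results from each lens
# after typing a given query.
CLICK_RATES = {
    "socks": {"shopping": 0.41, "applications": 0.22, "medicines": 0.01, "recipes": 0.02},
}

# Lenses the server searches on the user's behalf; everything else runs locally.
REMOTE_LENSES = {"shopping"}


def pick_lenses(query, threshold=0.1):
    """Return the lenses that people searching for this query tend to click on."""
    rates = CLICK_RATES.get(query, {})
    return sorted(lens for lens, rate in rates.items() if rate >= threshold)


def stub_shop_search(query):
    """Stand-in for the server-to-shop request. The shop only ever sees the server."""
    return [{"title": f"stripy {query}", "price": "£5.30"}]


def handle_query(query):
    """Decide which lenses are worth searching and proxy the remote ones."""
    response = {"local_lenses": [], "remote_results": {}}
    for lens in pick_lenses(query):
        if lens in REMOTE_LENSES:
            response["remote_results"][lens] = stub_shop_search(query)
        else:
            # Just a hint: the user's own computer runs this search itself.
            response["local_lenses"].append(lens)
    return response


reply = handle_query("socks")
```

The point of the shape is that the shop request is made by the server, so the shop never sees the address of the machine the query was typed on.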

That bundle of sock related data goes to the shopping lens on your computer, which then displays the results. It does this by showing some text, “stripy socks, only £5.30”, and a picture, which it used to retrieve from Amazon’s content distribution network – O.M.G.!!! a data privacy leak. Amazon could log hits to their CDN (which I doubt they do), consolidate them globally, and figure out that it was displaying a bunch of sock pictures requested by your IP address shortly after Canonical’s server searched for socks, so they could theoretically tie this together and infer that the reason you are staring at sock pictures is because you searched for socks via the dash search tool. This huge and seriously concerning data privacy breach was a problem, so they fixed it. Now when you search for socks, Amazon gets CDN requests for images from products.ubuntu.com. Your computer gets the images from products.ubuntu.com (over https rather than http), which is now basically a reverse proxy for Amazon images, so Amazon is now more convinced than ever that Canonical’s server has got cold toes. As it happens, there is nothing wrong with your toes and you actually wanted to configure a socks proxy all along, and the shopping thing was a pointless overhead because when you want new socks the dash isn’t where you dash to.
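The image fix boils down to rewriting the picture URLs before the results leave the server. A rough Python sketch of that idea follows; the `/image?src=` path, the example CDN hostname and the function names are my own assumptions, and I do not know what URL scheme the real server uses.

```python
from urllib.parse import parse_qs, quote, urlsplit

# Only the hostname comes from the description above; the path is hypothetical.
PROXY_BASE = "https://products.ubuntu.com/image"


def proxied_image_url(original_url):
    """Wrap a shop CDN URL so the client fetches it via the proxy over https."""
    return f"{PROXY_BASE}?src={quote(original_url, safe='')}"


def rewrite_results(results):
    """Return copies of the result items with their image URLs proxied."""
    return [{**item, "image": proxied_image_url(item["image"])} for item in results]


# Hypothetical search result pointing straight at a shop's CDN.
raw = [{"title": "stripy socks", "image": "http://images.example.com/socks.jpg"}]
safe = rewrite_results(raw)
target = urlsplit(safe[0]["image"])
```

After the rewrite the client only ever connects to the proxy host, and the original CDN address survives as a query parameter for the server to fetch.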

There is a conversation on the technical board mailing list here https://lists.ubuntu.com/archives/technical-board/2013-October/thread.html and here https://lists.ubuntu.com/archives/technical-board/2013-November/thread.html relating to the closedness of the server side app. I have written something a bit similar myself, and mine was closed for a while because it contained the Amazon API OAuth keys in the source code. There really isn’t much to it on the server side. My server code is here https://github.com/AlanBell/shopping-search-provider/blob/master/server/index.php

18 Comments

  • Jason says:

    I trust Amazon more than I trust Canonical. So Canonical “will” know that I am looking for socks, even if I am looking for it on my hard drive? Why? So why should Canonical know that I am searching for “sex toys”? Is Canonical saving this data? Is Canonical complying with GCHQ? Amazon won’t go belly up, Canonical can, will Canonical sell this data? Where does it state on the policy page that a user can’t be identified by Canonical? Canonical can very easily with time-stamp and IP (and later U1 account which it needs to install apps from Ubuntu Store) pin-point users. Where does it say that Canonical won’t store this data and won’t connect it to my identity? The Policy page clearly says that they will store data and may identify user and will share it with other partners. You talked about everything except for THE thing that matters here. So will you address these points actually?

    • Alan Bell says:

      yeah, those are good points, but not the points that generally get raised in these discussions, which is why I didn’t address that specifically. Canonical have stated that they are only storing the data in aggregate, in order to feed the smart scopes server so that they can give your dash hints about the best lenses to search in for that query. It actually replies to the smart scopes server with the lens that was most useful, so if you search for sex toys and it shows you results from Amazon and Anne Summers but you click on a result found via the Anne Summers lens then your computer will inform the smart search server that the Anne Summers lens produced a productive result (but not what the result was) so that in future searches for sex toys are more likely to prioritise the Anne Summers lens over others. This whole thing isn’t a feature that I am in any way enthusiastic about, I am just hoping to provide a bit more accuracy to the conversation.
      If you don’t trust Canonical then there is absolutely no way that you should be running Ubuntu (or any derivative that uses the repositories, such as Mint) and certainly no way that you should ever update it. The update process downloads scripts from ubuntu.com and executes them with root privileges, so Canonical can do anything they like with root access to your computer if you do an update.

      • Jason says:

        Ok. So you trust an airline pilot with your life that he may fly you safely that doesn’t mean he also gets access to your bank account, can read your emails, get password to your PayPal and such? Nopes. Similarly Ubuntu is trusted ONLY to run the OS safely – it begins and ends there. Running Ubuntu doesn’t mean they get ownership of me. That’s what you are implying by the ‘they own root’ comment. So what you are saying is that when we fly we must hand over everything to the pilot. No, sir. That’s not how it works.

        • Jo-Erlend Schinstad says:

          Pretty much every day, I enter “po”. Why? I’ll give you the same hint I give Canonical: it’s something local. Does that help much? Let me tell you: sometimes it’s PokerTH, other times, I’m translating something. Is this the same as complete access to everything? No, it isn’t.

          I don’t know what makes you compare something like that to access to a bank account.

    • Jo-Erlend Schinstad says:

      The protocol doesn’t allow for identities to be used. Every time you open a dash window (The Dash), you generate a UUID which is sent to the server with the query. This is also used when you click on something. If you click on something local, then it is only known that you did “something local”, not what.

      It is good that the service is designed this way, because it allows for anonymity. You could, for instance, use Tor to access the service. Then, each search would come from a different place and there would be no way to even guess your identity.

  • Mack says:

    That makes no sense. If Canonical did in fact do something malicious with their packages or updates without user consent, they’d be breaking several laws (and be easily caught). People can easily trust them not to do that, as it is reasonable to assume that Canonical employees don’t want jail time, or that the company doesn’t want to pay damages.

    However, people have no reason to trust Canonical not to sell user private data when allowed by their privacy policy (if “accepted” by the user).

    • Alan Bell says:

      you either trust them or not in my opinion. The bottom line for me is that the data they get is junk. “gedi” and “term” are not valuable insights into my life.

      • Frederik Elwert says:

        I don’t buy the “you trust them already” line. The NSA scandal showed one big issue: Intelligence services can request data from service providers, and they can prevent the providers from admitting that. So any statement from Canonical about “We won’t give away your data” is, in fact, worthless. And this is not about trusting Canonical or presuming they are malevolent. It’s just about not trusting gov agencies and believing that Canonical is as powerless against secret court decisions as we all are. It’s unfortunate that this new dimension of privacy issues came up after Canonical introduced their service, but this is the world we now live in. Google and Microsoft have to restore trust, and so has Canonical.

        And regarding worthless data: Sure, “gedi” is not particularly telling. But once you take file search into account, it becomes much more interesting. And now add contacts search on top of that, and then you might see where this leads to. Currently, the dash doesn’t really work as a one-stop search solution. But once it gets there, there will be relevant data.

        • Alan Bell says:

          yes, that is a fair point, if the dash was my go-to solution for searching for everything then it might be mildly more concerning. Canonical would still have less useful data than Amazon or Google or Facebook already have, and quite frankly I do trust everyone I know at Canonical (including Mark Shuttleworth) so I am not personally concerned about it. I do understand other people having a different opinion though. I don’t think there is any information about me I could potentially reveal through my searching that I wouldn’t cheerfully tell Mark directly over a few beers. Is this stuff I want “europeana” and “gallica” to know about? I have absolutely no clue who they are and that is significantly more of a worrying trust issue to me than Canonical or Amazon.

          • John Smith says:

            I don’t think who we trust or not with whatever data is the most important matter here, I think it really is the fact that the community has no way whatsoever to evaluate or improve the code for the server, and hence no one can verify if it is doing what it should be doing. Moreover, I’m not sure if this is (allegedly) the case already, but no IPs should be logged, and the data should be as abstract as possible. If we had this, we’d have a service similar to DuckDuckGo or IxQuick but completely open.

  • Guest says:

    It is nice to see straightforward explanations and it is nice to see progress made in a way Canonical is proxy for the (encrypted) data. When user chooses to use the service it is great to know there still is some anonymity and more or less Canonical knows all and user can choose if the trust towards Canonical is there or not.

    For privacy concerned users it is nice to have the button to turn ON/Off this and not to send any data but in the past i did not like the fact switching the button OFF still sent some data (to Canonical). If i remember correctly it was a missing feature or a bug back then and i do not know if it was fixed or not. There really is no excuse to send anything remotely if some switch to control this is set to OFF.

    All that said there still are concerns some face when thinking should i recommend Ubuntu to my friends/family?

    In the past this was a no-brainer, the answer was always YES, now i must say i feel more comfortable recommending KDE (Kubuntu). Because of the fact privacy is not there anymore by default and each Ubuntu version evolves and it is hard to follow all the settings and explain what to turn on and off and why. In the past this was not the case, i knew installing Ubuntu did not violate privacy in any meaningful form. I hope someday i will be able to recommend Ubuntu again just like in the past, it would probably happen if all of this would be turned OFF by default if it can’t go completely.

    • Alan Bell says:

      The privacy flag kind of does work now. Lenses should declare whether they are remote searching lenses or local lenses. Remote searching lenses do not get given the query if the privacy flag is set. There is an unfixed bug here https://bugs.launchpad.net/ubuntu/+source/libunity/+bug/1250134 which is that if a lens does not declare whether or not it does remote searches then it is assumed to be a local lens. If that bug is ever fixed according to my recommendations then I would consider the privacy flag to be working. You can also turn off lenses individually by right clicking them in the apps lens in the dash (I have no clue why it is done there)

  • Jef Spaleta says:

    The fact that the dash interface neither provides an option to enforce the use of encrypted comms nor informs you of scope unencrypted network comms is also a problem.

    You may trust specific vendors out there for specific queries, but do you trust everyone sitting on the network between you and those vendors? The fact that the dash has network active search on by default, which can leak all your dash queries unprotected into a hostile network is a boon to anyone maliciously sitting and packet sniffing on the network that neither you nor your search providers control. Don’t want your government sifting through your dash queries? Better make sure scopes don’t use insecure network comms. The dash design doesn’t really help users make informed decisions about that and just assumes queries are not leaking private information.

    And the new security layer meant to sandbox scopes isn’t going to help with this either. It’s not designed to think of the network as potentially malicious. A bit of an oversight really, concerning how much end-users can bounce across wi-fi network boundaries nowadays.
    -jef

    • Alan Bell says:

      I haven’t seen anything about sandboxing scopes, is that something in the Ubuntu Touch apparmor stuff? Basically each scope runs as a process that is allowed to do pretty much anything that a process running under your uid can do, so it can get stuff from the internet via http or https, read files you can read, write data to places you can write etc. A malicious scope could read and upload any file your user can read to a server out on the net, but so could any other program running in your session. This is all interesting stuff which deserves thinking about, I just feel a lot of thinking time has been lost due to excessive focus exclusively on the Amazon lens.

      • Jef Spaleta says:

        yep is something in the touch apparmor stuff and click packaging policy which accounts for scopes to be dropped in the appstore as unreviewed applications.. distinct from the regular class of application click packages.

        Regular sandboxed apps I believe are going to have their network access locked down until users permit it.

        Scopes..and I could be wrong on the details…are going to have network access allowed by default.. but won’t be able to get access to your files to be able to upload them (to counter your example) without permission from the user. But the insecure network access by default will be allowed under the assumption that if you installed it, then obviously you want it to be active by default in the search queries.

        Anyways, I’ve sketched out what I think is a technical solution to the problem, with some Canonical peeps in lwn comments when I brought this up originally. It’s possible for Canonical to provide a secure proxy server for insecure http connections for search providers who do not provide secure connections already. Users would have to trust Canonical…which is no worse a security stance than the default-to-on smart scope situation.

        And clearly, Canonical got the message about needing https… as soon as people saw their smart scope/amazon lens was doing insecure comms that got fixed real quick. Now the dash platform needs to fix that for all scopes. A secure proxy for https comms does the trick.

        Of course, I’m obligated to mention I still think the default-to-on for any network active search is a privacy problem. A privacy problem that Canonical is going to get punched in the mouth about repeatedly from privacy safeguard groups until they mitigate that design decision somehow. But my objection to default-to-on network active search is orthogonal to my desire to see protection against insecure comms on a malicious network.

        Just be glad I’m not a Utouch user yet, and I’m just an external observer. If for example I saw this sort of thing happening on my wife’s smartphone.. I would feel far more compelled to make a campaign of this, instead of trying to poke people in the eye when it comes up naturally for discussion.

      • Jef Spaleta says:

        Oh I should say I agree with you about the time lost thing.
        I’m on record, early in the omg! discussion, in response to Jono I believe, as saying I think it’s unfortunate that the amazon lens came first… and in such a sensational fashion. A lot of the important issues on how the platform works could have been better resolved if it were just say a wikipedia lens first.

        • Alan Bell says:

          I thought the video and music searches preceded it by some margin (which do precisely the same thing as the Amazon one, complete with affiliate codes) along with other lenses (fairly sure I wrote a network checking business documents lens in April 2012, prior to the Amazon one)

      • Jef Spaleta says:

        ping…
        Just FYI, the new write up at developer.ubuntu.com has some more information about the apparmor policy for the scopes in the “scope store” and how its going to be used to provide protections against malicious and/or buggy scopes that I alluded to in previous comment as “sandboxing.”

        http://developer.ubuntu.com/2014/02/introducing-our-new-scopes-technology
