How Data Correlation Works

by Dominik Bärlocher
time to read: 17 minutes

It has been revealed that Android devices and iPhones track their users by default. This is not some attempt at spying on users, but an attempt at getting as much data as possible by crowdsourcing information. It’s a prime example of data correlation and how the users aren’t really the targets of this massive data gathering operation.

Google and Apple know where you are.

Hardly a day goes by without some sort of scandal that somehow relates to data privacy or information security in general. From a professional perspective, it gets tiring after a while, and the general attitude among the pros seems to be: “It’s not that bad. In fact, how is this even an issue?”

What professionals don’t realize is that not everyone shares their knowledge of how computer systems work and what they can do. Sure, there’s some abstract understanding that a person can be tracked using their mobile phone because Horatio Caine on CSI: Miami does it every day on TV, but most people would never guess that it can happen to them.

Thus, it’s not much of a surprise that people are taken aback by the fact that our iPhones and our Android devices track our every move. What most people don’t realize is that the devices only do it when users let them. It’s not some sort of inherent evil of Google and Apple that lets the two companies record our every step. The belief that it is causes people to be up in arms, and that in turn causes professionals to shake their heads and tell people to remain calm, leading to an unfortunate stalemate between two factions that grow increasingly adamant and unshakeable in their opinions.

This is why this Labs article was written. It seeks to fill the gap between what a pro sees and what a consumer doesn’t know, using simplified but accurate examples that illustrate the processes of data correlation and what users are telling the companies without necessarily wanting to.

Why it Works

All of this wouldn’t be the scandal it is if people were more curious. Usually, the process of buying a phone goes like this:

Consumers just assume that companies have the same interests as they do, without reflecting on either their own or the business’s interests.

A company’s interests can look like this:

For the purpose of brevity and legibility, Google will be used as the example company for the remainder of this article. Apple’s data correlation works similarly. As do their products.

In the case of the maps that record where you are, both companies claim that it’s to improve location settings and traffic reports. Obviously, they need to know where the users are for that to work. But why stop there? Both companies pride themselves on wanting to deliver the best possible information to their users. Google integrates cards with time- and location-based suggestions of what users might want into its Google Search app, which includes a feature called Google Now. Right now, my cards ask me whether or not I’m interested in cricket based on a previous search, and another card tells me that the latest issue of the scip monthly security summary is online. These cards would show me more if I had my Location Data enabled. I would see the weather and where the traffic jams are, and I’m fairly certain that there’s at least the theoretical possibility that it would show me the daily special of the restaurant I frequently go to.

All this is – depending on how much value customers put on their privacy – a win/win situation. While customers get the daily specials of their favourite restaurants at around lunchtime when the choice of food becomes relevant, Google gets a lot of data out of it. More on the privacy issue later, though.

Data Correlation: How They Know Where What Is

But as much as Google knows, they can’t be everywhere at all times. They don’t have the infrastructure in place to survey critical traffic nodes around the clock. Yet Google Maps knows exactly where there’s bound to be a traffic jam. If you want to try that yourself, here’s how:

  1. Go to Google Maps
  2. Calculate a route from your office to your home
  3. Select the car as your means of transport
  4. Look at the red and blue sections of the route.
    • Red sections are where the traffic jams are
    • Blue sections are where the road’s not blocked

It does match up, doesn’t it? Well, more or less, at least. The question is: How do they know? Have they driven your route to work? This is where the magic of data correlation comes in.

Before we can start figuring out how they know where the traffic jams are, here’s what Google already has at their disposal:

Thus, given absolutely no cars on the road, Google would know how to drive from any place to any place with only a very small margin of error. They could calculate the fastest way, the nearest way, the way with the least change in altitude and a great many other things. However, traffic jams fluctuate. They appear at around 7.30am and disappear again at around 8.00am for example.

That’s why Google tracks the motion of people down to the second. From a data perspective, this is important. Why?

Time | Event | Data Correlation
07.00 | User wakes up | After not moving for a few hours during the night, the phone moves.
07.30 | User leaves for work by car | Phone leaves house, movement becomes faster
07.35 | User drives on Fastest Route to work | GPS data
07.45 | User avoids traffic jam due to local knowledge | Just before a major intersection, the user leaves the Fastest Route and takes a minor road that leads around the major traffic artery
08.00 | User arrives at work | Phone is at the same place it usually is at this time of day.

If the 07.45 event becomes a regular occurrence for a significant number of people who drive that route frequently, then a quick look at an archive of traffic reports will confirm that there’s usually a traffic jam at that major intersection at around 07.45.

However, a user travelling the same route at around 09.00 will drive straight through the intersection without taking the detour, because the traffic jam is over. Again, archived traffic reports will confirm the pattern.
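The detour pattern described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Google actually does it: the observation records, the half-hour windows and the 50 percent threshold are all hypothetical.

```python
from collections import defaultdict

# Hypothetical observations at one major intersection:
# (time as "HH.MM", whether the phone left the Fastest Route there).
observations = [
    ("07.45", True), ("07.44", True), ("07.50", True), ("07.40", False),
    ("09.00", False), ("09.05", False), ("09.10", True), ("09.02", False),
]

def half_hour_bucket(hhmm: str) -> str:
    """Round a 'HH.MM' time down to the start of its half-hour window."""
    hours, minutes = hhmm.split(".")
    return f"{hours}.{'30' if int(minutes) >= 30 else '00'}"

def likely_jam_windows(obs, threshold=0.5):
    """Return the windows in which more than `threshold` of the observed
    phones took the detour. The threshold is an assumed value."""
    totals = defaultdict(int)
    detours = defaultdict(int)
    for hhmm, took_detour in obs:
        bucket = half_hour_bucket(hhmm)
        totals[bucket] += 1
        detours[bucket] += took_detour  # True counts as 1, False as 0
    return sorted(b for b in totals if detours[b] / totals[b] > threshold)

print(likely_jam_windows(observations))  # ['07.30']
```

With this sample data, three out of four phones avoid the intersection in the window starting at 07.30, while only one in four does so around 09.00, so only the earlier window is flagged.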

The same goes for figuring out a user’s favourite restaurant.

Time | Event | Data Correlation
12.00 | User leaves office | Phone leaves workplace
12.10 | User walked to restaurant | Phone moved slowly for ten minutes, then stays stationary at a place where the phonebook, Yelp and Google Plus have a restaurant
12.40 | User leaves restaurant, goes back to work | Phone moves back to work at walking pace

If the user goes to the restaurant often, then it’s to be assumed that the user likes the restaurant. Thus, all users in the area might get suggestions for that restaurant.
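As a rough sketch of that reasoning, the following Python snippet counts lunchtime stays at places that a directory lists as restaurants. The place names, the lunchtime window, the minimum stay and the minimum number of visits are all assumptions made for the sake of the example.

```python
from collections import Counter

# Hypothetical directory lookup: places known to be restaurants.
KNOWN_RESTAURANTS = {"Pizzeria Napoli", "Noodle Bar"}

# Hypothetical stationary periods derived from location data:
# (day, place, arrival as "HH.MM", minutes stayed).
stays = [
    ("Mon", "Pizzeria Napoli", "12.10", 30),
    ("Tue", "Pizzeria Napoli", "12.10", 30),
    ("Wed", "Noodle Bar", "12.15", 25),
    ("Thu", "Pizzeria Napoli", "12.05", 35),
    ("Fri", "Office", "12.00", 60),
]

def favourite_lunch_spot(stays, min_visits=3):
    """Return the restaurant visited most often around lunchtime,
    or None if no restaurant reaches `min_visits` (an assumed cut-off)."""
    visits = Counter(
        place
        for _day, place, arrival, minutes in stays
        if place in KNOWN_RESTAURANTS
        and "11.30" <= arrival <= "13.30"  # zero-padded times compare correctly
        and minutes >= 15
    )
    if not visits:
        return None
    place, count = visits.most_common(1)[0]
    return place if count >= min_visits else None

print(favourite_lunch_spot(stays))  # Pizzeria Napoli
```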

The method of Place plus Time plus Local Knowledge can be applied to pretty much everything. They’re three factors that make data correlation really easy. Someone who works in a certain area and goes to lunch every day will eventually know where the good spots are. So getting that person’s knowledge is a really good idea to get other people good food. Someone who drives the same route every day will know where the best ways around traffic jams are.

To summarize all this: Google doesn’t actually care where you are. Neither does Apple, just to mention their fiercest competitor once again. They do want your knowledge. They want to know the best way around traffic hot spots, they want to know where you get your pizza because nobody likes bad pizza.

A Matter of Trust

The inevitable question now is: Can we trust them? The answer to this question is a highly individual one, and nobody should answer it on anyone else’s behalf. It is up to every single user to decide whether or not he or she wants to share the location of the best pizzeria in town or the secret way around a major traffic artery.

However, users need to be aware that there’s more information attached than just the pizzeria’s address. Looking only at the location data, users give the companies these datasets:

Dataset | Correlation
Location of home | Phone stays there overnight more often than in any other place.
Location of work | Phone stays there during typical work hours.
Means of transport | Cars only stop when there’s a traffic jam or a red light. Public transport stops at regular intervals and takes the occasional detour off of the most sensible route. Cyclists ride on footpaths and off streets at greater speed. Pedestrians usually move a lot slower than any other means of transport.
Name of frequently visited places | GPS data has the phone at the same locations frequently; correlating this with the establishments located at that address and the times of day when the phone was there reveals the nature of the establishment.
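The first two rows of this overview can be illustrated with a minimal Python sketch. The location labels, the sample data and the definition of night and work hours are assumptions; real location data would be far noisier.

```python
from collections import Counter

# Hypothetical samples: (hour of day 0-23, coarse location label).
samples = [
    (1, "A"), (3, "A"), (5, "A"), (23, "A"), (2, "C"),
    (9, "B"), (11, "B"), (14, "B"), (16, "B"), (10, "C"),
]

# Assumed time windows; people on shift work would break this heuristic.
NIGHT_HOURS = set(range(0, 6)) | {22, 23}
WORK_HOURS = set(range(9, 17))

def most_common_place(samples, hours):
    """Return the location seen most often during the given hours."""
    counts = Counter(place for hour, place in samples if hour in hours)
    return counts.most_common(1)[0][0] if counts else None

home = most_common_place(samples, NIGHT_HOURS)
work = most_common_place(samples, WORK_HOURS)
print(home, work)  # A B
```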

The home address is not really that important or interesting from a business perspective. Of course, companies can figure out how many people live in a certain place to an extent, going by the number of smartphones per building, but that is very vague if users live in an apartment complex.

The location of work is more interesting, if you add it to the demographics of the place of residence. It tells a lot about a person, namely their income. This suddenly makes the location of work interesting in terms of targeted advertising.

Phone | Data Correlation
Phone stays stationary in the same place | Office, medical sector, retail, manufacturing etc.
Phone goes to one central location, then to various ones | Builders on construction sites, journalists, delivery people, etc.
Phone constantly on the move, only stops at certain locations every now and then | Drivers, airline personnel, police, door-to-door salespeople, etc.

However, users also need to see that this is not something that humans at Google do. There are no massive buildings where thousands of people sit in cubicles and look at location data of all the Android devices. As of September 2013, there are one billion active Android devices. Even if only half of them submit their location data to Google, once every second, that adds up to 43’200’000’000’000 datapoints per day. That’s forty-three-point-two trillion. Apple has sold 500 million iPhones, which – assuming that half of them have location data disabled and all iPhones are still in use – adds up to 21.6 trillion datapoints per day.
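The arithmetic behind these figures is easy to check. The short Python sketch below uses the device counts and the one-datapoint-per-second assumption from the paragraph above; the function name is made up for this example.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def datapoints_per_day(devices: int, share_reporting: float = 0.5) -> int:
    """One datapoint per second for every device that reports its location."""
    return int(devices * share_reporting) * SECONDS_PER_DAY

android = datapoints_per_day(1_000_000_000)  # one billion Android devices
iphone = datapoints_per_day(500_000_000)     # 500 million iPhones

print(f"{android:,}")  # 43,200,000,000,000
print(f"{iphone:,}")   # 21,600,000,000,000
```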

Thus, there are big data computer systems in place that automatically capture location data, enter it into the relevant systems such as Google Maps and analyse it to figure out where the traffic jams are, using the method mentioned above to distinguish between the means of transport. The people at Google have figured all this out beforehand and tuned their infrastructure so that it does all this correlation automatically. So theoretically, you could go years without a human being at Google looking at your data.
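To give an idea of what such an automatic distinction between means of transport might look like, here is a deliberately naive Python sketch. It is not Google's method. The speed thresholds, the notion of "regular" stops and the function name are all assumptions chosen to mirror the rules of thumb listed earlier.

```python
from statistics import mean, pstdev

def guess_transport(speeds_kmh, stop_gaps_min):
    """Guess the means of transport from speed samples (km/h, while moving)
    and the minutes between consecutive stops. All thresholds are assumed."""
    if not speeds_kmh:
        return "unknown"
    avg = mean(speeds_kmh)
    # Pedestrians move a lot slower than anything else.
    if avg < 7:
        return "pedestrian"
    # Public transport stops at regular intervals: little variation in gaps.
    if len(stop_gaps_min) >= 3 and pstdev(stop_gaps_min) < 1.0:
        return "public transport"
    # Faster than walking, slower than typical motor traffic.
    if avg < 25:
        return "cyclist"
    # Cars only stop irregularly, at traffic jams or red lights.
    return "car"

print(guess_transport([4, 5, 5], []))                       # pedestrian
print(guess_transport([30, 40, 35], [2.0, 2.1, 1.9, 2.0]))  # public transport
print(guess_transport([15, 18, 20], [7.0]))                 # cyclist
print(guess_transport([50, 60, 45], [1.0, 9.0, 3.5]))       # car
```

A real system would need far more signals, such as the road network and timetables, but the principle of turning rules of thumb into automatic classification stays the same.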

However, that is not to say that a person could never look at a specific user’s data. Pretty much every Android device is tied to a Google account. This account is very probably tied to a credit card, which, regardless of whether the username of the account is fake, is tied to a real name. Whether or not an interested party is willing and able to access that is anyone’s guess, but in the realm of attack scenarios, this is quite a realistic one. It’s a straight line between location data and identity.

So the decision left to every user is this: Do I want to give up part of my privacy so that other people get better service? The answer will vary from person to person. Some will think that getting other users’ recommendations without having to do anything isn’t such a bad trade-off and will readily share their own data in return. Others will think that their privacy is worth more than an operating system on a device that won’t last more than three years and will disable location tracking.

How to Opt Out and Be Curious

Of course users can opt out of this, but this requires the one thing that so very few people do: digging around in their settings. The settings to disable Location Services are not cleverly hidden, but they’re not made to jump out at users either.

For Android users:

  1. Go to Settings
  2. Scroll down to Location and tap on it.
  3. Uncheck all the boxes in there.

For iOS users:

  1. Go to Settings
  2. Scroll down to Privacy
  3. Tap on Location Services
  4. Move the slider to the Off position

This is an ideal opportunity to discover all the features of a phone. There are a lot of other settings around and it’s virtually impossible to break a phone to the point where it doesn’t work anymore. So look around your phone. Dig into the menus and settings. You’ll be amazed at what you discover your phone can actually do apart from calling people and writing text messages.

This, however, does not make users absolutely untraceable. A mobile phone can be located by other means such as GSM localization. However, the locating done by Google and Apple will be disabled, as will the localization of users by any and all apps, because when GPS is turned off, the phone doesn’t transmit the location data at all. So while users remain traceable in case of doubt, they will be less of a willing accessory to big data collection and the marketing of their personal knowledge.

About the Author

Dominik Bärlocher

Dominik Bärlocher has been working with IT subjects since 2006. The journalist relied on his affinity for all things IT during his tenures at newspapers and benefited from it. At scip, he conducts OSINT research and is an expert at information gathering.
