(by Will Greenberg | Electronic Frontier Foundation) – In Part 1 of our series on Fog Data Science, we saw how, when you give some apps permission to view your location, that data can end up being packaged and sold to numerous other companies. Fog Data Science is one of those companies, and it has created a sleek search engine called Fog Reveal that allows cops to browse through that location data as if they were Google Maps results.
In this article, we’ll be taking a deep dive into Fog Reveal’s features. Although accounts for Reveal are typically only available to police departments, we were able to analyze the app’s public-facing code to get a better understanding of how it works, how it’s used, and what it looks like when cops get warrantless access to your location data.
What We Found
Fog Reveal offers law enforcement a powerful and incredibly invasive tool for sifting through huge datasets of phone location data. Reveal’s workflow allows cops to perform “geofenced” device searches, i.e. a search for all devices in a specified region on a map, and then find every other location those devices visited at other times. A PowerPoint presentation we received from the Chino Police Department describes how cops use these features to identify so-called “bed-down” locations and build up “patterns of life” for devices’ owners. These features clearly undercut Fog’s claim that their product only contains “anonymized” data with “no PII [personally identifiable information]”.
We also discovered that Reveal’s frontend code contains the traces of a much more powerful “federal” featureset, which would allow users to further deanonymize data by revealing device Advertiser IDs, IP addresses, and other phone details. As we will discuss, we do not know if these features are currently in use, but regardless, they demonstrate how simply showing a few more data fields can make a data aggregation tool like this much more invasive.
By saving Reveal’s frontend files and organizing them into directories mirroring their original URL paths, we made a local reproduction of the site’s resources. From there, we wrote a mock backend server to serve the files and handle API calls made by the frontend, and then systematically worked out the format of data expected from that API. Once this was done, we had a semi-functional local reproduction of Reveal that made no requests to Fog’s actual server, and yet allowed us to explore its features.
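As a rough illustration of this approach, here is a minimal sketch of a mock backend in Python. It is not the code we used. The mirror directory name, the use of POST for API calls, and every field in the response are assumptions made for the example; only the URL path /Venntel/GetLocationData comes from the frontend code, and all returned data is random and fake.

```python
import json
import mimetypes
import random
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path
from urllib.parse import urlsplit

# Hypothetical directory holding the saved frontend files,
# laid out to mirror their original URL paths.
SITE_ROOT = Path("reveal_mirror")


def fake_signals(n=5, seed=0):
    """Return n random, entirely fake location points (field names are guesses)."""
    rng = random.Random(seed)
    return [
        {
            "deviceId": f"fake-{rng.randrange(1000):03d}",
            "latitude": round(rng.uniform(24.0, 49.0), 6),
            "longitude": round(rng.uniform(-125.0, -67.0), 6),
            "timestamp": 1_600_000_000 + rng.randrange(86_400),
        }
        for _ in range(n)
    ]


def handle_api(path):
    """Map an API path to a fake payload, or None if the path is unknown."""
    if path == "/Venntel/GetLocationData":
        return {"signals": fake_signals()}
    return None


class MockHandler(BaseHTTPRequestHandler):
    def _send(self, status, body, content_type):
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self):
        # Assumption: the frontend submits API calls as POST requests.
        payload = handle_api(urlsplit(self.path).path)
        if payload is None:
            self._send(404, b"{}", "application/json")
        else:
            self._send(200, json.dumps(payload).encode(), "application/json")

    def do_GET(self):
        # Serve saved static files, refusing paths that escape the mirror.
        rel = urlsplit(self.path).path.lstrip("/") or "index.html"
        root = SITE_ROOT.resolve()
        target = (root / rel).resolve()
        if root in target.parents and target.is_file():
            ctype = mimetypes.guess_type(str(target))[0] or "application/octet-stream"
            self._send(200, target.read_bytes(), ctype)
        else:
            self._send(404, b"not found", "text/plain")


def run(port=8000):
    """Start the mock server on localhost only."""
    HTTPServer(("127.0.0.1", port), MockHandler).serve_forever()
```

Calling run() starts the server locally. Working out what the frontend expects then becomes a loop: load a page, watch which request fails or renders incorrectly, and adjust the payload returned by handle_api.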
Because our mock server isn’t an exact replica of Reveal’s actual backend, we should preface this article by saying that our findings here only apply to the frontend code, as our mock server’s functionality is based on educated guesses and only returns fake location data. Consequently, it’s possible that our local reproduction’s behavior differs from Fog’s actual application. Where appropriate, we will cite the relevant frontend code (which we’ve made available on DocumentCloud) and point out where uncertainties remain, and in general will describe our estimation of Fog Reveal’s actual features with as few assumptions about the backend as possible.
With that out of the way, let’s now take a look at our findings on Fog Reveal’s features. All of the data depicted in the following document, including latitude/longitude coordinates and IP addresses, are fake data generated randomly by our mock backend server. All screenshots are of our reconstructed app, not of Fog’s production app.
Making a query
After signing into Reveal, the user is presented with a Google Maps view of the US, as well as a toolbox at the top-right of the screen:
Reveal’s frontend shows several tools for drawing geofences, the most basic of which is just a circle:
If this isn’t specific enough, users can also draw arbitrary shapes to carve out a more detailed geofence:
The frontend limits the size of these geofence queries, although those limits are quite large. For example, the frontend circle tool will allow queries with a radius of 2500 meters1, which covers nearly 20 square kilometers when performing a “signal search.” It’s possible that the backend imposes further limitations.
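The area figure follows directly from the radius limit. A quick check in Python (the function and constant names are ours, chosen for the example):

```python
import math

MAX_RADIUS_M = 2500  # circle-tool radius limit found in the frontend code


def circle_area_km2(radius_m):
    """Area of a circular geofence in square kilometers."""
    return math.pi * (radius_m / 1000) ** 2


area = circle_area_km2(MAX_RADIUS_M)
print(f"{area:.2f} km^2")  # 19.63 km^2
```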
The user can also specify a date and time range for their query, and it seems that these ranges can stretch back over several months: a copy of Fog Reveal’s user manual received from Greensboro Police Department claims that date/time ranges can extend up to 90 days, and can be searched “back to Jun[e] of 2017”.
After specifying a geofence and date/time range, the user can run their query. Queries return a set of data points, referred to as “signals” in the user manual, which represent where a device was at a given point in time2. The user can then do further analysis on these signals, such as grouping them by the device that produced them, or displaying the path taken by the device over time:
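To make the grouping and path analysis concrete, here is a small sketch in Python. We don't know how Reveal implements these features; the field names (device_id, timestamp, lat, lon) are hypothetical, and the logic is simply the most direct way to turn a flat list of signals into one time-ordered path per device.

```python
from collections import defaultdict


def group_paths(signals):
    """Group signals by device, then order each device's signals by time."""
    by_device = defaultdict(list)
    for signal in signals:
        by_device[signal["device_id"]].append(signal)
    return {
        device_id: sorted(points, key=lambda p: p["timestamp"])
        for device_id, points in by_device.items()
    }


# Entirely fake example data.
signals = [
    {"device_id": "a", "timestamp": 30, "lat": 37.0, "lon": -122.0},
    {"device_id": "b", "timestamp": 10, "lat": 37.1, "lon": -122.1},
    {"device_id": "a", "timestamp": 20, "lat": 37.2, "lon": -122.2},
]
paths = group_paths(signals)
```

Connecting each device's ordered points with line segments on a map yields its path over time.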
As an aside, in this example we’ve been using the EFF office in San Francisco, which coincidentally was the location of a Planned Parenthood clinic in the past. While we do not have evidence that Fog or its law enforcement customers are using Reveal to search for people who’ve sought reproductive healthcare, it’s nevertheless conceivable that it could be used in this way: we have examples of cops using Reveal to search individual buildings, as well as examples of other data brokers selling the location data of Planned Parenthood patients (though SafeGraph stopped this practice after the story broke). After the Supreme Court’s decision to overturn Roe v. Wade, and as states across the country pass increasingly draconian bills restricting people’s access to abortion, it’s important to consider that Reveal and tools like it represent a new threat to people seeking reproductive healthcare.
Digging deeper with device queries
The frontend code suggests that Fog creates unique internal identifiers for devices, called “Fog IDs” (or “registration IDs,”3 which we understand to be the same as Fog’s “device registration number”). These unique identifiers can be queried directly, allowing users to get all signals produced by devices within a certain period of time, regardless of whether those signals fell inside the original geofence:
In the user manual, this feature is called a “device query” and is described as including data from the device’s “local, regional or global travel.” The user manual also describes a feature called “common device queries”, which allow the user to determine “if any devices are common to multiple locations.”
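Conceptually, a common device query amounts to a set intersection over the device identifiers returned by several geofence queries. The sketch below shows that idea in Python. It is our illustration of the concept, not Fog's implementation, and the device_id field name is hypothetical.

```python
def common_devices(*result_sets):
    """Return the device IDs that appear in every one of the given query results."""
    id_sets = [{signal["device_id"] for signal in signals} for signals in result_sets]
    return set.intersection(*id_sets) if id_sets else set()


# Entirely fake results from two separate geofence queries.
location_one = [{"device_id": "a"}, {"device_id": "b"}]
location_two = [{"device_id": "b"}, {"device_id": "c"}]
shared = common_devices(location_one, location_two)
```

Here shared contains only "b", the one device seen at both locations. Narrowing a crowd down to a single device takes very little once two or more places and times are known.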
If certain user parameters are set4, Reveal will update its logo to display “Reveal Federal” and enable the frontend to request a much more powerful suite of query tools from the backend. The frontend code suggests that these conditions may occur if the user is a member of federal law enforcement5, but because we have no public records mentioning any such federal users, we don’t know for sure which users (if any) this applies to. For the purposes of this document, we will refer to these hypothetical users as federal users.
Federal users have access to an interface for converting between Fog’s internal device IDs (“FOG IDs”) and the device’s actual Advertiser ID6:
This is eyebrow-raising for a couple of reasons. First, if this feature is operational, it would contradict assurances made in a sample state search warrant Fog sends to customers that FOG IDs can’t be converted back into Advertiser IDs. Second, if users could retrieve the Advertiser IDs of all devices in a query’s results, it would make Reveal far more capable of unmasking the identities of those devices’ owners. Anyone with access to a device can read its Advertiser ID, so law enforcement would be able to verify whether a specific person’s device was part of a query’s results.
Additionally, when a federal user views the devices in their results, the frontend is designed to show them a great deal more information7 about each device than it does non-federal users. Assuming that the backend provides this data, a federal user could view device information such as:
- User Agent
- Browser Family
- Browser Version
- OS Family
- OS Version
- Device Family
- Device Brand
- Device Model
- Whether the device belongs to an EU Resident
- Last Seen IP Addresses
Federal users are also given an interface to query for signals/devices based on one or more IP addresses:
Connections to Venntel
Many of the features we analyzed in this article are powered by API calls that reference Venntel, a major player in the data broker scene and a DHS contractor. Although it’s true that Fog’s engineers could have named these API endpoints arbitrarily, the way they function does seem to suggest that Venntel is a source of location and device data for Reveal.
Notably, when a Reveal user performs any geofenced device query, that query is submitted to the URL path /Venntel/GetLocationData. Additionally, queries for specific device locations send a request to /Venntel/GetDeviceLocationData, and when a federal user makes a request for more device details, the frontend sends a request to /Venntel/GetDeviceDetails. This means that nearly all frontend requests having to do with searching device or location data are prefixed with “Venntel”. And this wouldn’t be the only connection between Fog and Venntel: many of the records EFF has received point to a close link between the two companies.
As we’ve seen, Fog Reveal provides law enforcement a powerfully invasive tool for searching huge swaths of commercially available location data. With a few clicks, its users can find not only the devices present in a location, but also everywhere else each of those devices went during other time periods. Its federal featureset, whether currently in use or not, demonstrates how much more invasive the tool could be by only revealing a handful of other fields.
If you’re not happy about the idea of your location data possibly being sold to companies like Fog, we don’t blame you. Luckily, there’s an easy step you can take to make it much harder for data brokers and companies like Fog to tie your location data to your device: disabling Ad ID tracking on your phone. Beyond that, we believe that there are changes needed at both the technical and legal levels to prevent this kind of invasive data collection and usage. To learn more, check out our other articles in this series on data brokers.