Google Analytics and corporate directory lookups

Google Analytics is a powerful tool for gathering information about web site users and their behavior. But using it to track activity on a corporate directory application raises some issues.

The main concern is privacy. Every company has (or should have) a privacy policy that governs the publication of Personally Identifiable Information (PII) about users, employees, and non-employees on its web sites, and the sharing of that information with third parties. Google’s policy on Analytics is simple and straightforward: passing PII to Google is prohibited. Here is the relevant text [1]:

You will not (and will not allow any third party to) use the Service to track, collect or upload any data that personally identifies an individual (such as a name, email address or billing information), or other data which can be reasonably linked to such information by Google. You will have and abide by an appropriate Privacy Policy and will comply with all applicable laws, policies, and regulations relating to the collection of information from Visitors. You must post a Privacy Policy and that Privacy Policy must provide notice of Your use of cookies that are used to collect data. You must disclose the use of Google Analytics, and how it collects and processes data. This can be done by displaying a prominent link to the site “How Google uses data when you use our partners’ sites or apps”, (located at http://www.google.com/policies/privacy/partners/, or any other URL Google may provide from time to time). You must not circumvent any privacy features (e.g., an opt-out) that are part of the Service.

This means that data such as names and e-mail addresses cannot be included in data submitted to Google for analysis. Private street addresses and phone numbers are also forbidden. Office street addresses and direct-line phone numbers must also be avoided if they could be linked to users on the public Internet [2].

Whether userids or usernames (the system identifiers used for authentication and internal company identification) can be revealed is a more complicated question. Internal private usernames, which are generated by a system rather than self-selected by users, are not considered PII for the purposes of Google Analytics. Public usernames, which are selected by the user or publicly linked to the user on external systems, would be considered PII [3].

So if I choose systemninja as my username and my real name appears alongside it on a public web page, then that username is PII. But if I am assigned a userid like E0011249 for authentication to and identification by my company’s internal systems and that number is not linked to me on any external site (in an Internet-accessible corporate directory, for example), then it is not PII.

Fortunately, complying with Google’s policy will not be difficult for most users. Suppose your application displays some of the prohibited information as parameters in an application uri, for example:

http://directory.example.com/search?givenname=bill&sn=gates&uid=&mail=

You can exclude these parameters by listing them in the View Settings for the particular site on the Admin tab. To do this, navigate to Admin and choose the target site under Property. Then go to View… View Settings and enter a comma-separated list of the parameter labels in the Exclude URL Query Parameters field.
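To illustrate what that setting does to a reported page, here is a minimal Python sketch that drops the listed query parameters from the example search URL above. It only mimics the effect on the URL and is not how Google Analytics implements the setting. The helper name `strip_params` and the `EXCLUDED` set are assumptions made for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Parameter labels you would list in Exclude URL Query Parameters
# (taken from the example directory search URL; adjust for your app).
EXCLUDED = frozenset({"givenname", "sn", "uid", "mail"})


def strip_params(url: str, excluded: frozenset = EXCLUDED) -> str:
    """Return url with the excluded query parameters removed.

    Hypothetical helper that imitates the effect of the setting;
    it is not Google Analytics' own implementation.
    """
    parts = urlsplit(url)
    # keep_blank_values=True so empty parameters such as "uid=" are seen (and dropped)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in excluded]
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_params("http://directory.example.com/search?givenname=bill&sn=gates&uid=&mail="))
# http://directory.example.com/search
print(strip_params("http://directory.example.com/search?givenname=bill&page=2"))
# http://directory.example.com/search?page=2
```

Parameters that are not on the list (here, a hypothetical `page`) survive. This is why the list needs to name every label that can carry PII.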

If this kind of information is embedded in a url without parameter labels, you may have to use a filter to hide those urls. The Filters interface is also found under View for each site. Crafting filters can require research and careful planning: you’ll need to study the url patterns produced by your application and use the correct syntax to exclude the offending material.
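One way to begin studying those patterns is to run a sample of your application's request paths through a rough check for segments that look like e-mail addresses or phone numbers. The sample could be pulled from web server logs, for instance. In this sketch, the sample paths, the helper name `flag_paths`, and both regular expressions are assumptions. The expressions are deliberately loose heuristics, not a complete PII detector.

```python
import re
from urllib.parse import unquote

# Loose heuristics, assumed for illustration: they will miss some PII
# and may flag harmless strings (a date like 2015-03-01 looks "phone-like").
EMAIL = re.compile(r"[^/@\s]+@[^/@\s]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\-. ()]{7,}\d")


def flag_paths(paths):
    """Yield (path, reason) for paths that appear to embed PII."""
    for path in paths:
        decoded = unquote(path)  # so a percent-encoded '@' (%40) is still caught
        if EMAIL.search(decoded):
            yield path, "e-mail address"
        elif PHONE.search(decoded):
            yield path, "phone number"


sample = [
    "/profiles/phillembo@example.com",
    "/profiles/E0011249",
    "/profiles/919-555-0142",  # hypothetical direct-line number
    "/search?givenname=bill",
]
for path, reason in flag_paths(sample):
    print(f"{reason}: {path}")
```

Run against the sample, this flags the e-mail and phone-number paths and leaves the system-generated userid alone. Each flagged path is a candidate for a filter.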

A simple example would be a filter that excludes urls that contain an e-mail address. If these urls looked something like:

http://directory.example.com/profiles/phillembo@example.com

the filter to choose would be a Custom… Exclude… Request URI with a Filter Pattern like:

/profiles/.+@example\.com
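Before saving a filter, it can help to check the pattern against a few sample request URIs. This Python sketch assumes the filter matches when the pattern occurs anywhere in the Request URI, an unanchored search that `re.search` approximates. Google Analytics uses its own regular expression engine, so treat this as a sanity check rather than a guarantee. The sketch also shows that an escaped dot placed right after the @ would require the text "@.example.com" and would miss the example profile url.

```python
import re

# Request URI pattern for the exclude filter (assumed to match anywhere in the URI).
EXCLUDE = re.compile(r"/profiles/.+@example\.com")
# Same idea with a stray "\." after the "@": it needs "@.example.com" to match.
STRAY_DOT = re.compile(r"/profiles/.+@\.example\.com")

samples = [
    "/profiles/phillembo@example.com",
    "/profiles/E0011249",
    "/search?givenname=bill",
]

for uri in samples:
    # Prints the URI, then whether each pattern would exclude it.
    print(uri, bool(EXCLUDE.search(uri)), bool(STRAY_DOT.search(uri)))
```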

Note that by excluding a url you’ll be removing it from the data submitted to Google, resulting in any hits on that url (or urls that match the pattern you specify) not being counted in Google’s analysis [4].

References:

[1] Terms of Service, section 7.

[2] Google Analytics & Personally Identifiable Information (PII).

[3] See Identifying your users in Google Analytics, and Are ‘usernames’ Privately-Identifiable Information (PII)?

[4] Some good resources for crafting filters:

Google Analytics Help: About regular expressions

The Ultimate Guide to Google Analytics Profile Filters

Regular Expressions Guide for Google Analytics

This entry was posted in System Administration, Systems Analysis, Web.

About phil

My name is Phil Lembo. In my day job I’m an enterprise IT architect for a leading distribution and services company. The rest of my time I try to maintain a semi-normal family life in the suburbs of Raleigh, NC. E-mail me at philipATlembobrothersDOTcom. The opinions expressed here are entirely my own and not those of my employers, past, present or future (except where I quote others, who will need to accept responsibility for their own rants).