PII And Google Analytics Quick View
Personally identifiable information (PII) in Google Analytics
My journey through the “Digital Analytics Minidegree” at CXL Institute continues, and this is the 7th post.
Fred Pike, the course instructor, is really good. Thanks, Mr. Pike.
What is Personally Identifiable Information?
A number of things count as PII. Personally Identifiable Information includes a user’s name, their Social Security number, email address, data identifying a specific device (like a mobile phone’s serial number), or any similar data in this vein.
It’s important to note that the U.S. General Services Administration doesn’t restrict the definition of “PII” to any specific category of information or technology. In its Privacy Act guidance, the GSA describes this information as:
“[Anything] which can be used to distinguish or trace an individual’s identity, either alone or when combined with other personal or identifying information that is linked or linkable to a specific individual.… In performing this assessment, it is important for an agency to recognize that non-PII can become PII whenever additional information is made publicly available, in any medium and from any source, that, when combined with other available information, could be used to identify an individual.”
Yes, you can gather much of this data from users, but they have to enter it willingly, and the information can’t be encoded into any transmittable data.
PII entered by users
Website visitors and users sometimes enter PII into search boxes and form fields. Make sure to remove PII from user-entered information before it’s sent to Analytics.
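As a rough illustration of cleaning user-entered text before it reaches Analytics, here is a minimal Python sketch that masks anything resembling an email address in a search term. The function name `redact_emails`, the placeholder text, and the email pattern itself are assumptions made for this example. The pattern is deliberately loose and is not a validator.

```python
import re

# Loose pattern (an assumption for this sketch): some text, an "@" or its
# URL-encoded form "%40", then something that looks like a domain.
EMAIL_RE = re.compile(r"[^\s@/?&=]+(?:@|%40)[^\s@/?&=]+\.[A-Za-z]{2,}")


def redact_emails(text: str, placeholder: str = "[redacted]") -> str:
    """Replace anything that looks like an email address with a placeholder."""
    return EMAIL_RE.sub(placeholder, text)


if __name__ == "__main__":
    print(redact_emails("order status for jane.doe@example.com"))
    # prints: order status for [redacted]
```

On a real site this logic would normally run in the browser or in the tag manager before the hit is sent. The Python version only shows the idea.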
Hashed and salted PII
You can send Google Analytics an encrypted identifier or custom dimension that’s based on PII, as long as you use the proper encryption level. Google has a minimum hashing requirement of SHA256 and strongly recommends the use of a salt, with a minimum of 8 characters. Notwithstanding any of the foregoing, you may not send Google Analytics encrypted Protected Health Information (as defined under HIPAA), even if it’s hashed or salted.
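To make the SHA256-plus-salt requirement concrete, here is a minimal Python sketch using the standard `hashlib` module. The function name `hash_identifier`, the choice to prepend the salt, and the lowercase/trim normalization are assumptions of this example, not something Google prescribes. It does not make Protected Health Information acceptable to send.

```python
import hashlib

MIN_SALT_LENGTH = 8  # minimum salt length mentioned in the text above


def hash_identifier(value: str, salt: str) -> str:
    """Return a salted SHA-256 hex digest of an identifier such as an email."""
    if len(salt) < MIN_SALT_LENGTH:
        raise ValueError("salt must be at least 8 characters long")
    # Assumption: normalize so the same address always yields the same hash.
    normalized = value.strip().lower()
    return hashlib.sha256((salt + normalized).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # The salt below is a made-up example value; keep a real one secret.
    print(hash_identifier("Jane@Example.com", "s3cret-salt-value"))
```

The same salt has to be used every time, otherwise the same user would end up with different identifiers across sessions.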
HIPAA disclaimer
Unless otherwise specified in writing by Google, Google doesn’t intend uses of Google Analytics to create obligations under the Health Insurance Portability and Accountability Act, as amended, (“HIPAA”), and makes no representations that Google Analytics satisfies HIPAA requirements. If you’re (or become) a Covered Entity or Business Associate under HIPAA, you may not use Google Analytics for any purpose or in any manner involving Protected Health Information unless you’ve received prior written consent to such use from Google.
How do I know if my website is collecting PII?
The simplest way to check for PII is to look at the query parameters in your Google Analytics. Navigate to Behavior » Site Content » All Pages and search for “@” to see the active query parameters in the account.
To check for PII in event dimensions, navigate to the Events report in GA and check all event categories, actions, and labels to make sure it is not being stored.
To check for PII in custom dimensions, visit Admin » Custom Definitions » Custom Dimensions. Create a custom report that pulls in custom dimensions, and from there make sure none of the dimensions contain PII.
To check for PII in campaign parameters, scan the source, medium, campaign, and content for campaign-tagged traffic. If you see a UTM-tagged campaign, triple-check that the parameters won’t bring in PII.
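The campaign-parameter check can also be scripted against a list of landing page URLs. The sketch below is one possible approach in Python, assuming the standard `utm_source`, `utm_medium`, `utm_campaign`, and `utm_content` parameter names. The function name `suspicious_campaign_params` is made up for this example, and it only looks for an “@”, so it is a first pass rather than a full audit.

```python
from urllib.parse import parse_qsl, urlsplit

# The four campaign fields mentioned above, as UTM parameter names.
CAMPAIGN_KEYS = ("utm_source", "utm_medium", "utm_campaign", "utm_content")


def suspicious_campaign_params(url: str) -> dict:
    """Return the campaign parameters whose decoded value contains an '@'."""
    flagged = {}
    # parse_qsl decodes percent-encoding, so "%40" is seen as "@".
    for key, value in parse_qsl(urlsplit(url).query, keep_blank_values=True):
        if key.lower() in CAMPAIGN_KEYS and "@" in value:
            flagged[key] = value
    return flagged


if __name__ == "__main__":
    url = "https://example.com/?utm_source=newsletter&utm_campaign=jane%40example.com"
    print(suspicious_campaign_params(url))
    # prints: {'utm_campaign': 'jane@example.com'}
```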
To check for PII in signup forms, make sure that form submissions use a “POST” request, not a “GET” request. A “GET” request puts the form fields into the URL as query parameters, which is how they end up in GA. Your developer can change this if your forms currently use “GET”.
What do I do if I find PII?
Act immediately. If your website is collecting PII, this is a top priority to handle. Work with your developer to stop collecting PII through the website. From there, strip the query parameters in Google Tag Manager to get rid of PII on that end.
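In Google Tag Manager the stripping itself would be written in JavaScript, but the logic is easy to show in Python. The sketch below removes a blocklist of query parameters from a URL. The parameter names in `BLOCKED_PARAMS` and the function name `strip_params` are hypothetical, so replace the list with whatever parameters actually carry PII on your site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical blocklist for this sketch; adjust to your own site.
BLOCKED_PARAMS = frozenset({"email", "name", "phone"})


def strip_params(url: str, blocked=BLOCKED_PARAMS) -> str:
    """Return the URL with every blocked query parameter removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in blocked
    ]
    # Rebuild the URL with only the parameters that were kept.
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    print(strip_params("https://example.com/thanks?email=jane%40example.com&plan=pro"))
    # prints: https://example.com/thanks?plan=pro
```

A blocklist only removes the parameters you already know about. An allowlist of known-safe parameters is stricter if new parameters appear often.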
Next, keep a copy of your account data with an export. From there, create a new view by copying the existing view to make sure it’s PII-free going forward.
Email Addresses
Email addresses are by far the most common kind of PII we find in GA data because they’re so frequently passed as query parameters. To pull out any email addresses in your data, put the following in the table filter:
@
Yep, that’s it! Nothing fancy here. If a page path contains an “@”, it’s likely an email address and should not be in your data! If you want to be extra careful, you can also try filtering the table for an encoded @ as “%40”.
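If you have exported the page paths from the report, the same check takes a few lines of Python. This is only a sketch: the function name `paths_with_at_sign` and the sample paths are invented for illustration.

```python
def paths_with_at_sign(page_paths):
    """Return the page paths containing a literal or URL-encoded '@'."""
    return [path for path in page_paths if "@" in path or "%40" in path]


if __name__ == "__main__":
    sample = [
        "/thanks?email=jane@example.com",
        "/thanks?email=jane%40example.com",
        "/pricing",
    ]
    for path in paths_with_at_sign(sample):
        print(path)
```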
First Names
Names are often passed into GA via query parameters as well. We’re most likely to identify PII involving names by searching for some of the most common first names:
(j(im(my)?|ohn|ames)|robert|bob(by)?|michael|dav(id|e)|(d|r)ic(k|hard)|ch(arl(es|ie)|uck)|mary|pat(ty|ricia)|linda|barb(ara)?|e?liz(zy|abeth)|jenn?(ifer)?|mari(e|a)|su(e|san)|sarah?)
This regex focuses on common American first names; feel free to edit it to include names that may be more common for your site’s traffic. And remember, this filter isn’t designed to pull out every instance of user names. It should just include enough common names to figure out whether or not you’re including this type of PII in your GA data.
First names aren’t generally considered to be PII on their own because you likely can’t identify a particular individual who visits your site with just their first name. But if a first name is being captured, a last name is probably somewhere in your data as well, and that combination is certainly considered PII.
There are other combinations of otherwise non-PII data that, together, can also be used to identify a particular user, so keep this in mind as you decide whether or not to exclude data like this from your GA data.
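Used as a plain substring filter, the first-name regex will also hit words like “summary” (which contains “mary”). One way to cut that noise when checking exported page paths is to test each query parameter value as a whole. The Python sketch below does that, and the function name `params_with_first_names` and the whole-value matching are choices made for this example only.

```python
import re
from urllib.parse import parse_qsl, urlsplit

# The first-name pattern from above, compiled case-insensitively.
COMMON_NAMES = re.compile(
    r"(j(im(my)?|ohn|ames)|robert|bob(by)?|michael|dav(id|e)|(d|r)ic(k|hard)"
    r"|ch(arl(es|ie)|uck)|mary|pat(ty|ricia)|linda|barb(ara)?|e?liz(zy|abeth)"
    r"|jenn?(ifer)?|mari(e|a)|su(e|san)|sarah?)",
    re.IGNORECASE,
)


def params_with_first_names(page_path: str):
    """Return (key, value) pairs whose entire value is a common first name."""
    query = urlsplit(page_path).query
    return [
        (key, value)
        for key, value in parse_qsl(query)
        if COMMON_NAMES.fullmatch(value.strip())
    ]


if __name__ == "__main__":
    print(params_with_first_names("/welcome?fname=John&topic=summary"))
    # prints: [('fname', 'John')]
```

Whole-value matching misses names buried inside longer strings, so treat an empty result as a hint rather than proof that no names are present.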
Phone Numbers
Phone numbers are captured less often than email addresses and names, but they still appear occasionally. To look for phone numbers that have a varying number of digits, delimiters, and other formats, use the following regex:
(((\+?1(\.|-|\s*)?)?\s*)((\d{3}(\.|-|\s*)?)?\s*)(\d{3}(\.|-|\s*)?\s*)(\d{4}\s*((x|ext\.?(ension)?)\s*\d*)?))
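For checking exported page paths outside of GA, here is a simplified Python sketch. It uses its own pattern, which assumes US-style numbers grouped as three, three, and four digits with an optional leading 1, and it refuses to match inside longer runs of digits. Both the pattern and the function name `find_phone_numbers` are assumptions made for this example.

```python
import re

# Simplified pattern (an assumption): optional "+1", optional area code,
# then 3 digits and 4 digits, with ".", "-" or whitespace as separators.
# The lookarounds stop it from matching inside longer digit runs.
PHONE_RE = re.compile(
    r"(?<!\d)(?:\+?1[.\-\s]?)?(?:\(?\d{3}\)?[.\-\s]?)?\d{3}[.\-\s]?\d{4}(?!\d)"
)


def find_phone_numbers(text: str):
    """Return every substring that looks like a US-style phone number."""
    return [match.group(0) for match in PHONE_RE.finditer(text)]


if __name__ == "__main__":
    print(find_phone_numbers("/contact?tel=555-867-5309"))
    # prints: ['555-867-5309']
```

Seven- and ten-digit order or product IDs will still match, so review the hits by hand before deciding they are phone numbers.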
Physical Addresses
It is difficult to find physical street addresses in your data because you have to search for abbreviations and short, common words, which can pull in a lot of unintended, unrelated results. Start with this filter and adjust it as needed to verify that you are not collecting this kind of PII:
(street|st|road|rd|drive|dr|lane|ln|avenue|ave|boulevard|blvd|highway|hwy|township|twp|north|south|east|west)
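To see why this filter is noisy, note that “st” alone matches the page path “/blog/best-practices”. The Python sketch below shows one hypothetical way to tighten it for offline checks: require a house number, then up to three words, then one of the street words as a whole word. The tightened pattern and the name `looks_like_address` are assumptions for this example, and they will miss addresses written in other formats.

```python
import re
from urllib.parse import unquote_plus

# The street-word filter from above.
STREET_WORDS = (
    r"(street|st|road|rd|drive|dr|lane|ln|avenue|ave|boulevard|blvd"
    r"|highway|hwy|township|twp|north|south|east|west)"
)

# Hypothetical tightening: house number, up to three words, street word.
ADDRESS_RE = re.compile(
    r"\b\d{1,6}\s+(?:[A-Za-z]+\s+){0,3}" + STREET_WORDS + r"\b",
    re.IGNORECASE,
)


def looks_like_address(page_path: str) -> bool:
    """Return True if the decoded page path seems to contain a street address."""
    # unquote_plus turns "+" and "%20" into spaces before matching.
    return ADDRESS_RE.search(unquote_plus(page_path)) is not None


if __name__ == "__main__":
    print(looks_like_address("/checkout?addr=123+Main+St"))  # prints: True
    print(looks_like_address("/blog/best-practices"))        # prints: False
```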
Credit Card Numbers
Fortunately, I have never seen credit card information being passed into GA; however, I have heard of it happening. You definitely don’t want to send this information to GA under any circumstances, so the regex for this PII is broadly defined:
\d{4}(-|\s*)?\d{4}(-|\s*)?\d{4}(-|\s*)?(\d{4})?
Thanks, CXL Institute.
Let’s dive into the resources:
How To Find Personally Identifiable Data In Your Google Analytics.
The everything guide to PII: What does it know? Does it know things? Let’s find out.
Best practices to avoid sending Personally Identifiable Information (PII).