Risk Topics

safe harbor de-identification of health data


The health industry works with a standard called "Safe Harbor" for de-identifying personal information. It's supposed to reduce the share of uniquely identifiable records to about 0.04% of the population, meaning only about 1 in 2,500 people could be uniquely identified from the data once it has been restricted or altered. It's part of the HIPAA Privacy Rule (45 CFR 164.514(b)(2)):

The Safe Harbor method for de-identification is defined as follows:

(2)(i) The following identifiers of the individual or of relatives, employers, or household members of the individual, are removed:
(A) Names
(B) All geographic subdivisions smaller than a state, including street address, city, county, precinct, ZIP code, and their equivalent geocodes, except for the initial three digits of the ZIP code if, according to the current publicly available data from the Bureau of the Census:
    (1) The geographic unit formed by combining all ZIP codes with the same three initial digits contains more than 20,000 people; and
    (2) The initial three digits of a ZIP code for all such geographic units containing 20,000 or fewer people is changed to 000.
(C) All elements of dates (except year) for dates that are directly related to an individual, including birth date, admission date, discharge date, death date, and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older.
(D) Telephone numbers
(E) Fax numbers
(F) Email addresses
(G) Social security numbers
(H) Medical record numbers
(I) Health plan beneficiary numbers
(J) Account numbers
(K) Certificate/license numbers
(L) Vehicle identifiers and serial numbers, including license plate numbers
(M) Device identifiers and serial numbers
(N) Web Universal Resource Locators (URLs)
(O) Internet Protocol (IP) addresses
(P) Biometric identifiers, including finger and voice prints
(Q) Full-face photographs and any comparable images
(R) Any other unique identifying number, characteristic, or code, except as permitted by paragraph (c) of this section; and

(ii) The covered entity does not have actual knowledge that the information could be used alone or in combination with other information to identify an individual who is a subject of the information.
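To make the mechanical parts of the rule concrete, here is a minimal Python sketch of how items (A) through (R) might be applied to a single record. Everything in it is an assumption for illustration: the field names, the `deidentify` function, and especially the `SPARSE_ZIP3` set, which is a placeholder and would have to be built from current Census data. It does not address item (R)'s catch-all or the "actual knowledge" test in (ii), both of which need human judgment.

```python
from datetime import date

# Hypothetical field names standing in for the direct identifiers
# listed in (A) and (D) through (Q); a real schema will differ.
DIRECT_IDENTIFIER_FIELDS = {
    "name", "street_address", "city", "county", "phone", "fax", "email",
    "ssn", "medical_record_number", "health_plan_number", "account_number",
    "license_number", "vehicle_id", "device_id", "url", "ip_address",
    "biometric_id", "photo",
}

# Placeholder only: 3-digit ZIP prefixes covering 20,000 or fewer people.
# This must come from current Census data, not from this example.
SPARSE_ZIP3 = {"036"}

DATE_FIELDS = ("admission_date", "discharge_date", "death_date")


def generalize_zip(zip_code: str) -> str:
    """Rule (B): keep the first three digits, or 000 for sparse areas."""
    prefix = zip_code.strip()[:3]
    return "000" if prefix in SPARSE_ZIP3 else prefix


def age_on(birth: date, today: date) -> int:
    """Age in whole years as of `today`."""
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    return today.year - birth.year - (0 if had_birthday else 1)


def deidentify(record: dict, today: date) -> dict:
    """Return a copy of `record` with the Safe Harbor-style reductions applied."""
    # Rules (A), (D)-(Q): drop direct identifiers outright.
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIER_FIELDS}

    if "zip" in out:
        out["zip"] = generalize_zip(out["zip"])

    # Rule (C): reduce dates to the year; fold ages over 89 into one category
    # and suppress the birth year that would reveal such an age.
    birth = out.pop("birth_date", None)
    if birth is not None:
        age = age_on(birth, today)
        out["age"] = "90+" if age >= 90 else str(age)
        out["birth_year"] = None if age >= 90 else birth.year

    for field in DATE_FIELDS:
        if field in out:
            out[field.replace("_date", "_year")] = out.pop(field).year

    return out
```

Called with a record such as `{"name": ..., "zip": "94110", "birth_date": date(1980, 2, 3), "diagnosis": ...}`, the function drops the name, shortens the ZIP code to its first three digits, and replaces the birth date with a year and an age category, while leaving non-identifying fields like the diagnosis untouched.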

I find it odd that the financial industry doesn't push something similar, given that this standard has been used in the health sphere for years. Or, if the finance field has produced such guidance, I wonder how I could have worked in that area for so long without coming across it. Whether or not such a standard exists, nothing like it is in common practice: I've seen banks throw any and all of these fields at third parties at the slightest provocation. I think they need to learn from the health industry.

© 2013 - 2019 werneburg information risk management inc.