Texas Tech University

Anonymous v. Identifiable Data

Researchers collect and receive data in various forms. The key terms below matter when collecting, analyzing, and storing data in ways that protect participants' privacy and confidentiality.

  • Anonymous Data: Data were collected without identifiers and were never linked to an individual.
  • Identifiable Data: Any data that includes personal identifiers.

    Individual identifiers that might be collected in the course of data collection:

    • Names
    • All geographic subdivisions smaller than a State, including street address, city, county, precinct, zip code, and their equivalent geocodes, except for the initial three digits of a zip code if, according to the current publicly available data from the Bureau of the Census: (a) the geographic unit formed by combining all zip codes with the same three initial digits contains more than 20,000 people; and (b) the initial three digits of a zip code for all such geographic units containing 20,000 or fewer people are changed to 000
    • All elements of dates (except year) for dates directly related to an individual, including birth date, admission date, discharge date, date of death; and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older
    • Telephone numbers
    • Fax numbers
    • Electronic mail addresses
    • Social security numbers
    • Student ID numbers
    • Medical record numbers
    • Health plan beneficiary numbers
    • Account numbers
    • Certificate/license numbers
    • Vehicle identifiers and serial numbers, including license plate numbers
    • Device identifiers and serial numbers
    • Web Universal Resource Locators (URLs)
    • Internet Protocol (IP) address numbers
    • Biometric identifiers, including finger and voice prints
    • Full face photographic images and any comparable images
    • Any other unique identifying number, characteristic, or code. Separately, removing these identifiers is sufficient only if the covered entity does not have actual knowledge that the information could be used alone or in combination with other information to identify an individual who is a subject of the information
  • Coded Data: Personal identifiers have been removed from the data and replaced with a participant code. The personal identifier(s) and participant code are kept in a separate document, stored apart from the data. This allows the researcher to trace the data back to the original source. If a link exists, data are considered indirectly identifiable and not anonymous, anonymized, or de-identified.
  • Indirectly Identifiable Data: Data do not include personal identifiers, but a code links identifiers to the data. These data are still considered identifiable.
  • De-Identified Data: All personal identifiers are permanently removed from data. No code or key exists to link the individual back to the data.
  • Confidentiality: The researcher knows the identity of a research participant/entity but takes steps to protect that identity from being discovered by others.
    • Deductive Disclosure
      • Occurs when the traits of individuals or groups make them identifiable in research reports
    • Report as Aggregate Data
      • Tell a group's story, not a specific person's story
    • Use Pseudonyms
    • Avoid specific locations/places
    • Avoid direct quotes
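
The zip code rule, the age rule, and the coded-data definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not an approved procedure: the function names, the "participant_code" field, and the set of low-population prefixes passed in are all hypothetical, and a real list of prefixes covering 20,000 or fewer people would have to come from current Bureau of the Census data.

```python
import secrets


def generalize_zip(zip_code, small_prefixes):
    """Keep only the first three zip digits; use "000" for low-population prefixes.

    small_prefixes is a caller-supplied set of three-digit prefixes whose
    combined population is 20,000 or fewer (hypothetical input here).
    """
    prefix = zip_code.strip()[:3]
    return "000" if prefix in small_prefixes else prefix


def generalize_age(age):
    """Aggregate every age over 89 into a single "90 or older" category."""
    return "90 or older" if age > 89 else str(age)


def code_records(records, id_field="name"):
    """Replace one identifier field with a random participant code.

    Returns (coded_records, key). The key maps each code back to the
    identifier, so it should be stored apart from the coded data.
    """
    key = {}        # participant code -> identifier (the link)
    code_for = {}   # identifier -> participant code
    coded = []
    for record in records:
        identifier = record[id_field]
        if identifier not in code_for:
            code = "P" + secrets.token_hex(4)
            while code in key:  # regenerate on the unlikely collision
                code = "P" + secrets.token_hex(4)
            code_for[identifier] = code
            key[code] = identifier
        # Copy every field except the identifier, then attach the code.
        new_record = {k: v for k, v in record.items() if k != id_field}
        new_record["participant_code"] = code_for[identifier]
        coded.append(new_record)
    return coded, key
```

As long as the returned key exists, the coded records remain indirectly identifiable in the sense defined above. Only when every listed identifier has been removed and no code or key remains would the data fit the de-identified definition.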