
Australia Health Data De-identification Framework

Privacy Act 1988 and Australian Privacy Principles

Overview

Australia has developed a comprehensive approach to health data de-identification that combines national legislation, sector-specific guidelines, and technical standards. The Australian framework places strong emphasis on risk-based assessment and recognizes the contextual nature of de-identification.

The Australian approach acknowledges that de-identification is not a binary concept but exists on a spectrum of risk. This risk-based approach is central to Australia's regulatory framework, which focuses on whether information is "reasonably identifiable" in a given context rather than prescribing specific technical methods.

"De-identification of personal information is not a fixed or single process but depends on context. It involves removing or altering information that identifies an individual or is reasonably likely to do so."
- Office of the Australian Information Commissioner (OAIC)

Key Regulatory Bodies:

  • Office of the Australian Information Commissioner (OAIC) - Australia's independent national privacy regulator, responsible for privacy functions established by the Privacy Act 1988
  • Australian Digital Health Agency - Leads Australia's digital health strategy and operates the My Health Record system
  • Australian Institute of Health and Welfare (AIHW) - Australia's national agency for health and welfare information and statistics
  • National Data Commissioner - Oversees the DATA Scheme, established by the Data Availability and Transparency Act 2022 to enable controlled access to public sector data
  • Therapeutic Goods Administration (TGA) - Regulates medical devices and pharmaceuticals, including health data used in clinical trials

Legal Framework

Australia's health data de-identification framework is built upon several key pieces of legislation and regulation:

Primary Legislation

Example: Privacy Act 2022 Amendments

The Privacy Legislation Amendment (Enforcement and Other Measures) Act 2022, which took effect in December 2022, introduced significant changes, including:

  • Increased the maximum penalty for serious or repeated privacy breaches by companies to the greater of $50 million, three times the value of any benefit obtained through the misuse of information, or, where that benefit cannot be determined, 30% of the company's adjusted turnover in the relevant period
  • Enhanced powers for the OAIC to resolve privacy breaches
  • Strengthened notification requirements for data breaches
  • New information sharing powers to assist the OAIC in investigations
  • Extraterritorial application to organizations doing business in Australia, even if not physically present

These amendments significantly increase the potential consequences for organizations that fail to properly de-identify health data.
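
The penalty ceiling described above is a "greatest of" calculation, which a few lines of Python can make concrete. This is a minimal sketch: the function name and parameters are hypothetical, amounts are in Australian dollars, and it assumes the 30% turnover limb is used only when the value of the benefit cannot be determined. The legislation contains further conditions that are not modeled here.

```python
FIXED_LIMB_AUD = 50_000_000  # the $50 million figure from the list above


def max_penalty_aud(benefit_obtained=None, adjusted_turnover=None):
    """Return the illustrative maximum penalty for a company.

    Assumption: if the benefit is known, the fixed amount is compared with
    three times the benefit; otherwise it is compared with 30% of the
    adjusted turnover (when a turnover figure is supplied).
    """
    candidates = [FIXED_LIMB_AUD]
    if benefit_obtained is not None:
        candidates.append(3 * benefit_obtained)
    elif adjusted_turnover is not None:
        candidates.append(adjusted_turnover * 30 / 100)
    return max(candidates)


# Known benefit of $20 million: 3 x 20m is compared with the fixed 50m.
print(max_penalty_aud(benefit_obtained=20_000_000))
# Benefit unknown, turnover of $400 million: 30% is compared with 50m.
print(max_penalty_aud(adjusted_turnover=400_000_000))
```

For smaller organizations the fixed amount dominates, so the other limbs only matter once the benefit or turnover is large.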

State and Territory Legislation

Most Australian states and territories have their own privacy or health records legislation, which creates a complex regulatory landscape.

Example: Jurisdictional Complexity

A healthcare provider operating across multiple Australian states must comply with both the federal Privacy Act and the relevant state/territory legislation in each jurisdiction where they operate. For instance, a telehealth service operating in both NSW and Victoria would need to comply with:

  • Federal Privacy Act 1988 and APPs
  • NSW Health Records and Information Privacy Act 2002
  • Victorian Health Records Act 2001

This creates a complex compliance environment where de-identification practices may need to satisfy multiple regulatory frameworks.

Case Study: My Health Record Secondary Use Framework

The Framework to Guide the Secondary Use of My Health Record System Data, released in 2018 and updated in 2023, provides a comprehensive approach to de-identification of Australia's national electronic health record data:

  • Establishes a dedicated Secondary Use of Data Governance Board
  • Requires data to be de-identified before release except in specific circumstances
  • Prohibits certain uses including commercial and non-health-related purposes
  • Implements a multi-layered approval process for data access
  • Requires ethics committee approval for research projects
  • Establishes a secure data access environment
  • Mandates public benefit testing for all data uses
  • Allows individuals to opt out of having their data used for secondary purposes

This framework demonstrates Australia's comprehensive approach to balancing privacy protection with enabling beneficial uses of health data.

Key Concepts and Definitions

Australian privacy law includes several important concepts related to de-identification:

| Concept | Definition |
| --- | --- |
| De-identification | The process of removing or altering information that identifies an individual or is reasonably likely to identify an individual. This involves both removing direct identifiers and addressing indirect identification risks through technical and administrative controls. |
| Personal information | Information or an opinion about an identified individual, or an individual who is reasonably identifiable, whether the information or opinion is true or not, and whether recorded in material form or not. |
| Sensitive information | A subset of personal information that includes health information, genetic information, biometric information, and other categories that receive additional protections under the Privacy Act. |
| Re-identification | The process of turning de-identified data back into identifiable data, either by restoring removed identifiers or by using other available information to infer identity. |
| Reasonably identifiable | A key threshold concept that depends on context, including the nature and amount of information, who will have access to it, and other information that may be available. |
| Disclosure risk | The likelihood that an individual could be re-identified from supposedly de-identified data, considering all relevant factors including other available information. |
| Data custodian | The entity responsible for managing and protecting data, including implementing appropriate de-identification measures. |
| Accredited Data Service Provider | Under the DATA Scheme, an organization accredited to provide data services including de-identification. |

Example: "Reasonably Identifiable" in Practice

The OAIC provides the following example: A dataset contains patient health records with names and Medicare numbers removed, but includes full date of birth, gender, and detailed postcode information. While no direct identifiers remain, the combination of these quasi-identifiers could make individuals "reasonably identifiable" in small geographic areas where few people share the same characteristics. The OAIC would likely consider this information to still be personal information subject to the Privacy Act.
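
The quasi-identifier problem in this example can be checked mechanically by counting how many records have a unique combination of date of birth, gender, and postcode. The sketch below uses made-up records and a hypothetical helper name, and it is only one possible indicator of whether data is "reasonably identifiable", not a legal test.

```python
from collections import Counter


def unique_record_share(records, quasi_identifiers):
    """Return the fraction of records whose quasi-identifier combination is unique."""
    if not records:
        return 0.0
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(records)


# Fabricated records: names and Medicare numbers are already removed.
records = [
    {"dob": "1980-03-15", "gender": "F", "postcode": "0872", "diagnosis": "asthma"},
    {"dob": "1980-03-15", "gender": "F", "postcode": "2000", "diagnosis": "diabetes"},
    {"dob": "1975-11-02", "gender": "M", "postcode": "2000", "diagnosis": "asthma"},
    {"dob": "1975-11-02", "gender": "M", "postcode": "2000", "diagnosis": "fracture"},
]

share = unique_record_share(records, ["dob", "gender", "postcode"])
print(f"{share:.0%} of records are unique on dob + gender + postcode")
```

A high share of unique records suggests that someone who knows a person's birth date, gender, and postcode could single out that person's record even though no direct identifier remains.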

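In August 2016, the Department of Health published a de-identified sample of Medicare Benefits Schedule (MBS) and Pharmaceutical Benefits Scheme (PBS) claims data as open data. University of Melbourne researchers subsequently showed that the dataset could be re-identified, including, in findings published in 2017, by linking patient records with other publicly available information. The dataset was withdrawn, and the incident led to significant reforms in Australia's approach to data release and de-identification practices.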

Under Australian law, properly de-identified information is no longer considered personal information and thus falls outside the scope of the Privacy Act. However, the test is contextual and depends on the reasonable likelihood of re-identification in the circumstances.

Office of the Australian Information Commissioner (OAIC) Guidelines

The OAIC has published extensive guidance on de-identification, including:

De-identification and the Privacy Act

This guidance outlines:

Example: Risk-Based Approach to De-identification

The OAIC recommends a risk-based approach that considers:

  • Data environment factors: Who can access the data, what other data is available, what controls exist
  • Data factors: What information remains in the dataset, how unique or distinguishable records are
  • Intent factors: The motivation, skills, and resources of potential attackers

For example, releasing hospital admission data might require different de-identification approaches depending on whether it's being:

  • Published openly on the internet (highest risk)
  • Shared with approved researchers in a secure environment (moderate risk)
  • Used internally for quality improvement (lower risk)

The De-identification Decision-Making Framework

Developed by the OAIC in partnership with CSIRO's Data61, this framework provides a structured approach to de-identification, summarized here in five steps:

  1. Establish context: Define the data situation and evaluate risks
  2. Understand the data: Identify variables and assess disclosure risks
  3. Choose de-identification methods: Select appropriate techniques based on risk assessment
  4. Calculate re-identification risk: Quantify the likelihood of re-identification
  5. Manage risk: Implement controls and governance frameworks
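
Step 4 can be illustrated with a common measure from the statistical disclosure control literature: a record's re-identification risk is taken as 1 divided by the number of records that share its quasi-identifier values. The sketch below is an assumption-laden illustration, not the framework's prescribed method. The threshold values and context names are hypothetical, chosen only to mirror the open, secure-research, and internal release contexts discussed earlier.

```python
from collections import Counter

# Hypothetical thresholds for illustration; not taken from OAIC or CSIRO guidance.
ILLUSTRATIVE_MAX_RISK = {
    "open_publication": 0.05,
    "secure_research_environment": 0.2,
    "internal_use": 0.5,
}


def reidentification_risk(records, quasi_identifiers):
    """Return max and average per-record risk, where risk = 1 / class size."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    class_sizes = Counter(keys)
    per_record = [1 / class_sizes[k] for k in keys]
    return {
        "max_risk": max(per_record),
        "average_risk": sum(per_record) / len(per_record),
    }


def acceptable_for(risk, context):
    """Compare the worst-case record against the illustrative threshold."""
    return risk["max_risk"] <= ILLUSTRATIVE_MAX_RISK[context]


# Fabricated admissions: four records share one profile, one record stands alone.
admissions = [{"age_band": "30-34", "sex": "F", "region": "20"}] * 4
admissions = admissions + [{"age_band": "35-39", "sex": "M", "region": "20"}]

risk = reidentification_risk(admissions, ["age_band", "sex", "region"])
print(risk, acceptable_for(risk, "internal_use"))
```

The single record with a unique profile drives the maximum risk to 1.0, which is why step 5 usually combines further data treatment with environmental controls rather than relying on either alone.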

Case Study: Australian Census Data Release

The Australian Bureau of Statistics (ABS) applies sophisticated de-identification techniques to census data that contains sensitive health information:

  • Implements a multi-layered approach to protect individual privacy while maintaining data utility
  • Uses perturbation techniques to introduce small random adjustments to data
  • Applies different levels of geographic aggregation depending on sensitivity
  • Suppresses small counts for sensitive health conditions
  • Provides different access mechanisms based on user needs and trustworthiness
  • Conducts comprehensive disclosure risk assessments before each data release
  • Implements TableBuilder, a tool that applies automatic confidentiality protections

This approach has enabled the release of valuable population health data while protecting individual privacy.

Technical Approaches

Australian guidance recommends various technical approaches to de-identification:

1. Direct Identifier Removal

Removal of information that directly identifies individuals, such as names and Medicare numbers.

Example: Direct Identifier Removal in Practice

The Australian Institute of Health and Welfare (AIHW) applies the following techniques when releasing health datasets:

  • Removal: Complete elimination of direct identifiers
  • Pseudonymization: Replacing identifiers with randomly generated codes that maintain the ability to link records while removing identifying information
  • Key separation: Storing linking keys separately from content data with strict access controls
  • Secure hash functions: Creating non-reversible identifiers for linkage purposes
  • Statistical linkage keys: Using standardized approaches to create linkage keys that don't reveal identity

For example, in the National Hospital Morbidity Database, patient names and Medicare numbers are replaced with randomly generated identifiers before data is provided to researchers.
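
Two of the techniques listed above, keyed hashing and statistical linkage keys, can be sketched in a few lines. This is not the AIHW's implementation. The first function uses a keyed hash (HMAC-SHA256) rather than randomly generated codes with a lookup table, and the second is a simplified key loosely modelled on the pattern of the SLK-581 linkage key (selected letters of the names, date of birth, and a sex code), with the padding rule assumed here. All function names and sample values are hypothetical.

```python
import hashlib
import hmac
import secrets


def new_linkage_secret() -> bytes:
    """Generate a secret key; under key separation it is stored apart from the data."""
    return secrets.token_bytes(32)


def pseudonymise(identifier: str, key: bytes) -> str:
    """Keyed hash: the same identifier and key always give the same pseudonym."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def slk_like(family_name: str, given_name: str, dob_ddmmyyyy: str, sex_code: str) -> str:
    """Simplified statistical linkage key (assumed format, for illustration).

    Takes letters 2, 3 and 5 of the family name and letters 2 and 3 of the
    given name, padding with "2" when a name is too short, then appends the
    date of birth and a one-character sex code.
    """
    def pick(name, positions):
        letters = [c for c in name.upper() if c.isalpha()]
        return "".join(letters[p - 1] if p <= len(letters) else "2" for p in positions)

    return pick(family_name, (2, 3, 5)) + pick(given_name, (2, 3)) + dob_ddmmyyyy + sex_code


key = new_linkage_secret()
token = pseudonymise("2123 45670 1", key)   # fabricated Medicare-style number
print(token[:16], slk_like("Smith", "John", "15031980", "1"))
```

Without the secret key, the pseudonym cannot be recomputed from a guessed identifier, which is the practical benefit of a keyed hash over a plain one. A linkage key of this kind is not secret and still carries date of birth, so it needs the same access controls as other indirect identifiers.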

2. Statistical Techniques

Various methods to address indirect identification risks:

| Technique | Description | Example Application |
| --- | --- | --- |
| Aggregation | Combining values into categories (e.g., age ranges instead of specific ages) | Converting exact ages to 5-year age bands (e.g., 30-34, 35-39) |
| Suppression | Removing variables or records that present high re-identification risk | Removing rare disease codes that affect very few individuals in a dataset |
| Perturbation | Adding noise to data while preserving statistical properties | Adding random variations to laboratory values while maintaining overall distribution |
| Synthetic data | Creating artificial data that maintains statistical properties | Generating synthetic patient records that reflect real population characteristics |
| k-anonymity | Ensuring each combination of quasi-identifier attributes occurs in at least k records | Ensuring at least 5 people share each combination of age, gender, and postcode |
| l-diversity | Ensuring sensitive attributes have diverse values within each equivalence class | Ensuring multiple different diagnosis codes exist for each demographic group |
| Differential privacy | Adding statistical noise in a way that provides mathematical privacy guarantees | Adding calibrated noise to aggregate statistics about health conditions |
| Cell suppression | Hiding cells in tabular data that could reveal individuals | Suppressing counts less than 5 in health statistics tables |
| Data swapping | Exchanging values between records to break identifiability | Swapping certain demographic details between similar records |
| Microaggregation | Replacing individual values with averages from small groups | Replacing individual blood pressure readings with small group averages |
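
Three of the techniques above (aggregation, k-anonymity, and l-diversity) fit together naturally: ages are first generalized into bands, then the resulting groups are checked for size and for diversity of the sensitive attribute. The following is a minimal sketch with fabricated records and hypothetical function names. It checks the properties but does not search for the generalization that achieves them.

```python
from collections import defaultdict


def age_band(age: int, width: int = 5) -> str:
    """Generalize an exact age into a band such as '30-34'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def equivalence_classes(records, quasi_identifiers):
    """Group records that share the same quasi-identifier values."""
    classes = defaultdict(list)
    for r in records:
        classes[tuple(r[q] for q in quasi_identifiers)].append(r)
    return classes


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every equivalence class contains at least k records."""
    groups = equivalence_classes(records, quasi_identifiers).values()
    return all(len(g) >= k for g in groups)


def is_l_diverse(records, quasi_identifiers, sensitive, l):
    """True if every class holds at least l distinct sensitive values."""
    groups = equivalence_classes(records, quasi_identifiers).values()
    return all(len({r[sensitive] for r in g}) >= l for g in groups)


raw = [
    {"age": 31, "gender": "F", "postcode": "3000", "diagnosis": "asthma"},
    {"age": 33, "gender": "F", "postcode": "3000", "diagnosis": "diabetes"},
    {"age": 34, "gender": "F", "postcode": "3000", "diagnosis": "asthma"},
    {"age": 36, "gender": "M", "postcode": "3000", "diagnosis": "asthma"},
    {"age": 38, "gender": "M", "postcode": "3000", "diagnosis": "asthma"},
]
banded = [{**r, "age": age_band(r["age"])} for r in raw]
qi = ["age", "gender", "postcode"]

print(is_k_anonymous(banded, qi, k=2), is_l_diverse(banded, qi, "diagnosis", l=2))
```

In this sample the banded data is 2-anonymous but not 2-diverse, because both men in the 35-39 band share the same diagnosis. Knowing that someone is in that group reveals the diagnosis even though the individual record cannot be singled out.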

Example: Differential Privacy Implementation

The Australian Bureau of Statistics has begun implementing differential privacy techniques for certain data releases:

  • Establishes a privacy budget (epsilon) that quantifies the privacy risk
  • Adds carefully calibrated noise to statistics based on sensitivity
  • Provides a mathematical bound on how much any one individual's data can change a published result, which limits re-identification risk
  • Balances privacy protection with data utility
  • Allows transparent communication about privacy protection levels

This approach represents the cutting edge of privacy-preserving data release in Australia.
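
The idea of calibrating noise to a privacy budget can be shown with the Laplace mechanism, the textbook way to release a differentially private count. This sketch is not a description of the ABS's implementation. The function names are hypothetical, and it assumes a counting query, where adding or removing one person changes the result by at most 1 (sensitivity 1).

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def dp_count(true_count: int, epsilon: float, rng=None, sensitivity: float = 1.0) -> float:
    """Release a count with Laplace noise of scale sensitivity / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(2024)  # seeded only so the illustration is repeatable
for eps in (0.1, 1.0, 10.0):
    print(eps, round(dp_count(120, eps, rng), 1))
```

A smaller epsilon means a larger noise scale and stronger protection at the cost of accuracy. Each additional release spends more of the budget, so repeated queries on the same data need their epsilons added up. A production system would also use a cryptographically secure random source rather than the default generator used here.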

The Five Safes Framework

Australia has widely adopted the "Five Safes" framework, originally developed in the UK, as a structured approach to managing sensitive data. This framework is now embedded in Australia's data sharing practices, including the Data Availability and Transparency Act 2022:

| Safe Dimension | Description | Australian Implementation Example |
| --- | --- | --- |
| Safe People | Ensuring data users are authorized, trained, and trustworthy | AIHW requires researchers to sign confidentiality undertakings and complete training before accessing sensitive health data |
| Safe Projects | Ensuring data use is appropriate and ethical | Human Research Ethics Committee approval required for health data research projects |
| Safe Settings | Controlling the environment in which data is accessed | The Secure Unified Research Environment (SURE) provides a secure virtual environment for analyzing sensitive health data |
| Safe Data | Applying technical controls to remove identifiers | Application of statistical disclosure control methods to health datasets before release |
| Safe Outputs | Ensuring results of analysis don't disclose sensitive information | Statistical output checking before publication of research findings based on sensitive data |

Example: The Five Safes in the Australian DATA Scheme

The Data Availability and Transparency Act 2022 explicitly incorporates the Five Safes framework into Australia's data sharing legislation. For health data sharing under this scheme:

  • Safe People: Data recipients must be accredited by the National Data Commissioner
  • Safe Projects: Data sharing must be for an authorized purpose (delivering government services, informing policy, or research and development)
  • Safe Settings: Appropriate security controls must be in place
  • Safe Data: Only the minimum necessary data can be shared
  • Safe Outputs: Results must be checked before wider release
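
The five conditions above work as a conjunction: a sharing request should proceed only if every dimension is satisfied. A small data structure makes that explicit. The class, field names, and purpose labels below are hypothetical and greatly simplify what an actual assessment under the DATA Scheme would involve.

```python
from dataclasses import dataclass

# Labels for the three authorized purposes named in the list above.
AUTHORISED_PURPOSES = {
    "government_service_delivery",
    "informing_policy",
    "research_and_development",
}


@dataclass
class SharingRequest:
    recipient_accredited: bool          # Safe People
    purpose: str                        # Safe Projects
    security_controls_in_place: bool    # Safe Settings
    minimum_necessary_data: bool        # Safe Data
    outputs_checked: bool               # Safe Outputs


def unmet_safes(req: SharingRequest) -> list[str]:
    """Return the names of the Five Safes dimensions the request fails."""
    checks = {
        "Safe People": req.recipient_accredited,
        "Safe Projects": req.purpose in AUTHORISED_PURPOSES,
        "Safe Settings": req.security_controls_in_place,
        "Safe Data": req.minimum_necessary_data,
        "Safe Outputs": req.outputs_checked,
    }
    return [name for name, ok in checks.items() if not ok]


request = SharingRequest(True, "marketing", True, True, False)
print(unmet_safes(request))
```

Returning the list of failed dimensions, rather than a single yes or no, keeps the reason for a refusal visible to whoever reviews the request.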

Case Study: The Secure Unified Research Environment (SURE)

SURE is a secure computing environment developed by the Sax Institute to enable safe access to sensitive health data:

  • Provides a virtual desktop infrastructure for researchers to access and analyze sensitive health data
  • Implements multiple security layers including two-factor authentication
  • Prevents data downloads or transfers outside the secure environment
  • Records all user actions for audit purposes
  • Requires all outputs to be checked for disclosure risk before release
  • Enables collaboration across institutions while maintaining data security
  • Has supported over 800 research projects using sensitive health data

SURE exemplifies the "Safe Settings" component of the Five Safes framework and has become a model for secure data access internationally.

The Five Safes approach recognizes that de-identification is not solely about technical treatment of the data but depends on the entire context of data use.

Sector-Specific Guidance

Several Australian organizations have developed sector-specific guidance for health data de-identification:

Australian Institute of Health and Welfare (AIHW)

The AIHW's Confidentiality Guidelines provide specific approaches for de-identifying health and welfare data, including:

Example: AIHW Small Numbers Protocol

The AIHW applies specific rules when reporting health statistics for small geographic areas:

  • Cells with counts of 1-4 are generally suppressed
  • Additional cells may be suppressed to prevent calculation of suppressed values (complementary suppression)
  • Rates based on small numbers are flagged as potentially unreliable
  • Indigenous health data has additional protections due to its sensitivity
  • Geographic areas may be combined to increase population size
  • Confidence intervals are provided to indicate statistical reliability
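
The first two rules interact: if a row total is published and only one cell is hidden, the hidden value can be recovered by subtraction, so a second cell must be hidden as well. The sketch below shows that logic for a single row. It is a simplified illustration, not the AIHW's procedure. The function name is hypothetical, the row total is assumed to be published, and the choice of the smallest remaining cell as the complementary cell is an assumption.

```python
SUPPRESSED = None  # marker for a cell withheld from publication


def suppress_row(counts: dict, threshold: int = 5) -> dict:
    """Apply primary and complementary suppression to one row of counts."""
    # Primary suppression: hide counts of 1 to threshold - 1.
    out = {k: (SUPPRESSED if 0 < v < threshold else v) for k, v in counts.items()}
    hidden = [k for k, v in out.items() if v is SUPPRESSED]
    # Complementary suppression: a lone hidden cell equals total minus the rest,
    # so hide the smallest remaining cell too.
    if len(hidden) == 1:
        visible = [k for k in out if out[k] is not SUPPRESSED]
        if visible:
            smallest = min(visible, key=lambda k: counts[k])
            out[smallest] = SUPPRESSED
    return out


row = {"0-14": 3, "15-64": 12, "65+": 40}
print(suppress_row(row))
```

Real tables need the same check across columns and across related tables, which is why complementary suppression is usually handled by dedicated disclosure control software rather than by hand.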

Department of Health and Aged Care

Provides specific guidance for de-identifying Medicare Benefits Schedule (MBS) and Pharmaceutical Benefits Scheme (PBS) data.

Australian Digital Health Agency

Provides guidance specific to My Health Record data de-identification.

Case Study: Population Health Research Network (PHRN)

The PHRN is a national data linkage infrastructure that enables privacy-preserving linkage of health datasets across Australia:

  • Implements a "separation principle" where identifying information is separated from content data
  • Uses specialized data linkage units in each state/territory
  • Applies privacy-preserving record linkage techniques
  • Creates linkage keys without revealing identities
  • Enables researchers to access linked data without seeing identifiers
  • Supports complex multi-jurisdictional data linkage projects
  • Has facilitated over 700 research projects using linked health data

This infrastructure has enabled valuable population health research while maintaining strong privacy protections.
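
The separation principle described above can be sketched as two roles that never see each other's data: a linkage unit that handles only identifying details and issues linkage keys, and a data custodian that releases only content data tagged with those keys. This is a minimal sketch with fabricated records and hypothetical function names. It uses exact matching on name and date of birth, whereas real linkage units generally rely on probabilistic matching.

```python
import secrets


def assign_linkage_keys(demographic_rows):
    """Linkage unit: sees identifiers only; maps (source, record_id) to a key."""
    person_keys = {}
    mapping = {}
    for row in demographic_rows:
        person = (row["name"].strip().lower(), row["dob"])
        if person not in person_keys:
            person_keys[person] = secrets.token_hex(8)  # random, project-specific key
        mapping[(row["source"], row["record_id"])] = person_keys[person]
    return mapping


def release_content(content_rows, source, mapping):
    """Custodian: replaces its internal record_id with the linkage key."""
    released = []
    for row in content_rows:
        out = {k: v for k, v in row.items() if k != "record_id"}
        out["linkage_key"] = mapping[(source, row["record_id"])]
        released.append(out)
    return released


demographics = [
    {"source": "hospital", "record_id": "H1", "name": "Jane Citizen", "dob": "1980-03-15"},
    {"source": "pharmacy", "record_id": "P9", "name": "jane citizen ", "dob": "1980-03-15"},
    {"source": "hospital", "record_id": "H2", "name": "John Smith", "dob": "1975-11-02"},
]
mapping = assign_linkage_keys(demographics)

hospital = release_content(
    [{"record_id": "H1", "diagnosis": "asthma"}, {"record_id": "H2", "diagnosis": "fracture"}],
    "hospital", mapping)
pharmacy = release_content([{"record_id": "P9", "medicine": "salbutamol"}], "pharmacy", mapping)
print(hospital[0]["linkage_key"] == pharmacy[0]["linkage_key"])
```

The researcher can join the hospital and pharmacy extracts on the linkage key without ever receiving a name, date of birth, or internal record number.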

How It Compares to HIPAA Safe Harbor

Australia's approach differs from HIPAA Safe Harbor in several key ways:

Example: De-identifying a Patient Dataset

Under HIPAA Safe Harbor: Remove the 18 specified identifiers (e.g., names, geographic subdivisions smaller than a state, all elements of dates other than the year that relate directly to an individual, and phone numbers).

Under the Australian framework: Conduct a context-specific risk assessment that considers:

  • Who will have access to the data
  • What other information they might have
  • How the data will be used and protected
  • The specific re-identification risks in the dataset

Then, based on this assessment:

  • Apply appropriate technical and administrative controls
  • Implement governance frameworks for ongoing management
  • Account for the full data environment, including access controls

| Feature | HIPAA Safe Harbor | Australian Framework |
| --- | --- | --- |
| Approach | Prescriptive list of 18 identifiers to remove | Principles-based assessment of "reasonably identifiable" |
| Legal certainty | High - clear compliance pathway | Moderate - requires judgment and risk assessment |
| Flexibility | Low - same approach for all contexts | High - tailored to specific use cases and contexts |
| Environmental controls | Limited focus - primarily on data transformation | Comprehensive - incorporates Five Safes framework |
| Risk assessment | Limited - focused on "actual knowledge" test | Extensive - central to compliance approach |
| Documentation | Minimal requirements | Comprehensive documentation expected |