2. STATE OF THE ART
2.9. PROJECTS DEVELOPED IN MAP AND AEI DETECTION
2.9.1. International developments
The physical and virtual security of the information in the care of a data custodian before, during and after a linkage process is also a major issue in relation to the privacy protections offered to community sector clients.
The protocol identified in Section 7.3 provides some guidelines on the safe storage and handling of information during a statistical linkage process. Other protections identified by the SLKWG include conducting the linkage on a non-networked (stand-alone) computer. The guidelines and protections around the handling of data during the linkage process are also governed by relevant agency-specific protocols and privacy protection laws.
6.5.3.1 Encryption algorithms
Specific scrambling techniques can be used to encode the SLK attached to the de-identified and aggregated data, protecting its confidentiality during transmission between agencies. Encryption algorithms are well developed and widely used for this purpose, especially in the financial sector, where they protect financial details in transit.
The options for using encryption to preserve the confidentiality of data during transmission will be discussed here in terms of the SLK only, rather than encrypting the entire data stream. The SLKWG has recommended that, for the best possible linkage using probabilistic methods, the SLK would contain full demographic data and be attached to suitably aggregated, service experience information. As such, it should be sufficient to encrypt the SLK only, with the remaining data stream not encrypted.
Where agencies consider that it may be necessary to encrypt both the SLK and the attached, de-identified and aggregated data stream, it would be necessary to use a reversible encryption algorithm, for reasons discussed in detail below. The second option outlined below (that is, a non-reversible encryption algorithm) would not be suitable.
As indicated, encryption algorithms usually fall into one of two main categories: reversible and non-reversible. Reversible encryption algorithms are shared between the contributing agency and the receiving agency. The reversible algorithm depends on an agreed decryption key being used by the receiving agency to decrypt the data received. This decryption key may or may not be known by the contributing agency, and must at all times be kept secret and secure by the receiving agency. If the decryption key were leaked, the confidentiality of the encryption process would be compromised, and the data could be decrypted by unauthorised individuals. On receipt of the data from the contributing agency, the receiving agency would use the decryption key to ‘unlock’ the SLK (composed of full demographic data) and then use the full demographic data to link records across collections using probabilistic linkage methods.
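The reversible option can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names `encrypt_slk`, `decrypt_slk` and `similarity`, the pipe-separated SLK layout, and the use of a single shared secret key (the text leaves open whether the contributing agency knows the decryption key). The cipher is a toy XOR stream built from SHA-256 and is not suitable for real data; it only shows that the receiving agency recovers the full demographic string and can then compare records approximately.

```python
import hashlib
import secrets
from difflib import SequenceMatcher


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from key and nonce (toy construction)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def encrypt_slk(slk: str, key: bytes) -> bytes:
    """Contributing agency: XOR the SLK with a keystream; a fresh nonce is prepended."""
    nonce = secrets.token_bytes(16)
    data = slk.encode("utf-8")
    stream = _keystream(key, nonce, len(data))
    return nonce + bytes(a ^ b for a, b in zip(data, stream))


def decrypt_slk(blob: bytes, key: bytes) -> str:
    """Receiving agency: split off the nonce and reverse the XOR with the same key."""
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream)).decode("utf-8")


def similarity(a: str, b: str) -> float:
    """Crude stand-in for a probabilistic comparison: string similarity in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


# Hypothetical records: the same client with a spelling variation in the surname.
key = secrets.token_bytes(32)
sent = encrypt_slk("SMITH|JOHN|1980-01-01|M", key)
recovered = decrypt_slk(sent, key)
score = similarity(recovered, "SMYTH|JOHN|1980-01-01|M")
```

Because the SLK is recovered in full, near matches such as the one above can still be scored and linked, which is what makes probabilistic linkage possible under this option.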
The second option would involve a non-reversible encryption algorithm being held by the contributing agency only. The contributing agency would encrypt the SLK using an algorithm which could not be decrypted, add the encrypted SLK to the de-identified and aggregated data and provide this to the receiving agency. On receipt of the data, the receiving agency would only be able to link the records received using deterministic (character to character matching) linkage methodology. Use of the non-reversible encryption algorithm by the contributing agency would necessarily be limited to the SLK, because the remainder of the data stream (that is, the de-identified and aggregated service experience data) would be unusable if it could not be decrypted.
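A minimal sketch of the non-reversible option follows. SHA-256 from Python's standard library is used here as a stand-in one-way function; it is not the algorithm discussed in the text, and the function names, the SLK layout and the sample records are hypothetical. The sketch shows why only deterministic linkage is possible: two encoded keys either match character for character or do not match at all.

```python
import hashlib


def hash_slk(slk: str) -> str:
    """One-way encoding of the SLK (SHA-256 used as an illustrative stand-in)."""
    return hashlib.sha256(slk.encode("utf-8")).hexdigest()


def deterministic_link(left, right):
    """Pair payloads from two lists of (hashed_slk, payload) on exact key equality."""
    index = {}
    for key, payload in right:
        index.setdefault(key, []).append(payload)
    return [
        (left_payload, right_payload)
        for key, left_payload in left
        for right_payload in index.get(key, [])
    ]


# Hypothetical collections; only the hashed SLK and aggregated data are exchanged.
collection_a = [
    (hash_slk("SMITH|JOHN|1980-01-01|M"), "service A, 3 visits"),
    (hash_slk("JONES|MARY|1975-06-30|F"), "service A, 1 visit"),
]
collection_b = [
    (hash_slk("SMITH|JOHN|1980-01-01|M"), "service B, 2 visits"),
    (hash_slk("JONES|MARIE|1975-06-30|F"), "service B, 5 visits"),  # spelling variant
]

links = deterministic_link(collection_a, collection_b)
```

Only the first client is linked. The spelling variant in the second record produces an entirely different encoded key, so it cannot be recovered by approximate comparison.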
Non-reversible encryption algorithms permit agencies to freely exchange the encryption algorithm and encrypted SLK without risk of identifying personal client details.
The Department of Health Services in South Australia has recently investigated (with the ABS) the use of a non-reversible encryption algorithm for use with Gambling Rehabilitation Fund data (collected by the Break Even services for gambling counselling). The encryption algorithm works by summing the ASCII representation of names, gender and date of birth, dividing by 256 and converting the remainder to hexadecimal representation. The algorithm provides 2 to the power of 36 unique codes, and has an error rate of a fraction of a per cent, and is therefore nearly unique. It is irreversible since anyone intercepting the data cannot tell how many lots of ‘256’ were in the original sum, or even how many letters there were in the first or last names entered.
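The arithmetic described above can be sketched as follows. The text does not say how the fields are grouped. A single sum taken modulo 256 can yield only 256 distinct values, so the stated number of codes implies that the operation is applied to several components separately. The split used here (one code per field, concatenated) and the names `field_code` and `slk_code` are assumptions made for illustration, and the output length of this sketch is not claimed to match the scheme that was actually investigated.

```python
def field_code(text: str) -> str:
    """Sum the ASCII codes of the characters, keep the remainder after
    division by 256, and return it as two hexadecimal digits."""
    total = sum(text.encode("ascii"))
    return format(total % 256, "02X")


def slk_code(first_name: str, last_name: str, gender: str, dob: str) -> str:
    # ASSUMPTION: each component is coded separately and the codes are concatenated.
    parts = (first_name.upper(), last_name.upper(), gender.upper(), dob)
    return "".join(field_code(part) for part in parts)


# 65 + 66 = 131, which is 0x83
code_ab = field_code("AB")

# Different inputs can share a code, so the original text cannot be recovered:
# "AD" sums to 65 + 68 = 133 and "BC" sums to 66 + 67 = 133.
same_code = field_code("AD") == field_code("BC")

# Hypothetical client record.
example = slk_code("John", "Smith", "M", "01011980")
```

The collision between "AD" and "BC" illustrates both properties mentioned in the text: the code cannot be reversed, and distinct clients can occasionally receive the same code.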
The choice of which form of encryption to use will need to be made by each relevant agency (or steering committee) participating in the linkage project. In some cases, it may be deemed appropriate by the relevant steering committee to use a full range of client personal demographic data to construct the SLK (for example, entire name, birth details), then encrypt this using a non-reversible algorithm and link the data using deterministic methods.
However, reversible encryption algorithms are more widely understood and allow the use of full demographic data by the receiving agency (that is, the linkage agency) to perform the linkage using probabilistic linkage methods. Thus, the draft protocol described in Section 7.3 proposes use of a reversible encryption algorithm. The SLKWG recommends that some form of encryption be used in any linkage process, to ensure that the maximum security is afforded to data items in transmission and, through the use of a reversible algorithm, that the conditions supporting the best possible linkage are provided.