Despite the enormous benefits and rapid proliferation of data mining in recent years, data owners and researchers alike have acknowledged that data mining also revives old threats to individual privacy and introduces new ones. Many believe that data mining is, and will continue to be, one of the most significant privacy challenges in years to come. We live in an information age in which vast amounts of personal data are routinely collected during bank transactions, credit-card payments, phone calls, reward-card use, doctor visits, and video and car rentals, to mention but a few examples. These data are typically used for data mining and statistical analysis and are often sold to other companies and organizations. A breach of privacy occurs when individuals are not aware that the data have been collected in the first place, have been passed on to other companies and organizations, or have been used for purposes other than the one for which they were originally collected. Even when individuals approve of the use of their personal records for data mining and statistical analysis, for example in medical research, it is still assumed that only aggregate values will be made available to researchers and that no individual values will be disclosed. Various techniques can be employed to ensure the confidentiality of individual records and other sensitive information. They include adding noise to the original data, so that disclosing the perturbed data does not necessarily reveal the confidential individual values. Some techniques were developed specifically for mining vertically and/or horizontally partitioned data. In this scenario each partition belongs to a different party (e.g., a hospital), and no party is willing to share its data, yet all have an interest in mining the total data set comprising all of the partitions. Other techniques focus on protecting the confidentiality of logic rules and patterns discovered from data.
In this chapter we introduce the main issues in privacy-preserving data mining, provide a classification of existing techniques and survey the most important results in this area.
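As a rough illustration of the noise-addition idea mentioned above, the following minimal sketch perturbs a confidential numeric attribute with independent zero-mean Gaussian noise before release. It is only an assumption-laden toy and not the specific method of any technique surveyed in the chapter: the function name `perturb`, the parameter `noise_std`, and the salary figures are all hypothetical, and practical schemes choose the noise distribution far more carefully.

```python
import random
import statistics


def perturb(values, noise_std, seed=None):
    """Return a copy of `values` with independent zero-mean Gaussian noise added.

    `noise_std` (hypothetical parameter) is the standard deviation of the noise;
    `seed` only makes the illustration reproducible.
    """
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, noise_std) for v in values]


# Hypothetical confidential attribute (e.g. salaries); the figures are invented.
salaries = [52_000, 61_500, 48_200, 75_000, 58_300, 66_900, 49_800, 71_200]

# The data owner releases only the perturbed values.
released = perturb(salaries, noise_std=5_000, seed=42)

# Because the noise has zero mean, aggregates such as the mean tend to stay
# close to the original for large data sets, while each released record
# differs from the true individual value.
print("original mean:", statistics.mean(salaries))
print("released mean:", round(statistics.mean(released), 2))
print("first record: ", salaries[0], "->", round(released[0], 2))
```

The trade-off in this sketch is governed by `noise_std`: larger values mask individual records more strongly but also make aggregates computed from the released data less accurate, especially for small data sets such as this one.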
Security, Privacy and Trust in Modern Data Management, pp. 151-165