27/11/2022
Preserving privacy is a fundamental requirement for firms that share data with business partners to build advanced data mining models. Firms often aim to prevent the disclosure of sensitive knowledge or information discovered during the data mining process. In this study, we investigate the Frequent Itemset Hiding (FIH) problem, which aims to hide sensitive itemset relationships present in a transactional database. We propose a two-stage integer programming model that maximizes the proportion of unaltered transactions in the sanitized database while hiding the sensitive itemset relationships. The model exploits the concept of transactional equivalence, which significantly reduces the size of the FIH problem. In addition, our model enables the identification of solutions with minimal side effects. We conduct an experimental evaluation on both real and synthetic databases to show that our approach is scalable and produces a sanitized database with maximum accuracy. The generated solutions also have lower side effects (itemset information loss) than those of other state-of-the-art methods. Our experiments on very large problem instances show problem size reductions of one to three orders of magnitude. The proposed approach is therefore practical for solving large-scale FIH problem instances and for preserving privacy in organizational environments that increasingly rely on shared data and big data.
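To make the size-reduction idea concrete, the sketch below shows one plausible reading of transactional equivalence. It assumes that transactions with identical item content are interchangeable for hiding purposes, so an integer program could use one variable per distinct transaction pattern instead of one per transaction. The helper names (`support`, `equivalence_classes`, `min_to_sanitize`), the toy database, and the absolute support threshold are hypothetical illustrations, not the authors' model.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List

Itemset = FrozenSet[str]

def support(db: List[Itemset], itemset: Itemset) -> int:
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)

def equivalence_classes(db: Iterable[Itemset], sensitive: List[Itemset]) -> Counter:
    """Group transactions that support at least one sensitive itemset by
    their exact content (an assumed notion of equivalence). Each key is a
    distinct pattern; each value is how many transactions share it."""
    return Counter(t for t in db if any(s <= t for s in sensitive))

def min_to_sanitize(sup: int, threshold: int) -> int:
    """Hypothetical hiding rule: a sensitive itemset is hidden once its
    support drops strictly below `threshold`, so at least
    sup - threshold + 1 supporting transactions must be altered."""
    return max(0, sup - threshold + 1)

db = [frozenset(t) for t in (
    {"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"b", "c"},
    {"a", "b", "c"}, {"a", "b"}, {"c"},
)]
sensitive = [frozenset({"a", "b"})]

sup = support(db, sensitive[0])
classes = equivalence_classes(db, sensitive)
print(sup)                       # 5 transactions contain {a, b}
print(len(classes))              # 2 distinct patterns to decide on
print(min_to_sanitize(sup, 3))   # 3 transactions must be altered
```

Here five supporting transactions collapse into two equivalence classes, so a model with one integer variable per class (bounded by that class's multiplicity) would need two decision variables instead of five. On real databases with many repeated transactions, this kind of grouping is what could yield the large reductions the abstract reports, although the paper's exact definition and formulation may differ.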