Journal of Software (软件学报) 2013/24:12 PP.2883-2896
To protect privacy against linking attacks, quasi-identifier attributes of microdata should be anonymized in privacy preserving data publishing. Although lots of algorithms have been proposed in this area, few of them can handle incomplete microdata. Most existing algorithms simply delete records with missing values, causing large information loss. This paper proposes a novel data anonymization approach called KAIM (k-anonymity for incomplete microdata), for incomplete microdata based on k-member algorithm and information entropy distance. Instead of deleting any records, KAIM effectively clusters records with similar characteristics together to minimize information loss, and then generalizes all records with local recording scheme. Results of extensive experiments base on real dataset show that KAIM causes only 43.8% information loss compared with previous algorithms for incomplete microdata, validating that KAIM performs much better than existing algorithms on the utility of anonymized dataset.