DOI: 10.3724/SP.J.1219.2013.00765

Information and Control (信息与控制), 2013, 42(6): 765-771

Multiple Outlier Detection Method for Linear Regression Model and Its Energy Conservation Application

The energy consumption of office equipment can be described by a linear regression model. For this model, a multiple outlier detection algorithm based on single-link hierarchical clustering and the LTS (least trimmed squares) estimator is proposed. The method is validated on several types of typical data sets, and the results show that it performs very well. It is then applied to office equipment energy consumption data sets. The experiments show that it handles the masking and swamping problems better than other algorithms. The method not only identifies the outliers correctly but also gives the degree of abnormality of each outlier, so that managers can develop reasonable energy management solutions and achieve energy savings.
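The abstract does not spell out the steps of the algorithm, so the following is only a minimal sketch of one way an LTS fit and single-link clustering could be combined, not the authors' procedure. It assumes a single predictor (y = a + b*x), an approximate LTS fit obtained from random starts refined by concentration steps, a MAD-based residual scale, and a cut height of 3.0 on the standardized residuals. The function names `ols`, `lts_fit`, and `detect_outliers`, the synthetic data, and the use of the absolute standardized residual as an "abnormal degree" are all hypothetical choices made for illustration. In one dimension, cutting a single-link dendrogram at height `cut` is equivalent to splitting the sorted values wherever a gap exceeds `cut`, which is what the sketch does.

```python
import random
from statistics import median


def ols(points):
    """Ordinary least squares fit of y = a + b*x to (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx if sxx else 0.0
    return mean_y - slope * mean_x, slope


def lts_fit(points, h, starts=200, seed=0):
    """Approximate LTS fit: random two-point starts plus concentration steps."""
    rng = random.Random(seed)
    best = None
    for _ in range(starts):
        subset = rng.sample(points, 2)
        for _ in range(20):
            a, b = ols(subset)
            # Keep the h points with the smallest squared residuals, then refit.
            ranked = sorted(points, key=lambda p: (p[1] - a - b * p[0]) ** 2)
            if ranked[:h] == subset:
                break
            subset = ranked[:h]
        a, b = ols(subset)
        # LTS objective: sum of the h smallest squared residuals.
        cost = sum(sorted((y - a - b * x) ** 2 for x, y in points)[:h])
        if best is None or cost < best[0]:
            best = (cost, a, b)
    return best[1], best[2]


def detect_outliers(points, cut=3.0):
    """Return (outlier indices, {index: |standardized residual|})."""
    n = len(points)
    h = (n + 2 + 1) // 2  # common LTS coverage for p = 2 coefficients
    a, b = lts_fit(points, h)
    residuals = [y - a - b * x for x, y in points]
    # Robust scale from the median absolute residual (assumed choice).
    scale = 1.4826 * median(abs(r) for r in residuals) or 1.0
    z = [r / scale for r in residuals]
    # Single-link clusters in 1-D: split sorted values at gaps larger than cut.
    order = sorted(range(n), key=lambda i: z[i])
    clusters = [[order[0]]]
    for prev, cur in zip(order, order[1:]):
        if z[cur] - z[prev] > cut:
            clusters.append([cur])
        else:
            clusters[-1].append(cur)
    # The largest cluster is treated as the clean set; the rest are flagged.
    main = set(max(clusters, key=len))
    outliers = [i for i in range(n) if i not in main]
    return outliers, {i: abs(z[i]) for i in outliers}


# Synthetic data (not from the paper): a line with small noise, three shifts.
points = [(float(i), 2.0 + 0.5 * i + 0.02 * ((7 * i) % 11 - 5)) for i in range(20)]
for i, shift in ((3, 8.0), (7, 8.5), (15, -9.0)):
    x, y = points[i]
    points[i] = (x, y + shift)

outliers, scores = detect_outliers(points)
print(outliers, scores)
```

Because the clean set is taken from a robust fit rather than an ordinary least squares fit, several outliers that sit together are less likely to pull the line toward themselves (masking) or push good points into the flagged set (swamping). Whether the paper clusters on residuals alone or on additional quantities is not stated in the abstract.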

Key words: single-linkage hierarchical clustering algorithm, linear regression model, outlier detection, least trimmed squares estimator, masking problem, swamping problem

ReleaseDate:2015-04-15 18:52:46
