DATA BALANCING METHODS BY FUZZY ROUGH SETS

Tran Thanh Huyen

Authors

Tran Thanh Huyen (Corresponding Author) VNU University of Engineering and Technology

Keywords:

Rough Set theory; Fuzzy-rough sets; Granular computing; Imbalanced data; Instance selection.

Abstract

The robustness of rough set theory in data cleansing has been demonstrated in many studies. Recently, fuzzy rough sets have also been applied to imbalanced data, following two approaches. The first combines fuzzy rough instance selection with balancing methods. The second applies different criteria to clean the majority and minority classes of imbalanced data. This work extends the second approach, which was presented in [16]. The paper gives a complete study of that approach together with several proposed algorithms. It focuses mainly on binary classification with kNN and SVM for imbalanced data. Experiments and comparisons among related methods show the pros and cons of each method with respect to accuracy and running time.
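As a rough illustration of the second approach (different cleaning criteria for the majority and minority classes), the following minimal sketch scores each instance by its membership in the fuzzy-rough lower approximation of its own class and drops low-scoring instances, using a stricter threshold for the majority class. It is not the algorithm proposed in the paper: the similarity measure, the Lukasiewicz implicator, the function names, and the threshold values `tau_majority` and `tau_minority` are all assumptions made for illustration only.

```python
def similarity(x, y):
    """Fuzzy similarity in [0, 1]: 1 minus the mean absolute feature difference.
    Assumes every feature has already been scaled to [0, 1]."""
    return 1.0 - sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def lower_approx_membership(i, X, labels):
    """Membership of instance i in the fuzzy-rough lower approximation of its class.
    With the Lukasiewicz implicator and crisp classes this reduces to the minimum
    of 1 - R(x, y) over all instances y that belong to a different class."""
    others = [1.0 - similarity(X[i], X[j])
              for j in range(len(X)) if labels[j] != labels[i]]
    return min(others) if others else 1.0


def clean(X, labels, minority_label, tau_majority=0.3, tau_minority=0.02):
    """Return the indices of the instances kept after cleaning.
    Hypothetical criteria: majority instances must reach tau_majority, while
    minority instances only need to reach the more lenient tau_minority."""
    keep = []
    for i in range(len(X)):
        tau = tau_minority if labels[i] == minority_label else tau_majority
        if lower_approx_membership(i, X, labels) >= tau:
            keep.append(i)
    return keep
```

With these assumed thresholds, a majority instance lying close to minority instances is removed, while a minority instance at the same distance from the majority class is kept, so the cleaning step shifts the class balance toward the minority class.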

References

[1]. Jesus Alcala-Fdez, Alberto Fernandez, Julian Luengo, Joaquin Derrac, and Salvador Garcia. “KEEL data-mining software tool: Data set repository, integration of algorithms and experimental analysis framework”. Multiple-Valued Logic and Soft Computing, 17(2-3):255–287, 2011.

[2]. K. Bache and M. Lichman. “UCI Machine Learning Repository”, 2013.

[3]. Yaile Caballero, Rafael Bello, Delia Alvarez, Maria M. Garcia, and Yaimara Pizano. “Improving the k-NN method: Rough set in edit training set”. In John Debenham, editor, Professional Practice in Artificial Intelligence, volume 218 of IFIP International Federation for Information Processing, pages 21–30. Springer US, 2006.

[4]. Nitesh V. Chawla, Kevin W. Bowyer, Lawrence O. Hall, and W. Philip Kegelmeyer. “SMOTE: Synthetic minority over-sampling technique”. Journal of Artificial Intelligence Research, 16:321–357, 2002.

[5]. Chris Cornelis, Nele Verbiest, and Richard Jensen. “Ordered weighted average based fuzzy rough sets”. In Rough Set and Knowledge Technology - 5th International Conference, RSKT 2010, Beijing, China, October 15-17, 2010. Proceedings, pages 78–85, 2010.

[6]. Corinna Cortes and Vladimir Vapnik. “Support-vector networks”. Machine Learning, 20(3):273–297, 1995.

[7]. T. Cover and P. Hart. “Nearest neighbor pattern classification”. IEEE Trans. Inf. Theor., 13(1):21–27, September 1967.

[8]. Didier Dubois and Henri Prade. “Rough fuzzy sets and fuzzy rough sets”. International Journal of General Systems, 17:191–209, 1990.

[9]. Didier Dubois and Henri Prade. “Putting rough sets and fuzzy sets together”. In Roman Slowinski, editor, Intelligent Decision Support, volume 11 of Theory and Decision Library, pages 203–232. Springer Netherlands, 1992.

[10]. M. Friedman. “The use of ranks to avoid the assumption of normality implicit in the analysis of variance”. Journal of the American Statistical Association, 32(200):675–701, 1937.

[11]. J. W. Grzymala-Busse, P. G. Clark, and M. Kuehnhausen. “Generalized probabilistic approximations of incomplete data”. International Journal of Approximate Reasoning, 55(1, Part 2):180–196, 2014. Special issue on Decision-Theoretic Rough Sets.

[12]. Jin Huang and C.X. Ling. “Using AUC and accuracy in evaluating learning algorithms”. IEEE Transactions on Knowledge and Data Engineering, 17(3):299–310, March 2005.

[13]. R. Jensen and C. Cornelis. “Fuzzy-rough instance selection”. In 2010 IEEE International Conference on Fuzzy Systems (FUZZ), pages 1–7, July 2010.

[14]. Marzena Kryszkiewicz. “Rough set approach to incomplete information systems”. Inf. Sci., 112(1-4):39–49, December 1998.

[15]. Victoria Lopez, Alberto Fernandez, Salvador Garcia, Vasile Palade, and Francisco Herrera. “An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics”. Information Sciences, 250:113–141, 2013.

[16]. Do Van Nguyen, Keisuke Ogawa, Kazunori Matsumoto, and Masayuki Hashimoto. “Editing training sets from imbalanced data using fuzzy-rough sets”. In IFIP Advances in Information and Communication Technology, volume 458, pages 115–129, France, 2015.

[17]. Do Van Nguyen, Koichi Yamada, and Muneyuki Unehara. “Extended tolerance relation to define a new rough set model in incomplete information systems”. Advances in Fuzzy Systems, 2013. Article ID 372091.

[18]. Do Van Nguyen, Koichi Yamada, and Muneyuki Unehara. “On probability of matching in probability based rough set definitions”. In IEEE SMC 2013, pages 449–454, Manchester, UK, 2013.

[19]. D. V. Nguyen, K. Yamada, and M. Unehara. “Rough set approach with imperfect data based on Dempster-Shafer theory”. Journal of Advanced Computational Intelligence and Intelligent Informatics, 18(3):280–288, 2014.

[20]. H. S. Nguyen. “Discretization problem for rough sets methods”. In Proceedings of the First International Conference on Rough Sets and Current Trends in Computing, RSCTC ’98, pages 545–552, London, UK. Springer-Verlag, 1998.

[21]. Zdzislaw Pawlak. “Rough sets”. International Journal of Computer and Information Sciences, 11:341–356, 1982.

[22]. Zdzislaw Pawlak. “Rough Sets: Theoretical Aspects of Reasoning about Data”. Kluwer Acad., 1991.

[23]. Anna Maria Radzikowska and Etienne E. Kerre. “A comparative study of fuzzy rough sets”. Fuzzy Sets Syst., 126(2):137–155, March 2002.

[24]. Enislay Ramentol, Yaile Caballero, Rafael Bello, and Francisco Herrera. “SMOTE-RSB*: A hybrid preprocessing approach based on oversampling and undersampling for high imbalanced data-sets using SMOTE and rough sets theory”. Knowl. Inf. Syst., 33(2):245–265, 2011.

[25]. Enislay Ramentol, Nele Verbiest, Rafael Bello, Yaile Caballero, Chris Cornelis, and Francisco Herrera. “SMOTE-FRST: A new resampling method using fuzzy rough set theory”. In Cengiz Kahraman, Etienne Kerre, and Faik Tunc Bozbura, editors, World Scientific Proceedings Series on Computer Engineering and Decision Making, volume 7, pages 800–805. World Scientific, 2012.

[26]. Enislay Ramentol, Sarah Vluymans, Nele Verbiest, Yaile Caballero, Rafael Bello, Chris Cornelis, and Francisco Herrera. “IFROWANN: Imbalanced fuzzy rough ordered weighted average nearest neighbor classification”. IEEE Transactions on Fuzzy Systems, volume 23, 2012.

[27]. N. Verbiest. “Multi threshold FRPS: A new approach to fuzzy rough set prototype selection”. In RSCTC 2014, LNAI, volume 8536, pages 83–91, 2014.

[28]. Nele Verbiest, Chris Cornelis, and Francisco Herrera. “FRPS: A fuzzy rough prototype selection method”. Pattern Recognition, 46(10):2770–2782, 2013.

[29]. Nele Verbiest, Enislay Ramentol, Chris Cornelis, and Francisco Herrera. “Preprocessing noisy imbalanced datasets using SMOTE enhanced with fuzzy rough prototype selection”. Appl. Soft Comput., 22:511–517, 2014.

[30]. Sarah Vluymans, Danel Sanchez Tarrago, Yvan Saeys, Chris Cornelis, and Francisco Herrera. “Fuzzy rough classifier for class imbalanced multi-instance data”. Pattern Recognition, 53:36–45, 2016.

[31]. F. Wilcoxon. “Individual comparisons by ranking methods”. Biometrics Bulletin, 1(6):80–83, 1945.

[32]. Ronald R. Yager. “On ordered weighted averaging aggregation operators in multicriteria decisionmaking”. IEEE Trans. Syst. Man Cybern., 18(1):183–190, January 1988.

[33]. Y. Y. Yao. “Combination of rough and fuzzy sets based on α-level sets”. In Rough sets and data mining: Analysis for imprecise data, pages 301–321. Kluwer Academic, 1997.

[34]. Hans-Jurgen Zimmermann. “Fuzzy Set Theory and its Applications”. Springer, 2001.


ISSN: 1859-1043
