[19] P. Bock, Getting it right: R&D methods for science and engineering. Academic Press, 2001.
[20] Y. Liu, Y. Zhou, S. Wen, and C. Tang, “A strategy on selecting performance metrics for classifier
evaluation,” International Journal of Mobile Computing and Multimedia Communications (IJMCMC),
vol. 6, no. 4, pp. 20–35, 2014.
[21] D. Dua and C. Graff, “UCI machine learning repository,” 2017. [Online]. Available: http://archive.ics.uci.edu/ml
[22] X. Chu, I. F. Ilyas, S. Krishnan, and J. Wang, “Data cleaning: Overview and emerging challenges,” in Proceedings of the 2016 International Conference on Management of Data, 2016, pp. 2201–2206.
[23] X. Chu, I. F. Ilyas, and P. Papotti, “Discovering denial constraints,” Proceedings of the VLDB
Endowment, vol. 6, no. 13, pp. 1498–1509, 2013.
[24] J. Wang, T. Kraska, M. J. Franklin, and J. Feng, “CrowdER: Crowdsourcing entity resolution,” arXiv preprint arXiv:1208.1927, 2012.
[25] P. Bohannon, W. Fan, M. Flaster, and R. Rastogi, “A cost-based model and effective heuristic for repairing constraints by value modification,” in Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, 2005, pp. 143–154.
[26] J. Kaiser, “Dealing with missing values in data,” Journal of Systems Integration, vol. 5, no. 1, pp. 42–51, 2014.
[27] S. Zhang, “Nearest neighbor selection for iteratively kNN imputation,” Journal of Systems and Software, vol. 85, no. 11, pp. 2541–2552, 2012.
[28] M. P. LaValley, “Logistic regression,” Circulation, vol. 117, no. 18, pp. 2395–2399, 2008.
[29] G. Guo, H. Wang, D. Bell, Y. Bi, and K. Greer, “KNN model-based approach in classification,” in OTM Confederated International Conferences “On the Move to Meaningful Internet Systems”. Springer, 2003, pp. 986–996.
[30] P. H. Swain and H. Hauska, “The decision tree classifier: Design and potential,” IEEE Transactions
on Geoscience Electronics, vol. 15, no. 3, pp. 142–147, 1977.
[31] R. Féraud and F. Clérot, “A methodology to explain neural network classification,” Neural Networks, vol. 15, no. 2, pp. 237–246, 2002.
[32] Q. Song, H. Jiang, and J. Liu, “Feature selection based on FDA and F-score for multi-class classification,” Expert Systems with Applications, vol. 81, pp. 22–27, 2017.
[33] S. Ding, “Feature selection based F-score and ACO algorithm in support vector machine,” in 2009 Second International Symposium on Knowledge Acquisition and Modeling, vol. 1. IEEE, 2009, pp. 19–23.
[34] J. R. Vergara and P. A. Estévez, “A review of feature selection methods based on mutual information,” Neural Computing and Applications, vol. 24, no. 1, pp. 175–186, 2014.
[35] S. Ronaghan, “The mathematics of decision trees, random forest and feature importance in scikit-learn and Spark,” Nov 2019. [Online]. Available: https://towardsdatascience.com/the-mathematics-of-decision-trees-random-forest-and-feature-importance-in-scikit-learn-and-spark
[36] F. Herrera, C. J. Carmona, P. González, and M. J. Del Jesus, “An overview on subgroup discovery: foundations and applications,” Knowledge and Information Systems, vol. 29, no. 3, pp. 495–525, 2011.