Thesis information


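Title:	Design and Implementation of an Automatic Configuration Model for Employee Access Permissions (雇员访问权限自动配置模型设计与实现)
Author:	Xia Xiong (夏雄)
Student ID:	1049721303141
Confidentiality level:	Public
Thesis language:	chi
Subject code:	081001
Subject name:	Communication and Information Systems
Student type:	Master's
Degree:	Master of Engineering
University:	Wuhan University of Technology (武汉理工大学)
School:	School of Information Engineering
Major:	Electronics and Communication Engineering
Research area:	Machine learning software
First supervisor:	Su Yang (苏杨)
First supervisor's affiliation:	Wuhan University of Technology
Completion date:	2015-04-09
Defense date:	2015-05-17
Keywords:	ensemble learning; greedy forward selection; stacked generalization; voting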
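Abstract:	The main access control mechanisms in current use are DAC (Discretionary Access Control), MAC (Mandatory Access Control), and RBAC (Role-Based Access Control). This thesis applies machine learning to build a model that automatically configures employee access permissions. New datasets and feature sets are derived from the original dataset, and three learning algorithms are introduced: logistic regression, gradient boosting decision trees, and random forests. These three algorithms are trained on training sets assembled from combinations of the datasets and feature sets, which yields a large number of classifier models. Finally, two common ensemble learning methods are studied and used to combine a selection of representative classifiers. The main contributions are as follows.
(1) Employee permission data are collected and analyzed; a granted request is labeled 1 and a denied request is labeled 0. Four new datasets and five new feature sets are generated from the original dataset. In particular, Section 2.1.2 uses a greedy forward selection algorithm to pick an optimal subset from the large pool of candidate data; the thesis calls this subset the greedy dataset.
(2) The three algorithms are first trained on the original dataset and their performance on it is measured. They are then trained on each combined training set, and 14 representative classifier models are selected (five logistic regression models, four gradient boosting decision tree models, and five random forest models). The three algorithms are also trained on each of three datasets so that their performance can be compared. Logistic regression does well on training sets that contain the greedy dataset, whereas gradient boosting decision trees and random forests do well on training sets that contain the tuples dataset. Overall, logistic regression performs comparatively well on the training sets.
(3) Building on these classifiers, the thesis introduces voting and stacked generalization and uses both to combine the 14 representative classifiers. For the second-level learner in stacked generalization, ridge regression, linear regression with non-negative coefficient constraints, and ordinary linear regression are each tried, and ridge regression is chosen. The voting ensemble reaches an AUC of 0.9244, which is 0.0048 above the highest AUC among the 14 classifiers; the stacked generalization ensemble reaches an AUC of 0.9247, an improvement of 0.0051. Compared with the three algorithms trained on the original dataset, the ensemble models raise AUC by 0.05 on average.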
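Illustrative sketch (not taken from the thesis): the two ensemble steps named in the abstract, voting and stacked generalization with a ridge regression second level, can be outlined in plain Python as below. Everything here is an assumption made for illustration: the base-model scores are synthetic (true label plus noise) rather than outputs of the thesis's 14 classifiers, voting is implemented as a simple average of scores, the ridge fit has no intercept, and the names `auc`, `vote`, `solve`, and `ridge_fit` are hypothetical helpers.

```python
import random

def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def vote(score_columns):
    """Soft voting: average the base-model scores for each sample."""
    return [sum(row) / len(row) for row in zip(*score_columns)]

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ridge_fit(rows, targets, alpha=1.0):
    """Ridge weights w = (X^T X + alpha I)^-1 X^T y, without an intercept."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (alpha if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)

random.seed(0)
labels = [random.randint(0, 1) for _ in range(400)]
# Synthetic stand-ins for base-model scores: true label plus model-specific noise.
base = [[y + random.gauss(0, s) for y in labels] for s in (0.8, 1.0, 1.2)]
rows = [list(r) for r in zip(*base)]      # one row of base scores per sample
train, test = slice(0, 200), slice(200, 400)

# Level 2 of stacking: fit ridge weights on held-out base-model scores.
weights = ridge_fit(rows[train], labels[train], alpha=1.0)
stacked = [sum(w * x for w, x in zip(weights, r)) for r in rows[test]]
voted = vote([col[test] for col in base])

for i, col in enumerate(base):
    print("base model", i, "AUC:", round(auc(labels[test], col[test]), 4))
print("voting AUC:  ", round(auc(labels[test], voted), 4))
print("stacking AUC:", round(auc(labels[test], stacked), 4))
```

In a real stacking setup the second-level training rows would be out-of-fold predictions from the base classifiers, so that the ridge weights are not fitted on scores the base models produced for their own training samples.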
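CLC number:	TP393.08
Call number:	TP393.08/3141/2015
Notes:	403 - West Campus Branch Library, doctoral and master's thesis collection; 203 - Yujiatou Branch Library, doctoral and master's thesis collection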
