Thesis Information

Chinese Title: 深度神经网络数据流图划分与优化策略研究 (Research on Dataflow Graph Partitioning and Optimization Strategies for Deep Neural Networks)

Name: 李志成

Student ID: 1049721701635

Confidentiality Level: Public

Thesis Language: Chinese

Discipline Code: 083500

Discipline Name: Software Engineering

Student Type: Master's candidate

Degree: Master of Engineering

University: 武汉理工大学 (Wuhan University of Technology)

School/Department: 计算机科学与技术学院 (School of Computer Science and Technology)

Major: Software Engineering

Research Direction: Deep Learning

Primary Supervisor: 徐宁

Primary Supervisor's Institution: 武汉理工大学 (Wuhan University of Technology)

Completion Date: 2020-03-25

Defense Date: 2020-05-21

Chinese Keywords: deep neural networks; graph partitioning; reinforcement learning; parallel computing

Chinese Abstract:

In recent years, deep neural networks have been applied effectively across many industries. At the same time, as the complexity of neural network models increases, their parameter counts and computational costs grow ever larger. Computing a neural network in parallel on multiple devices is the main way to improve the timeliness of its applications, so an ideal parallel computing strategy has a significant impact on the network's computational efficiency. Since the idea of representing neural networks as dataflow graphs emerged, the search for a parallel computing strategy for a deep neural network can be modeled as a dataflow graph partitioning problem. Heuristic graph partitioning algorithms and reinforcement learning are the two current approaches to this problem: heuristic algorithms compute quickly, but their partitioning results are easily perturbed by the initial solution, while reinforcement learning searches more accurately but is expensive to train. Drawing on the characteristics of both approaches, this thesis combines reinforcement learning with a heuristic algorithm and, by partitioning the dataflow graph of a deep neural network, generates a near-ideal parallel computing strategy for it. The specific work is as follows:
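
To make the modeling step concrete, the sketch below (an illustration assumed for this page, not code from the thesis) treats a small network as a dataflow graph whose vertices are operators and whose edges are tensors, and scores a two-device partition by the heavier device's compute load plus the bytes crossing the cut. The operator costs and the 1e-3 trade-off factor are made-up numbers.

```python
# Toy dataflow graph: op -> (compute cost, output tensor size in bytes).
ops = {
    "conv1": (8.0, 2048),
    "conv2": (8.0, 2048),
    "fc":    (4.0, 512),
    "loss":  (1.0, 8),
}
edges = [("conv1", "conv2"), ("conv2", "fc"), ("fc", "loss")]  # data dependencies

def placement_cost(assign):
    """assign maps each op to a device id (0 or 1); lower cost is better."""
    load = [0.0, 0.0]
    for op, (cost, _) in ops.items():
        load[assign[op]] += cost
    # Every edge whose endpoints sit on different devices pays a transfer cost.
    comm = sum(ops[u][1] for u, v in edges if assign[u] != assign[v])
    return max(load) + 1e-3 * comm  # the 1e-3 trade-off factor is arbitrary

# Two candidate partitions of the same graph, i.e. two device placements.
print(placement_cost({"conv1": 0, "conv2": 0, "fc": 1, "loss": 1}))
print(placement_cost({"conv1": 0, "conv2": 1, "fc": 0, "loss": 1}))
```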

(1) Improving the graph partitioning algorithm Metis based on the characteristics of deep neural network dataflow graphs. This thesis first analyzes the structural characteristics of neural network dataflow graphs and then, building on the Metis algorithm, devises an efficient partitioning scheme for them. On this basis, it modifies the weight computation in Metis according to how dataflow graphs are compiled by the computing framework, solving the problem that vertices involving memory reuse or intermediate data could not participate in the partitioning computation.
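
The following sketch shows, under assumed cost formulas that are not the thesis's actual weight definitions, how vertex and edge weights of a dataflow graph might be assigned before the graph is handed to a multilevel partitioner such as Metis, including clamping weights so that in-place (memory-reuse) and intermediate-data vertices still take part in the partition.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

ALPHA = 0.001  # arbitrary trade-off between compute and memory terms (an assumption)

@dataclass
class Op:
    name: str
    flops: float                  # estimated compute cost of the operator
    out_bytes: int                # size of its output tensor
    inplace: bool = False         # True if the op reuses its input buffer (memory reuse)
    inputs: List[str] = field(default_factory=list)

def build_weighted_graph(ops: Dict[str, Op], min_w: int = 1):
    """Return (vertex_weights, edge_weights) suitable for a METIS-style partitioner."""
    vertex_w: Dict[str, int] = {}
    edge_w: Dict[Tuple[str, str], int] = {}
    for op in ops.values():
        # Vertex weight combines compute cost with the memory the op newly allocates;
        # in-place (memory-reuse) ops allocate nothing new.  The max(..., min_w) clamp
        # keeps near-zero-cost intermediate vertices in the partitioning computation.
        mem = 0 if op.inplace else op.out_bytes
        vertex_w[op.name] = max(int(op.flops + ALPHA * mem), min_w)
        for src in op.inputs:
            # Edge weight = bytes that cross devices if the cut separates src and op.
            edge_w[(src, op.name)] = max(ops[src].out_bytes, min_w)
    return vertex_w, edge_w

# Toy chain: matmul -> relu (in-place) -> softmax.
ops = {
    "matmul":  Op("matmul", flops=2e6, out_bytes=4096),
    "relu":    Op("relu", flops=1e3, out_bytes=4096, inplace=True, inputs=["matmul"]),
    "softmax": Op("softmax", flops=5e3, out_bytes=4096, inputs=["relu"]),
}
print(build_weighted_graph(ops))
```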

(2) Building a reinforcement learning model to reduce Metis's dependence on the initial solution. Metis is a heuristic algorithm whose final partition is obtained by iterative computation over a set of initial partitions, so an unsuitable set of initial partitions leads to a poor final result. This thesis combines reinforcement learning with the Metis algorithm, using a reinforcement learning model to generate an optimal initial partition for Metis, which gives Metis better performance on the dataflow graph partitioning problem.
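
A minimal sketch of the "reinforcement learning proposes an initial partition, a heuristic refines it" loop is shown below. Here `metis_refine` and `estimate_runtime` are placeholders standing in for the Metis refinement step and the runtime model, and the REINFORCE-style update is only an assumption about how such a policy could be trained, not the thesis's model.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_OPS, NUM_PARTS = 6, 2

# A tiny stand-in policy: an independent softmax over partitions for every op.
logits = np.zeros((NUM_OPS, NUM_PARTS))

def sample_assignment(logits):
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    choice = np.array([rng.choice(NUM_PARTS, p=p) for p in probs])
    return choice, probs

def metis_refine(assignment):
    """Placeholder for handing the RL-proposed initial partition to Metis for refinement."""
    return assignment

def estimate_runtime(assignment):
    """Placeholder cost model: load imbalance plus cut edges of a chain-shaped graph."""
    loads = np.bincount(assignment, minlength=NUM_PARTS)
    cut = int(np.sum(assignment[:-1] != assignment[1:]))
    return loads.max() + cut

baseline = None
for _ in range(200):
    a, probs = sample_assignment(logits)
    reward = -estimate_runtime(metis_refine(a))        # lower runtime -> higher reward
    baseline = reward if baseline is None else 0.9 * baseline + 0.1 * reward
    advantage = reward - baseline
    # REINFORCE: d log pi(a) / d logits = one_hot(a) - probs, per op.
    one_hot = np.zeros_like(logits)
    one_hot[np.arange(NUM_OPS), a] = 1.0
    logits += 0.1 * advantage * (one_hot - probs)

print("learned initial partition:", sample_assignment(logits)[0])
```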

(3) Designing an efficient training method for the proposed "reinforcement learning + Metis" model. In practice, the "reinforcement learning + Metis" model suffers from slow parameter updates and unstable convergence. To address this, this thesis designs a training strategy based on the PPO algorithm, under which the model achieves higher training efficiency.
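
As a reference point for the training strategy, the sketch below implements the standard clipped surrogate loss from the PPO paper in PyTorch; the thesis's actual batching, baseline, and synchronization with Metis are not reproduced here, and the numbers in the usage example are made up.

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective from PPO (Schulman et al., 2017); returns a loss to minimize."""
    ratio = torch.exp(logp_new - logp_old)               # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()

# Toy usage with made-up log-probabilities of sampled placement actions.
logp_old = torch.tensor([-1.2, -0.7, -2.0])
logp_new = torch.tensor([-1.0, -0.9, -1.5], requires_grad=True)
advantages = torch.tensor([0.5, -0.3, 1.2])
loss = ppo_clip_loss(logp_new, logp_old, advantages)
loss.backward()
print(loss.item(), logp_new.grad)
```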

CLC Number: TP311.13

Call Number: TP311.13/1635/2020

Notes: 403 - West Campus Branch Library, master's and doctoral theses collection; 203 - Yujiatou Branch Library, master's and doctoral theses collection
