Thesis Information

Chinese Title: 深度神经网络数据流图划分与优化策略研究 (Research on Dataflow Graph Partitioning and Optimization Strategies for Deep Neural Networks)

Name: 李志成

Student ID: 1049721701635

Confidentiality Level: Public

Thesis Language: Chinese

Discipline Code: 083500

Discipline Name: Software Engineering

Student Type: Master's candidate

Degree: Master of Engineering

University: 武汉理工大学 (Wuhan University of Technology)

School/Department: 计算机科学与技术学院 (School of Computer Science and Technology)

Major: Software Engineering

Research Direction: Deep Learning

Primary Supervisor: 徐宁

Primary Supervisor's Institution: 武汉理工大学 (Wuhan University of Technology)

Completion Date: 2020-03-25

Defense Date: 2020-05-21

Chinese Keywords: deep neural networks; graph partitioning; reinforcement learning; parallel computing

Chinese Abstract:

In recent years, deep neural networks have been applied effectively across many industries. At the same time, as the complexity of neural network models increases, their parameter counts and computational costs grow ever larger. Computing a neural network in parallel on multiple devices is the main way to improve the timeliness of its applications, so an ideal parallel computing strategy has a significant impact on the network's computational efficiency. Since the idea of representing neural networks as dataflow graphs emerged, the search for a parallel computing strategy for a deep neural network can be modeled as a dataflow graph partitioning problem. Heuristic graph partitioning algorithms and reinforcement learning are the two current approaches to this problem: heuristic algorithms compute quickly, but their partitioning results are easily perturbed by the initial solution, while reinforcement learning searches more accurately but is expensive to train. Drawing on the characteristics of both approaches, this thesis combines reinforcement learning with a heuristic algorithm and, by partitioning the dataflow graph of a deep neural network, generates a near-ideal parallel computing strategy for it. The specific work is as follows:
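
To make the modeling step concrete, the sketch below (an illustration assumed for this page, not code from the thesis) treats a small network as a dataflow graph whose vertices are operators and whose edges are tensors, and scores a two-device partition by the heavier device's compute load plus the bytes crossing the cut. The operator costs and the 1e-3 trade-off factor are made-up numbers.

```python
# Toy dataflow graph: op -> (compute cost, output tensor size in bytes).
ops = {
    "conv1": (8.0, 2048),
    "conv2": (8.0, 2048),
    "fc":    (4.0, 512),
    "loss":  (1.0, 8),
}
edges = [("conv1", "conv2"), ("conv2", "fc"), ("fc", "loss")]  # data dependencies

def placement_cost(assign):
    """assign maps each op to a device id (0 or 1); lower cost is better."""
    load = [0.0, 0.0]
    for op, (cost, _) in ops.items():
        load[assign[op]] += cost
    # Every edge whose endpoints sit on different devices pays a transfer cost.
    comm = sum(ops[u][1] for u, v in edges if assign[u] != assign[v])
    return max(load) + 1e-3 * comm  # the 1e-3 trade-off factor is arbitrary

# Two candidate partitions of the same graph, i.e. two device placements.
print(placement_cost({"conv1": 0, "conv2": 0, "fc": 1, "loss": 1}))
print(placement_cost({"conv1": 0, "conv2": 1, "fc": 0, "loss": 1}))
```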

(1) Improving the graph partitioning algorithm Metis based on the characteristics of deep neural network dataflow graphs. This thesis first analyzes the structural characteristics of neural network dataflow graphs and then, building on the Metis algorithm, devises an efficient partitioning scheme for them. On this basis, it modifies the weight computation in Metis according to how dataflow graphs are compiled by the computing framework, solving the problem that vertices involving memory reuse or intermediate data could not participate in the partitioning computation.
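
The following sketch shows, under assumed cost formulas that are not the thesis's actual weight definitions, how vertex and edge weights of a dataflow graph might be assigned before the graph is handed to a multilevel partitioner such as Metis, including clamping weights so that in-place (memory-reuse) and intermediate-data vertices still take part in the partition.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

ALPHA = 0.001  # arbitrary trade-off between compute and memory terms (an assumption)

@dataclass
class Op:
    name: str
    flops: float                  # estimated compute cost of the operator
    out_bytes: int                # size of its output tensor
    inplace: bool = False         # True if the op reuses its input buffer (memory reuse)
    inputs: List[str] = field(default_factory=list)

def build_weighted_graph(ops: Dict[str, Op], min_w: int = 1):
    """Return (vertex_weights, edge_weights) suitable for a METIS-style partitioner."""
    vertex_w: Dict[str, int] = {}
    edge_w: Dict[Tuple[str, str], int] = {}
    for op in ops.values():
        # Vertex weight combines compute cost with the memory the op newly allocates;
        # in-place (memory-reuse) ops allocate nothing new.  The max(..., min_w) clamp
        # keeps near-zero-cost intermediate vertices in the partitioning computation.
        mem = 0 if op.inplace else op.out_bytes
        vertex_w[op.name] = max(int(op.flops + ALPHA * mem), min_w)
        for src in op.inputs:
            # Edge weight = bytes that cross devices if the cut separates src and op.
            edge_w[(src, op.name)] = max(ops[src].out_bytes, min_w)
    return vertex_w, edge_w

# Toy chain: matmul -> relu (in-place) -> softmax.
ops = {
    "matmul":  Op("matmul", flops=2e6, out_bytes=4096),
    "relu":    Op("relu", flops=1e3, out_bytes=4096, inplace=True, inputs=["matmul"]),
    "softmax": Op("softmax", flops=5e3, out_bytes=4096, inputs=["relu"]),
}
print(build_weighted_graph(ops))
```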

(2) Building a reinforcement learning model to reduce Metis's dependence on the initial solution. Metis is a heuristic algorithm whose final partition is obtained by iterative computation over a set of initial partitions, so an unsuitable set of initial partitions leads to a poor final result. This thesis combines reinforcement learning with the Metis algorithm, using a reinforcement learning model to generate an optimal initial partition for Metis, which gives Metis better performance on the dataflow graph partitioning problem.
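
A minimal sketch of the "reinforcement learning proposes an initial partition, a heuristic refines it" loop is shown below. Here `metis_refine` and `estimate_runtime` are placeholders standing in for the Metis refinement step and the runtime model, and the REINFORCE-style update is only an assumption about how such a policy could be trained, not the thesis's model.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_OPS, NUM_PARTS = 6, 2

# A tiny stand-in policy: an independent softmax over partitions for every op.
logits = np.zeros((NUM_OPS, NUM_PARTS))

def sample_assignment(logits):
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    choice = np.array([rng.choice(NUM_PARTS, p=p) for p in probs])
    return choice, probs

def metis_refine(assignment):
    """Placeholder for handing the RL-proposed initial partition to Metis for refinement."""
    return assignment

def estimate_runtime(assignment):
    """Placeholder cost model: load imbalance plus cut edges of a chain-shaped graph."""
    loads = np.bincount(assignment, minlength=NUM_PARTS)
    cut = int(np.sum(assignment[:-1] != assignment[1:]))
    return loads.max() + cut

baseline = None
for _ in range(200):
    a, probs = sample_assignment(logits)
    reward = -estimate_runtime(metis_refine(a))        # lower runtime -> higher reward
    baseline = reward if baseline is None else 0.9 * baseline + 0.1 * reward
    advantage = reward - baseline
    # REINFORCE: d log pi(a) / d logits = one_hot(a) - probs, per op.
    one_hot = np.zeros_like(logits)
    one_hot[np.arange(NUM_OPS), a] = 1.0
    logits += 0.1 * advantage * (one_hot - probs)

print("learned initial partition:", sample_assignment(logits)[0])
```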

(3) Designing an efficient training method for the proposed "reinforcement learning + Metis" model. In practice, the "reinforcement learning + Metis" model suffers from slow parameter updates and unstable convergence. To address this, this thesis designs a training strategy based on the PPO algorithm, under which the model achieves higher training efficiency.
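
As a reference point for the training strategy, the sketch below implements the standard clipped surrogate loss from the PPO paper in PyTorch; the thesis's actual batching, baseline, and synchronization with Metis are not reproduced here, and the numbers in the usage example are made up.

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective from PPO (Schulman et al., 2017); returns a loss to minimize."""
    ratio = torch.exp(logp_new - logp_old)               # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()

# Toy usage with made-up log-probabilities of sampled placement actions.
logp_old = torch.tensor([-1.2, -0.7, -2.0])
logp_new = torch.tensor([-1.0, -0.9, -1.5], requires_grad=True)
advantages = torch.tensor([0.5, -0.3, 1.2])
loss = ppo_clip_loss(logp_new, logp_old, advantages)
loss.backward()
print(loss.item(), logp_new.grad)
```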

CLC Number: TP311.13

Call Number: TP311.13/1635/2020

Notes: 403 - West Campus Branch Library, master's and doctoral theses collection; 203 - Yujiatou Branch Library, master's and doctoral theses collection
