http://www.dontknow.de

Publications

  1. Michael Klemm and Christopher Dahnken. Co-scheduling of HPC Applications, chapter Recent Processor Technologies and Co-Scheduling. January 2017. ISBN 978-1-61499-729-0 (paperback), ISBN 978-1-61499-730-6 (electronic).

  2. Michael Klemm, Alejandro Duran, Ravi Narayanaswamy, Xinmin Tian, and Terry Wilmarth. The Present and Future of the OpenMP API Specification. Intel Parallel Universe Magazine, 27:5-16, January 2017. Available online at https://software.intel.com/sites/default/files/managed/75/b0/parallel-universe-issue-27.pdf.

  3. Matthias Noack, Florian Wende, Georg Zitzlsberger, Michael Klemm, and Thomas Steinke. KART - A Runtime Compilation Library for Improving HPC Application Performance. Technical report, ZIB, Takustr.7, 14195 Berlin, October 2016. ZIB Report 16-48. Available at https://opus4.kobv.de/opus4-zib/frontdoor/index/index/docId/6073.

  4. Christian Terboven, Jonas Hahnfeld, Xavier Teruel, Sergi Mateo, Alejandro Duran, Michael Klemm, Stephen L. Olivier, and Bronis R. de Supinski. Approaches for Task Affinity in OpenMP. In Naoya Maruyama, Bronis R. de Supinski, and Mohamed Wahib, editors, Proceedings of the 12th International Workshop on OpenMP, pages 102-115, October 2016. LNCS 9903.

  5. Florian Wende, Matthias Noack, Thomas Steinke, Michael Klemm, Chris J. Newburn, and Georg Zitzlsberger. Portable SIMD Performance with OpenMP* 4.x Compiler Directives. In Euro-Par 2016: Parallel Processing, pages 264-277, Grenoble, France, August 2016. LNCS 9833.

  6. Michael Klemm, Freddie Witherden, and Peter Vincent. Using the pyMIC Offload Module in PyFR. In Proceedings of the 8th European Conference on Python in Science, Cambridge, UK, July 2016. arXiv:1607.00844 [cs.MS].

  7. Michael Klemm and Christian Terboven. OpenMP API Version 4.5 - A Standard Evolves. Intel Parallel Universe Magazine, 24:23-31, March 2016. Available online at https://software.intel.com/sites/default/files/managed/65/20/v3-theparalleluniverse-issue-24.pdf.

  8. Michael Klemm and Christian Terboven. OpenMP 4.5: Eine kompakte Übersicht zu den Neuerungen. Heise Developer, November 2015. Available online at http://www.heise.de/developer/artikel/OpenMP-4-5-Eine-kompakte-Uebersicht-zu-den-Neuerungen-3020235.html.

  9. Nishant Agrawal, Paul Edwards, Ambuj Pandey, Michael Klemm, Ravi Ojha, and Rihab Abdul Razak. Performance Evaluation of OpenFOAM with MPI-3 RMA Routines on Intel Xeon Processors and Intel Xeon Phi Coprocessors. Bordeaux, France, September 2015. Poster at Euro-MPI 2015.

  10. Barna L. Bihari, Hansang Bae, James Cownie, Michael Klemm, Christian Terboven, and Lori Diachin. On the Algorithmic Aspects of Using OpenMP Synchronization Mechanisms II: User-Guided Speculative Locks. In Christian Terboven, Bronis R. de Supinski, Pablo Reble, and Matthias S. Müller, editors, Proceedings of the 11th International Workshop on OpenMP, pages 133-148, Aachen, Germany, September 2015. LNCS 9342.

  11. Jussi Enkovaara, Michael Klemm, and Freddie Witherden. High Performance Parallelism Pearls Volume Two: Multicore and Many-core Programming Approaches, chapter High Performance Python Offloading, pages 243-269. Morgan Kaufmann, Elsevier, Inc., August 2015. ISBN 978-0-12-803819-2.

  12. Gaurav Bansal, Anand Deshpande, Paul Edwards, Alexander Heinecke, Michael Klemm, Dheevatsa Mudigere, Elmoustapha Ould-Ahmed-Vall, Mikhail Smelyanskiy, Michael Steyer, Nishant Agrawal, Ravi Ojha, Ambuj Pandey, Rihab Abdul Razak, Juan J. Alonso, Thomas D. Economon, Francisco Palacios, and David Keyes. Accelerating Computational Fluid Dynamics Codes on Multi-/Many-Core Intel Platforms. In 27th International Conference on Parallel Computational Fluid Dynamics, Montreal, Quebec, Canada, May 2015. Extended abstract.

  13. Bernd Hentschel, Jens Henrik Göbbert, Paul Springer, Andrea Schnorr, and Torsten W. Kuhlen. Packet-Oriented Streamline Tracing on Modern SIMD Architectures. In Eurographics Symposium on Parallel Graphics and Visualization, pages 43-52, Cagliari, Sardinia, Italy, May 2015.

  14. Michael Klemm. Berechnung abladen erlaubt. Entwickler, 2015(1):74-79, January/February 2015.

  15. Edoardo Aprà, Jeff R. Hammond, Michael Klemm, and Karol Kowalski. High Performance Parallelism Pearls Volume One: Multicore and Many-core Programming Approaches, chapter NWChem: Quantum Chemistry Simulations at Scale, pages 201-223. Morgan Kaufmann, Elsevier, Inc., November 2014. ISBN 978-0-12-802118-7.

  16. Edoardo Aprà, Michael Klemm, and Karol Kowalski. Efficient Implementation of Many-body Quantum Chemical Methods on the Intel Xeon Phi Coprocessor. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, New Orleans, LA, November 2014.

  17. Michael Klemm and Jussi Enkovaara. pyMIC: A Python Offload Module for the Intel Xeon Phi Coprocessor. In 4th Workshop on Python for High Performance and Scientific Computing, New Orleans, LA, November 2014. Online at http://www.dlr.de/sc/Portaldata/15/Resources/dokumente/pyhpc2014/submissions/pyhpc2014_submission_8.pdf.

  18. Florian Wende, Michael Klemm, Thomas Steinke, and Alexander Reinefeld. High Performance Parallelism Pearls, chapter Concurrent Kernel Offloading, pages 287-306. Morgan Kaufmann, Elsevier, Inc., November 2014. ISBN 978-0-12-802118-7.

  19. Alexander Supalov, Andrey Semin, Michael Klemm, and Christopher Dahnken. Optimizing HPC Applications with Intel® Cluster Tools. Apress Media, October 2014. ISBN 978-1-4302-6496-5 (paperback), ISBN 978-1-4302-6497-2 (electronic).

  20. Hansang Bae, James H. Cownie, Michael Klemm, and Christian Terboven. A User-guided Locking API for the OpenMP* Application Program Interface. In Luiz DeRose, Bronis R. de Supinski, Stephen L. Olivier, Barbara M. Chapman, and Matthias S. Müller, editors, Using and Improving OpenMP for Devices, Tasks, and More, pages 173-186, Salvador, Brazil, September 2014. LNCS 8766.

  21. James H. Cownie, Alejandro Duran, Michael Klemm, and Luke Lin. An OpenMP Timeline. Intel Parallel Universe Magazine, 18:41, June 2014. Available online at https://software.intel.com/sites/default/files/managed/6a/78/parallel_mag_issue18.pdf.

  22. Michael Klemm and Christian Terboven. Full Throttle: OpenMP 4.0. Intel Parallel Universe Magazine, 16:6-16, November 2013. Available online at http://download-software.intel.com/sites/default/files/managed/64/cc/parallel_mag_issue16.pdf.

  23. Xavier Teruel, Michael Klemm, Kelvin Li, Xavier Martorell, Stephen L. Olivier, and Christian Terboven. A Proposal for Task-Generating Loops in OpenMP. In A.P. Rendell et al., editor, Proceedings of the 9th International Workshop on OpenMP, pages 1-14, Canberra, Australia, September 2013. LNCS 8122.

  24. Michael Klemm and Christian Terboven. Gas gegeben - Die wichtigsten Neuerungen von OpenMP 4.0. Heise Developer, July 2013. Available online at http://www.heise.de/developer/artikel/Die-wichtigsten-Neuerungen-von-OpenMP-4-0-1915844.html.

  25. Alexander Heinecke, Dirk Pflüger, Dmitry Budnikov, Michael Klemm, Arik Narkis, Maxim Shevtsov, and Ayal Zaks. Demonstrating Performance Portability of a Custom OpenCL Data Mining Application to the Intel Xeon Phi Coprocessor. In Proceedings of the International Workshop on OpenCL 2013 & 2014, Atlanta, GA, May 2013.

  26. Michael Klemm and Maxim Shevtsov. Für alle - Heterogene Programmierung mit OpenCL. iX, 1(2013):145-152, December 2012.

  27. Tim Cramer, Dirk Schmidl, Michael Klemm, and Dieter an Mey. OpenMP Programming on Intel Xeon Phi Coprocessors: An Early Performance Comparison. In MARC 2012, pages 38-44, Aachen, Germany, November 2012. Published online at http://nbn-resolving.de/urn/resolver.pl?urn=urn:nbn:de:hbz:82-opus-43835.

  28. Michael Klemm. Bremse oder Gaspedal? - Performanceanalysetools im Einsatz. Entwickler, 6(2012):83-89, November/December 2012.

  29. Michael Klemm, Alejandro Duran, Xinmin Tian, Hideki Saito, Diego Caballero, and Xavier Martorell. Extending OpenMP with Vector Constructs for Modern Multicore SIMD Architectures. In B.M. Chapman et al., editor, Proceedings of the 8th International Workshop on OpenMP, pages 59-72, Rome, Italy, June 2012. LNCS 7312.

  30. Hans Pabst, Bev Bachmayer, and Michael Klemm. Performance of a Structure-detecting SpMV using the CSR Matrix Representation. In Proceedings of the 11th Symposium Parallel and Distributed Computing, pages 3-10, Munich, Germany, June 2012.

  31. Alexander Heinecke, Michael Klemm, Hans Pabst, and Dirk Pflüger. Towards High-performance Implementations of a Custom HPC Kernel using Intel® Array Building Blocks. In Facing the Multicore Challenge II, pages 36-47, Karlsruhe, Germany, May 2012. Springer. LNCS 7174.

  32. Alexander Heinecke, Michael Klemm, and Hans-Joachim Bungartz. From GPGPUs to Many-Core: NVIDIA Fermi* and Intel® Many Integrated Core Architecture. Computing in Science and Engineering, 14(2):78-83, March-April 2012.

  33. Alexander Heinecke, Michael Klemm, Dirk Pflüger, Arndt Bode, and Hans-Joachim Bungartz. Extending a Highly Parallel Data Mining Algorithm to the Intel® Many Integrated Core Architecture. In Euro-Par 2011: Parallel Processing Workshops, pages 375-384, Bordeaux, France, August 2011. LNCS 7156.

  34. Alejandro Duran, Roger Ferrer, Michael Klemm, Bronis R. de Supinski, and Eduard Ayguadé. A Proposal for User-defined Reductions in OpenMP. In Proceedings of the 6th International Workshop on OpenMP, pages 43-55, Tsukuba, Japan, June 2010. LNCS 6132.

  35. Michael Wong, Michael Klemm, Alejandro Duran, Tim Mattson, Grant Haab, Bronis R. de Supinski, and Andrey Churbanov. Towards an Error Model for OpenMP. In Proceedings of the 6th International Workshop on OpenMP, pages 70-82, Tsukuba, Japan, June 2010. LNCS 6132.

  36. Georg Dotzler, Ronald Veldema, and Michael Klemm. JCudaMP: OpenMP/Java on CUDA. In Victor Pankratius, Michael Philippsen, and ACM/IEEE, editors, Proceedings of the Third International Workshop on Multicore Software Engineering, Cape Town, South Africa, May 2010. No page numbers.

  37. Jean Christophe Beyler, Michael Klemm, Philippe Clauss, and Michael Philippsen. A Meta-predictor Framework for Prefetching in Object-based DSMs. Concurrency and Computation: Practice and Experience, 21(14):1789-1803, September 2009.

  38. Tobias Werth, Tobias Floßmann, Michael Klemm, Dominic Schell, Ulrich Weigand, and Michael Philippsen. Dynamic Code Footprint Optimization for the IBM Cell Broadband Engine. In Adam Porter, Lawrence Votta, and Victor Pankratius, editors, Proceedings of the ICSE Workshop on Multicore Software Engineering, pages 64-72, Vancouver, Canada, May 2009.

  39. Michael Klemm. Reparallelization and Migration of OpenMP Applications in Grid Environments, March 2009. Shaker Verlag, Aachen. Doctoral Thesis, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany.

  40. Michael Klemm, Matthias Bezold, Stefan Gabriel, Ronald Veldema, and Michael Philippsen. Reparallelization Techniques for Migrating OpenMP Codes in Computational Grids. Concurrency and Computation: Practice and Experience, 21(3):281-299, March 2009.

  41. Michael Klemm, Ronald Veldema, and Michael Philippsen. An Automatic Cost-based Framework for Seamless Application Migration in Grid Environments. In Teofilo Gonzalez, editor, Proceedings of the 20th IASTED International Conference on Parallel and Distributed Computing and Systems, pages 219-224, November 2008.

  42. Jean Christophe Beyler, Michael Klemm, Philippe Clauss, and Michael Philippsen. Automatic Prefetching with Binary Code Rewriting in Object-based DSMs. In Emilio Luque, Tomàs Margalef, and Domingo Benítez, editors, Proceedings of the Euro-Par 2008 Conference, pages 643-653, Heidelberg, Germany, August 2008. Springer.

  43. Michael Klemm, Ronald Veldema, and Michael Philippsen. Cluster Research at the Programming Systems Group. High Performance Computing at RRZE, pages 30-31, 2008.

  44. Michael Klemm, Jean Christophe Beyler, Ronny T. Lampert, Michael Philippsen, and Philippe Clauss. Esodyp+: Prefetching in the Jackal Software DSM. In Anne-Marie Kermarrec, Luc Bougé, and Thierry Priol, editors, Proceedings of the Euro-Par 2007 Conference, pages 563-573, New York, NY, USA, August 2007. Springer.

  45. Michael Klemm, Matthias Bezold, Stefan Gabriel, Ronald Veldema, and Michael Philippsen. Reparallelization and Migration of OpenMP Programs. In Bruno Schulze, Rajkumar Buyya, Philippe Navaux, Walfredo Cirne, and Vinod Rebello, editors, Proceedings of the 7th International Symposium on Cluster Computing and the Grid, pages 529-537, New York, NY, USA, May 2007. IEEE Computer Society.

  46. Michael Klemm and Michael Philippsen. Reparallelisierung und Migration von OpenMP-Applikationen. In Gesellschaft für Informatik e.V., editor, Parallel-Algorithmen und Rechnerstrukturen, number 24 in Mitteilungen, pages 65-76, May 2007.

  47. Michael Klemm, Matthias Bezold, Ronald Veldema, and Michael Philippsen. JaMP: An Implementation of OpenMP for a Java DSM. Concurrency and Computation: Practice and Experience, 18(19):2333-2352, April 2007.

  48. Michael Klemm, Ronald Veldema, Matthias Bezold, and Michael Philippsen. A Proposal for OpenMP for Java. In Université de Reims, editor, Proceedings of the 2nd International Workshop on OpenMP, June 2006. Published on CD-ROM.

  49. Michael Klemm, Matthias Bezold, Ronald Veldema, and Michael Philippsen. JaMP: An Implementation of OpenMP for a Java DSM. In Manuel Arenaz, Ramón Doallo, Basilio B. Fraguela, and Juan Touriño, editors, Proceedings of the 12th Workshop on Compilers for Parallel Computers, pages 242-255, January 2006.

  50. Michael Klemm, Ronald Veldema, and Michael Philippsen. Latency Reduction in Software-DSMs by Means of Dynamic Function Splicing. In Teofilo Gonzalez and IASTED, editors, Proceedings of the 16th IASTED International Conference on Parallel and Distributed Computing and Systems, pages 362-367, Cambridge, MA, USA, November 2004.

Contact

e-mail: michael@dontknow.de

Twitter: @mj_klemm

LinkedIn: http://bit.ly/mklemm

© Dr.-Ing. Michael Klemm