HPC in Large-Scale Simulation and Control

Peter Benner; Pablo Ezzatti; Hermann Mena; Enrique S. Quintana–Ortí; Alfredo Remón

doi:10.15765/e.v3i3.412

Vol. 3 No. 3 (2013), Articles (Full Paper)

Published 2013-09-04

Peter Benner

Max Planck Institute for Dynamics of Complex Technical Systems

Pablo Ezzatti

Universidad de la República

Hermann Mena

University of Innsbruck

Enrique S. Quintana–Ortí

Universidad Jaime I

Alfredo Remón

Universidad Jaime I

Keywords

large-scale simulation
model reduction
truncation

How to cite

HPC en simulación y control a gran escala. (2013). Elementos, 3(3). https://doi.org/10.15765/e.v3i3.412

Abstract

The simulation and control of phenomena arising in microelectronics, micromechanics, electromagnetism, fluid dynamics, and many industrial processes in general is a difficult problem, mainly because of the high computational cost of the algorithms involved. Many of the mathematical models that describe these phenomena have a large dimension; for example, modeling microprocessors leads to a large-scale dynamical system that cannot be solved with traditional numerical methods.

Instead, high performance computing (HPC) techniques are necessary, and even indispensable, to tackle this kind of problem. In this article we review HPC tools that make it possible to simulate and control large-scale problems. Specifically, we focus on techniques for model reduction via balanced truncation and for solving linear-quadratic control problems, which can be implemented efficiently on shared-memory multicore platforms that are additionally equipped with one or more graphics processors (GPUs).
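To make the notion of balanced truncation concrete, the following is a minimal dense sketch of the standard square-root variant, written with NumPy and SciPy. It is not the authors' implementation and uses no GPU; the function name `balanced_truncation` and the small heat-equation test system are assumptions chosen only for illustration. The sketch assumes a stable system whose Gramians are numerically positive definite, so that the Cholesky factorizations succeed.

```python
import numpy as np
from scipy.linalg import cholesky, solve_continuous_lyapunov, svd


def balanced_truncation(A, B, C, r):
    """Reduce the stable system (A, B, C) to order r (square-root method)."""
    # Controllability Gramian P solves  A P + P A^T + B B^T = 0.
    P = solve_continuous_lyapunov(A, -B @ B.T)
    # Observability Gramian Q solves    A^T Q + Q A + C^T C = 0.
    Q = solve_continuous_lyapunov(A.T, -C.T @ C)
    # Symmetrize to remove rounding asymmetry, then factorize.
    S = cholesky((P + P.T) / 2, lower=True)  # P = S S^T
    R = cholesky((Q + Q.T) / 2, lower=True)  # Q = R R^T
    # The singular values of R^T S are the Hankel singular values.
    U, hsv, Vt = svd(R.T @ S)
    scale = np.diag(hsv[:r] ** -0.5)
    T = S @ Vt[:r, :].T @ scale  # right projector, n x r
    W = R @ U[:, :r] @ scale     # left projector, n x r, with W^T T = I
    return W.T @ A @ T, W.T @ B, C @ T, hsv


# Hypothetical test system: a 5-point discretization of a 1-D heat
# equation, with input and output both at the first grid point.
n, r = 5, 2
A = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
B = np.zeros((n, 1))
B[0, 0] = 1.0
C = B.T.copy()

Ar, Br, Cr, hsv = balanced_truncation(A, B, C, r)

# Compare steady-state gains; the difference is bounded by twice the
# sum of the discarded Hankel singular values.
dc_full = (-C @ np.linalg.solve(A, B)).item()
dc_red = (-Cr @ np.linalg.solve(Ar, Br)).item()
bound = 2.0 * hsv[r:].sum()
print("Hankel singular values:", hsv)
print("DC gain error:", abs(dc_full - dc_red), "bound:", bound)
```

For large-scale problems the two dense Lyapunov solves dominate the cost, which is why this step is the natural candidate for the multicore and GPU acceleration described in the abstract.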

Authors who publish in the journal ELEMENTOS accept the following conditions:

Authors retain copyright and grant the journal the right of first publication, with the work registered under a Creative Commons Attribution - NonCommercial - NoDerivatives license, which allows third parties to use the published material provided they credit the authorship of the work and its first publication in this journal.
Authors may enter into separate, additional contractual arrangements for the non-exclusive distribution of the version of the article published in this journal (e.g., including it in an institutional repository or publishing it in a book), provided they clearly state that the work was first published in this journal.
Authors are permitted and encouraged to post their work online (for example, on institutional or personal web pages) before and during the review and publication process, as this can lead to productive exchanges and to wider and faster dissemination of the published work.