Abstract
The simulation and control of phenomena arising in microelectronics, micromechanics, electromagnetism, fluid dynamics and, more generally, many industrial processes is difficult, mainly because of the high computational cost of the algorithms involved. Many of the mathematical models that describe these phenomena are of large dimension; for example, modeling microprocessors leads to a large-scale dynamical system that cannot be solved with traditional numerical methods.
Instead, high performance computing (HPC) techniques are required to tackle this kind of problem. In this article we review HPC tools for simulating and controlling large-scale problems. Specifically, we focus on techniques for model reduction via balanced truncation and for solving linear-quadratic control problems, both of which can be implemented efficiently on shared-memory multi-core platforms that also use one or more graphics processors (GPUs).
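To make the idea of balanced truncation concrete, the following is a minimal sketch of the standard square-root variant for a small, dense, stable system. It is an illustration only, not the multi-core or GPU implementations the article reviews: it assumes NumPy and SciPy are available, uses SciPy's dense Lyapunov solver, and the function name `balanced_truncation` and the 3-state test system are hypothetical choices made for this example.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, cholesky, svd


def balanced_truncation(A, B, C, r):
    """Reduce the stable system x' = A x + B u, y = C x to order r.

    Illustrative sketch: assumes A is stable and (A, B, C) is controllable
    and observable, so that both Gramians are positive definite.
    """
    # Controllability Gramian P:  A P + P A^T + B B^T = 0
    P = solve_continuous_lyapunov(A, -B @ B.T)
    # Observability Gramian Q:    A^T Q + Q A + C^T C = 0
    Q = solve_continuous_lyapunov(A.T, -C.T @ C)

    # Lower-triangular Cholesky factors: P = S S^T, Q = R R^T
    S = cholesky(P, lower=True)
    R = cholesky(Q, lower=True)

    # The singular values of R^T S are the Hankel singular values
    U, hsv, Vt = svd(R.T @ S)

    # Projection matrices built from the r dominant singular triplets
    scale = np.diag(hsv[:r] ** -0.5)
    T = S @ Vt[:r].T @ scale    # right projector, n x r
    W = R @ U[:, :r] @ scale    # left projector, n x r, with W^T T = I

    return W.T @ A @ T, W.T @ B, C @ T, hsv


# Hypothetical 3-state example reduced to a single state
A = np.diag([-1.0, -2.0, -3.0])
B = np.ones((3, 1))
C = np.ones((1, 3))
Ar, Br, Cr, hsv = balanced_truncation(A, B, C, r=1)
```

The discarded Hankel singular values bound the approximation error: the transfer functions of the full and reduced models differ by at most twice the sum of the truncated values in `hsv`. For large-scale problems, the two Lyapunov solves dominate the cost, which is why they are the natural target for HPC techniques.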
Authors who publish in the journal ELEMENTOS accept the following conditions:
- Authors retain copyright and grant the journal the right of first publication, with the work registered under a Creative Commons Attribution-NonCommercial-NoDerivatives license, which allows third parties to use the published work provided they credit the authorship and the first publication in this journal.
- Authors may enter into separate, additional contractual arrangements for the non-exclusive distribution of the version of the article published in this journal (e.g., including it in an institutional repository or publishing it in a book), provided they clearly indicate that the work was first published in this journal.
- Authors are permitted and encouraged to publish their work online (for example, on institutional or personal pages) before and during the review and publication process, as this can lead to productive exchanges and to wider and faster dissemination of the published work.