Surprise event detection of the supercomputer execution queues

Gál, Zoltán; Tajti, Tibor; Terdik, György (2015) Surprise event detection of the supercomputer execution queues. Annales Mathematicae et Informaticae, 44. pp. 87-97. ISSN 1787-5021 (Print), 1787-6117 (Online)

Abstract

A huge amount of data is generated by and collected from IoT (Internet of Things) physical and virtual devices. These sets of data series reflect, in complex form, the state of a given system in multidimensional space. Evaluating the healthiness of a given system requires state analysis with enhanced methods. Special events can appear during the execution of jobs in a supercomputer (HPC – High Performance Computer) system. Depending on the HPC architecture, hundreds or even thousands of computation nodes work in parallel. The scheduler of the HPC front-end node manages different job execution queues (parallel, serial, test, etc.). The multitude of data series, captured periodically with several tens of thousands of samples, yields a set of several dozen variables for each computation node. The healthiness of the whole HPC system is a temporal concept in terms of 2D or 4D multidimensional time-space domains. In this paper we propose a healthiness evaluation method for each execution queue of two different HPC systems with computation capacities of 20 TFLOP/s and 5 TFLOP/s, respectively. A time-independent community structure is determined and controlled based on multiple similarity measures and an ANN (Artificial Neural Network) based SOM (Self-Organized Map) algorithm. For each cluster of variables, a representative variable is determined that includes the time-specific and global characteristics of its own cluster. The resulting set of representative variables contains fewer than ten dissimilar time series. Wavelet methods are used to detect extreme events in time for each representative variable. Surprise event detection in time for the HPC execution queues is based on the simultaneity of the extreme events' fingerprints.

Keywords: High Performance Computer, Sensors/actuators, IoT, Complex Event Processing, Event St
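The last two steps of the abstract (wavelet-based extreme event detection per variable, then surprise events from simultaneous extremes) can be illustrated with a minimal sketch. The abstract does not state the wavelet family, the thresholds, or the exact simultaneity rule, so everything below is an assumption: a level-1 Haar transform, a robust threshold based on the median absolute deviation, and a simple vote across series. The function names `haar_detail`, `extreme_mask`, and `surprise_events` are hypothetical, and the data are synthetic.

```python
import numpy as np

def haar_detail(x):
    """Level-1 Haar wavelet detail coefficients (one per pair of samples)."""
    x = np.asarray(x, dtype=float)
    if x.size % 2:
        x = x[:-1]  # drop the last sample so the pairs line up
    return (x[0::2] - x[1::2]) / np.sqrt(2.0)

def extreme_mask(x, k=5.0):
    """Flag detail coefficients that deviate by more than k robust sigmas.

    Assumption: the threshold rule is illustrative, not taken from the paper.
    """
    d = haar_detail(x)
    centered = d - np.median(d)
    sigma = 1.4826 * np.median(np.abs(centered))  # MAD scaled to a std estimate
    if sigma == 0.0:
        return np.zeros(d.size, dtype=bool)
    return np.abs(centered) > k * sigma

def surprise_events(series, k=5.0, min_fraction=0.5):
    """Indices where at least min_fraction of the series are extreme together.

    All series are assumed to have the same length and sampling times.
    """
    masks = np.array([extreme_mask(s, k) for s in series])
    votes = masks.mean(axis=0)  # fraction of series flagged at each index
    return np.flatnonzero(votes >= min_fraction)

# Synthetic stand-ins for the representative variables (assumed data).
rng = np.random.default_rng(0)
series = [rng.normal(0.0, 1.0, 256) for _ in range(4)]
for s in series[:3]:
    s[101] += 25.0       # spike shared by three of the four series
series[3][40] += 25.0    # isolated spike in a single series

events = surprise_events(series)
print(events)  # coefficient indices; index i covers samples 2*i and 2*i + 1
```

In this sketch the isolated spike is flagged as an extreme event for its own series but does not count as a surprise event, because too few series are extreme at the same position. A multi-level transform or a different voting rule would change which events are reported.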

Item type: Journal article
Note: Selected papers of the 9th International Conference on Applied Informatics
Language: English
Volume: 44
ISSN: 1787-5021 (Print), 1787-6117 (Online)
User: Tibor Gál
Date: 27 Feb 2019 18:30
Last modified: 27 Feb 2019 18:30