Detection and correction of silent errors in the Conjugate Gradient algorithm | Video | Carmin.tv

By Gérard Meurant

Appears in collection : Numerical Methods and Scientific Computing / Méthodes numériques et calcul scientifique

Modern supercomputers contain more and more computing elements, which increases the probability of hardware errors. Errors that do not stop the computation are called soft errors or silent errors. They can nevertheless corrupt the output of the code, so it is of interest to detect and correct them. In this talk we are concerned with the detection and correction of silent errors in the conjugate gradient (CG) algorithm for solving linear systems $Ax = b$ with a symmetric positive definite matrix $A$. Silent errors in CG may degrade or even prevent the convergence of the algorithm.

We propose a new way to detect silent errors using a scalar relation that must be satisfied by the CG variables,

$$\alpha_{k-1}^2\,\frac{(A p_{k-1}, A p_{k-1})}{(r_{k-1}, r_{k-1})} = 1 + \beta_k, \qquad (1)$$

where the $r_j$ are the residual vectors, the $p_j$ are the descent directions, and $\alpha_{k-1} = \frac{(r_{k-1}, r_{k-1})}{(p_{k-1}, A p_{k-1})}$ and $\beta_k = \frac{(r_k, r_k)}{(r_{k-1}, r_{k-1})}$ are the coefficients computed in CG.

We study how relation (1) is modified in finite precision arithmetic and define a criterion to detect when it is not satisfied. Checking relation (1) requires one additional dot product, $(A p_{k-1}, A p_{k-1})$, but, as was shown some time ago in [1] and more recently in [2], relation (1) can also be used to introduce more parallelism into the algorithm. Assuming that the input data $(A, b)$ are not corrupted, we model silent errors as bit flips in the output of some CG steps. When an error is detected at iteration $k$, we can restore the CG data from iteration $k-2$ and continue the computation safely. Numerical experiments will show the efficiency of this approach.
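For reference, here is why (1) holds in exact arithmetic. The derivation assumes the standard CG recurrences $r_k = r_{k-1} - \alpha_{k-1} A p_{k-1}$ and $p_{k-1} = r_{k-1} + \beta_{k-1} p_{k-2}$ (with $p_{-1} = 0$), which the abstract does not spell out, together with the A-conjugacy $(p_{k-2}, A p_{k-1}) = 0$:

```latex
\begin{aligned}
(r_k, r_k) &= (r_{k-1}, r_{k-1}) - 2\alpha_{k-1}(r_{k-1}, A p_{k-1}) + \alpha_{k-1}^2 (A p_{k-1}, A p_{k-1}),\\
(r_{k-1}, A p_{k-1}) &= (p_{k-1} - \beta_{k-1} p_{k-2}, A p_{k-1}) = (p_{k-1}, A p_{k-1}) \quad \text{(A-conjugacy)},\\
2\alpha_{k-1}(p_{k-1}, A p_{k-1}) &= 2(r_{k-1}, r_{k-1}) \quad \text{(definition of } \alpha_{k-1}\text{)},\\
\Rightarrow\quad (r_k, r_k) &= -(r_{k-1}, r_{k-1}) + \alpha_{k-1}^2 (A p_{k-1}, A p_{k-1}).
\end{aligned}
```

Dividing the last line by $(r_{k-1}, r_{k-1})$ gives (1). In finite precision the conjugacy $(p_{k-2}, A p_{k-1}) = 0$ holds only approximately, so the two sides of (1) agree only up to a small perturbation, which is why a detection criterion with a tolerance is needed.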
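A minimal sketch of the detect-and-roll-back idea in Python, for a small dense SPD matrix. Everything here is an illustrative assumption, not the method presented in the talk: the helper names (`cg_with_check`, `flip_bit`), the simple relative threshold `eta` (the talk derives its own finite-precision criterion, which is not reproduced here), the choice to inject the bit flip into the freshly updated residual, and the rollback policy, which keeps the last few accepted states and, on detection at step $k$, returns to the state of iteration $k-2$.

```python
import struct


def dot(u, v):
    """Euclidean inner product (u, v)."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    """Dense product A v, with A stored as a list of rows."""
    return [dot(row, v) for row in A]


def flip_bit(value, bit):
    """Flip one bit (0 = least significant) of an IEEE 754 double."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", value))
    (out,) = struct.unpack("<d", struct.pack("<Q", bits ^ (1 << bit)))
    return out


def cg_with_check(A, b, tol=1e-10, maxit=100, eta=1e-6, faults=None):
    """CG for SPD A that tests alpha^2 (Ap, Ap) / (r_old, r_old) == 1 + beta.

    faults: hypothetical {step: (index, bit)} map; at that step one bit of the
            freshly updated residual entry r[index] is flipped (once).
    eta:    hypothetical relative tolerance for the check, NOT the talk's criterion.
    Returns (x, steps, detections); steps counts every CG step performed,
    including the ones discarded by a rollback.
    """
    faults = dict(faults or {})
    bnorm = dot(b, b) ** 0.5
    x, r = [0.0] * len(b), list(b)
    p, rr = list(r), dot(r, r)
    history = [(x, r, p, rr)]  # accepted states (x, r, p, (r, r)), newest last
    detections, steps = [], 0
    while steps < maxit and rr ** 0.5 > tol * bnorm:
        steps += 1
        Ap = matvec(A, p)
        alpha = rr / dot(p, Ap)
        x_new = [xi + alpha * pi for xi, pi in zip(x, p)]
        r_new = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        if steps in faults:  # simulated silent error in this step's output
            idx, bit = faults.pop(steps)
            r_new[idx] = flip_bit(r_new[idx], bit)
        rr_new = dot(r_new, r_new)
        beta = rr_new / rr
        # Relation (1); the only extra cost is the dot product (Ap, Ap).
        lhs = alpha * alpha * dot(Ap, Ap) / rr
        if abs(lhs - (1.0 + beta)) > eta * (1.0 + beta):
            detections.append(steps)
            if len(history) > 1:
                history.pop()  # drop the state this step started from (k-1)
            x, r, p, rr = history[-1]  # restore iteration k-2 (or the start)
            continue
        x, r, rr = x_new, r_new, rr_new
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        history = (history + [(x, r, p, rr)])[-3:]
    return x, steps, detections
```

For example, with the 3 × 3 matrix tridiag(-1, 4, -1) and b = (1, 2, 3), flipping bit 52 (the lowest exponent bit, which halves or doubles the entry) of r[0] at step 1 makes the two sides of (1) differ by about 5e-3, far above `eta`; the step is rejected and the solve finishes from the restored state. Without such a check, the error would propagate silently: since x and r are updated with the same step, the gap between the recursively updated r and the true residual b - Ax persists, so r can be driven to zero while the computed solution stays wrong.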

Information about the video

Date of recording 09/11/2021
Date of publication 26/11/2021
Institution CIRM
Licence CC BY NC ND
Language English
Audience Researchers
Director(s) Luca Recanzone
Format MP4

Citation data

DOI 10.24350/CIRM.V.19829403
Cite this video Meurant, Gérard (09/11/2021). Detection and correction of silent errors in the Conjugate Gradient algorithm. CIRM. Audiovisual resource. DOI: 10.24350/CIRM.V.19829403
URL https://dx.doi.org/10.24350/CIRM.V.19829403

Domain(s)

Numerical Analysis

Bibliography

[1] MEURANT, Gérard. Multitasking the conjugate gradient method on the CRAY X-MP/48. Parallel Computing, 1987, vol. 5, no. 3, p. 267-280. https://doi.org/10.1016/0167-8191(87)90037-8
[2] CHEN, Tyler and CARSON, Erin. Predict-and-recompute conjugate gradient variants. SIAM Journal on Scientific Computing, 2020, vol. 42, no. 5, p. A3084-A3108. https://doi.org/10.1137/19M1276856

Copyright Carmin.tv 2025