Temporary loss of the Hydra storage on Wednesday October 18th

[29/05/2018]

A maintenance operation on the Hydra storage on Wednesday October 18th led to a loss of access from the compute nodes. The storage was back online at 16:46. We expect that some running jobs failed during the interruption. Please check the output and error files of any jobs that completed between 16:20 and 16:50 for errors. Sorry for the inconvenience.
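
The check requested above could be scripted along the following lines. This is only a minimal sketch: the function name suspicious_files, the "error" keyword, and the idea of using each file's last-modification time as a stand-in for the job completion time are all assumptions for illustration, not a tool provided on Hydra.

```python
from datetime import datetime
from pathlib import Path


def suspicious_files(directory, start, end, keyword="error"):
    """Return names of files last modified within [start, end] that mention keyword.

    Hypothetical helper: the modification time is used as an approximation
    of when the job finished writing its output or error file.
    """
    hits = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        modified = datetime.fromtimestamp(path.stat().st_mtime)
        if not (start <= modified <= end):
            continue
        # Case-insensitive search; undecodable bytes are replaced, not fatal.
        text = path.read_text(errors="replace")
        if keyword.lower() in text.lower():
            hits.append(path.name)
    return hits
```

To use it, call suspicious_files with the directory holding your job output and two datetime values for 16:20 and 16:50 on the day of the incident. A file that is not listed may still belong to a failed job, so treat the result as a first filter only.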

Planned HYDRA maintenance on September 25 completed successfully

The maintenance work on Hydra has completed.

You don't need to change anything on your side: newly started jobs will automatically use the new nodes. Access to the Hydra cluster has been reopened.

We have planned a downtime for HYDRA on September 25 that should last only a few hours.

We have created a global reservation on HYDRA to make sure that no job will be running at that time. Jobs that can complete before that date will be executed, and those whose walltime extends beyond September 25 will be held in the queue.
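
The effect of the reservation on a given job can be illustrated with a short sketch. The function name, the placeholder year 2000, and the assumption that the reservation begins at 00:00 on September 25 are hypothetical; the announcement does not state the exact start time, and the real decision is made by the scheduler.

```python
from datetime import datetime, timedelta


def can_start_before_reservation(now, walltime, reservation_start):
    """True if a job started at `now` would end before the reservation begins."""
    return now + walltime <= reservation_start


# Placeholder year and an assumed start time of midnight, for illustration only.
reservation_start = datetime(2000, 9, 25, 0, 0)
now = datetime(2000, 9, 23, 12, 0)

# 24 h requested: ends on September 24 at 12:00, so the job can run.
short_job = can_start_before_reservation(now, timedelta(hours=24), reservation_start)

# 48 h requested: would end on September 25 at 12:00, so the job stays queued.
long_job = can_start_before_reservation(now, timedelta(hours=48), reservation_start)
```

In practice this means that requesting a shorter walltime, when the job allows it, is the way to get it scheduled before the downtime.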

The cluster downtime is needed to bring into production a new 10 Gbps network for access to the HYDRA storage (the work directory). This network must be enabled in the GPFS storage system.

The GPFS storage system will be used by 27 additional HYDRA compute nodes and one high-memory node with 1.5 TB of RAM, all of which are ready to go into production. Hydra now features 2300+ cores.

Technical details of the new HYDRA nodes:

New compute node details:
Each node will feature 2x Intel Xeon E5-2680 v4 @ 2.40GHz (14 cores each), 256 GB RAM, a 10 Gbps link for storage access and 900 GB local storage.
A total of 756 cores and ~7 TB RAM will be added to HYDRA.
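
The totals above follow directly from the per-node figures. The short calculation below reproduces them; the variable names are illustrative, and it assumes the binary convention 1 TB = 1024 GB, which matches the 1536 GB = 1.5 TB figure quoted for the high-memory node.

```python
NEW_NODES = 27          # additional compute nodes
SOCKETS_PER_NODE = 2    # 2x Intel Xeon E5-2680 v4 per node
CORES_PER_SOCKET = 14   # cores per processor
RAM_PER_NODE_GB = 256   # RAM per node

total_cores = NEW_NODES * SOCKETS_PER_NODE * CORES_PER_SOCKET
total_ram_gb = NEW_NODES * RAM_PER_NODE_GB
total_ram_tb = total_ram_gb / 1024  # assumes 1 TB = 1024 GB

print(total_cores)    # 756
print(total_ram_gb)   # 6912
print(total_ram_tb)   # 6.75, i.e. roughly 7 TB
```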

New high-memory node details:
The node will feature 4x Intel Xeon E7-8891 v4 @ 2.80GHz (10 cores each), 1536 GB RAM, a 10 Gbps link for storage access and 3.7 TB local storage.


Raphael Leplae - hpc@vub.ac.be

