Intro for previous Fabian users
The new service is primarily accessed using a web browser (Chrome, Firefox and Edge preferred) via https://ondemand.fab.lse.ac.uk/
You no longer need to use a VPN or be on site to use Fabian.
SSH, which was particularly problematic for Windows users with PuTTY, is no longer the main access route.
On-site Fabian
The current on-site Fabian has reached end of life. We have decided not to replace it like-for-like, but instead to provide the same or improved capabilities in a different way.
The benefits of this will be:
- You will gain access to newer, faster processors and the future capability to add new processors as they are released without having to repurchase and commission any hardware. The cluster will remain current, rather than ageing like the on-site Fabian.
- We have built an elastic infrastructure, one that can scale with demand, well beyond the capabilities of the current cluster. We are no longer limited by the size and facilities of the current cluster.
- We will no longer waste energy running servers that are not being used.
- The file and disk services will be brought up to date, and be at least an order of magnitude faster.
- We will have robust, fully supported infrastructure, reducing the risk to your data and research and bringing to an end the frequent outages Fabian has experienced. We will no longer spend time supporting the servers and networking that underpin Fabian, freeing us up to do more to support your research.
- We can finally offer backups to keep your data safe.
- The new service has been designed with security at the front of our minds, rather than as an afterthought.
- We will be able to offer additional, separate clusters, giving projects, centres and departments more flexibility.
- The new infrastructure will have a lower environmental impact than the current hardware, and we will be able to report that impact back to you and the School at large.
- We will have a test platform, allowing us to add new features and keep the service up to date with less risk to your data and research.
- The platform has the capability to support new service features, for example databases, improving the service offering.
Changes to components of the service
The cluster
The old Fabian was a fixed-size set of 17 compute nodes bought in 2016/7 that are starting to show their age. When they were not being used, they sat there idle, and when there was more work than the compute nodes could handle, the extra work had to wait in the queue. In contrast, the new Fabian is based in the cloud. This makes it flexible: if there are no jobs running, we run no compute nodes. As jobs are submitted, new compute nodes are created and added to the cluster dynamically to run those jobs. Once a compute node's jobs have finished, the scheduler waits a short while in case new jobs arrive that it can assign to that node and, if none do, it removes the node. So whilst the old Fabian was limited by the resources we had, the new fab cluster is limited by the amount we want to spend on it. One minor downside of this approach is that you will have to wait a few minutes longer for your jobs to start if a new server is needed, as the compute node has to be started and configured for our use.
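To make the scale-up and scale-down behaviour described above concrete, here is a minimal toy simulation. It is only a sketch and not how the real scheduler is implemented: the boot time, the idle limit, the one-job-per-node rule and all of the names are assumptions made purely for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

BOOT_MINUTES = 3  # hypothetical time to start and configure a new node
IDLE_LIMIT = 5    # hypothetical minutes an idle node is kept before removal


@dataclass
class Node:
    boot_left: int = BOOT_MINUTES  # minutes until the node is ready for work
    job_left: int = 0              # minutes of work left on the current job
    idle_for: int = 0              # minutes spent ready but without a job


def simulate(arrivals: dict[int, list[int]], minutes: int) -> list[int]:
    """Return how many nodes exist at the end of each simulated minute.

    `arrivals` maps a minute to the durations (in minutes) of the jobs
    submitted at that minute. Each node runs one job at a time.
    """
    queue: list[int] = []
    nodes: list[Node] = []
    history: list[int] = []
    for minute in range(minutes):
        queue.extend(arrivals.get(minute, []))
        # Advance booting and busy nodes by one minute.
        for node in nodes:
            if node.boot_left > 0:
                node.boot_left -= 1
            elif node.job_left > 0:
                node.job_left -= 1
        # Give waiting jobs to nodes that are ready and free.
        for node in nodes:
            if queue and node.boot_left == 0 and node.job_left == 0:
                node.job_left = queue.pop(0)
                node.idle_for = 0
        # Create a node for each waiting job not already covered by a booting node.
        booting = sum(1 for node in nodes if node.boot_left > 0)
        for _ in range(len(queue) - booting):
            nodes.append(Node())
        # Count idle minutes and remove nodes that have been idle too long.
        for node in nodes:
            if node.boot_left == 0 and node.job_left == 0:
                node.idle_for += 1
        nodes = [node for node in nodes if node.idle_for < IDLE_LIMIT]
        history.append(len(nodes))
    return history


# One 4-minute job submitted at minute 0 on an empty cluster.
single_job = simulate({0: [4]}, minutes=15)
# A second job arriving while the node is still idling reuses that node.
reused_node = simulate({0: [4], 9: [2]}, minutes=20)
print(single_job)
print(max(reused_node))
```

In this toy model an empty cluster costs nothing, the first job waits for a node to boot, and a job that arrives during the idle window starts immediately on the existing node instead of triggering a new one.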
Scheduler
The old Fabian used Sun Grid Engine as its scheduler. The new service uses Slurm, which has different commands to schedule, view and cancel your jobs (sbatch, squeue and scancel in place of qsub, qstat and qdel).
One benefit of Slurm is that you can specify --mem=4G to ask for 4 GB of memory regardless of the number of CPU cores you ask for. Previously the memory request was applied per CPU core requested, which caught a lot of users out and meant that they requested many times more memory than their job required.
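As a rough illustration of why the old per-core convention caught people out, the sketch below compares the total memory reserved under each approach. It is plain arithmetic, not anything either scheduler runs, and the function name and the 8-core job are made up for the example.

```python
def total_memory_gb(requested_gb: float, cores: int, per_core: bool) -> float:
    """Return the total memory reserved for a job.

    per_core=True models the old behaviour, where the request was
    multiplied by the number of CPU cores; per_core=False models a
    request such as --mem=4G, which does not grow with the core count.
    """
    return requested_gb * cores if per_core else requested_gb


# Hypothetical job: 8 CPU cores, with "4" entered as the memory request.
old_total = total_memory_gb(4, cores=8, per_core=True)
new_total = total_memory_gb(4, cores=8, per_core=False)
print(f"Old per-core request: {old_total} GB reserved")
print(f"New --mem=4G request: {new_total} GB reserved")
```

With the same number typed in, the per-core convention reserves eight times as much memory for this job as the new one does.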
Remote Desktop
The previous client-based LSE Fabian Desktop Access application has been retired, as you can now access a remote desktop session directly in your browser.
Software
We have installed only the latest versions of software packages on the new service. We can install the older versions that were available on Fabian but, to save time and cost, we do this only on request and when you have a valid reason to use an old version. Python users should note that we have chosen not to install Anaconda's Python 2, as it is long past end-of-life and presents security challenges.
Storage
The old Fabian had a fixed-size storage system that was a single point of failure, and storage outages required a complete restart of the service. The new fab uses flexible storage that is provided to us as a service; because we no longer manage the storage ourselves, it is no longer a single point of failure. Storage will also grow automatically, and it is faster than before and will get faster the more we store on it. As with compute, we are now limited by our budget rather than by a fixed resource purchased years before.
GPUs
Fabian had no GPUs, and adding them was prohibitively expensive, so we could not offer them. The new fab cluster does offer GPU nodes, because we only pay for them while they are being used. We can offer them on the basis that some jobs will cost less to run with GPUs, because they finish sooner, than they would on CPUs alone, where they take longer.
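The cost argument can be sketched with some simple arithmetic. The hourly rates and speed-ups below are invented purely for illustration and are not the prices we pay; the point is only that a GPU node is the cheaper choice when the job's speed-up exceeds the ratio of the GPU rate to the CPU rate.

```python
def job_cost(hourly_rate: float, hours: float) -> float:
    """Cost of running a node at `hourly_rate` for `hours`."""
    return hourly_rate * hours


def gpu_is_cheaper(cpu_rate: float, cpu_hours: float,
                   gpu_rate: float, speedup: float) -> bool:
    """Return True if the GPU run costs less than the CPU run.

    `speedup` is how many times faster the job finishes on the GPU node.
    """
    gpu_hours = cpu_hours / speedup
    return job_cost(gpu_rate, gpu_hours) < job_cost(cpu_rate, cpu_hours)


# Hypothetical rates: the GPU node costs 4 times as much per hour.
CPU_RATE = 1.0
GPU_RATE = 4.0

# A 10-hour CPU job that runs 8 times faster on a GPU costs less on the GPU.
print(gpu_is_cheaper(CPU_RATE, 10, GPU_RATE, speedup=8))
# The same job with only a 2 times speed-up costs more on the GPU.
print(gpu_is_cheaper(CPU_RATE, 10, GPU_RATE, speedup=2))
```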