#97295 HPC Systems Engineer

La Jolla, CA
Jun 19, 2019
Administrative Jobs
Technology, Analysts & Programming
Institution Type
Four-Year Institution

Posting will remain open until position is filled.


The incumbent will apply the skills of a seasoned systems integration professional with a full understanding of systems and software integration concepts to evaluate, resolve and implement medium-sized projects or portions of large projects of moderate scope and complexity. The incumbent will resolve a wide range of issues spanning business processes, system functionality, implementation, and system and software integration, and will need to demonstrate competency in selecting tools, methods and techniques to obtain results.

The incumbent will give technical presentations to the associated team and other technical units, evaluate new technologies, including performing moderate to complex cost/benefit analyses, and may lead a team of systems/infrastructure professionals.

The responsibilities include system administration with on-call duties for large HPC Linux and Unix clusters with high-performance interconnects such as InfiniBand and multi-gigabit Ethernet. Implement HPC solutions to address a variety of problems and tasks in scientific areas related to advanced computing or networking. Work with very complex, advanced systems and networks in production, research and performance evaluation environments.

Responsible for system internals, network and operating systems, emerging technologies, hardware and architectures, and the interrelationships among them. Support resource managers, schedulers and client access to parallel file systems, including Lustre and GPFS. Run user codes to debug the systems.

Participate in collaborative team-based efforts, such as national projects like XSEDE and its constituent working groups. Work closely with other groups to integrate the HPC systems into the SDSC networking, data mining and user environments. Collaborate on security procedure development and implementation. Work with the User Services and operations group in training their staff and planning system maintenance.

For more information, please visit www.sdsc.edu.

  1. Advanced skills associated with system integration design, modification, implementation and deployment in a moderately complex environment.

  2. Proven experience as an administrator of large-scale HPC clusters.

  3. Advanced knowledge of Linux, including services, networking, and file systems.

  4. Strong experience with a major configuration management tool, including application packaging and installation.

  5. Demonstrated experience with complex troubleshooting in a multi-platform environment.

  6. Experience administering distributed and parallel file systems such as Lustre and GPFS, or other high-performance file systems.

  • Job offer is contingent on a clear background check.

  • Occasional evening, weekend, and overtime work may be required.
