SR SYSTEM ADMINISTRATOR - HIGH PERFORMANCE COMPUTING

Location
Durham, NC
Posted
Apr 13, 2017
Institution Type
Four-Year Institution

GCB ADMINISTRATION

Occupational Summary

The incumbent will research, design, deploy, and administer GCB's high-performance computing, storage, and network systems infrastructure. These currently include a high-performance computing cluster tailor-designed for the needs of computing with genomics data at scale, InfiniBand interconnects between cluster nodes and a high-performance storage system (currently running IBM's GPFS on an E-series NetApp), an OpenStack-powered cloud for self-servicing compute VMs tailored for computational genomics, and a clustered Avere edge filer appliance that unifies and caches storage from different enterprise storage core filers. In collaboration with the Center's IT leadership and other IT staff, the incumbent will also take a leading role in identifying, evaluating, and recommending new hardware, software, and other IT technology solutions to continually improve how the Center's high-performance computing needs are met.

Specific responsibilities and activities include the following.
  • In collaboration with IT leadership, lead IT scale-out technology evaluation, architecture, design, and deployment projects for high-performance computing. Identify and recommend candidate technologies to improve the Center's IT infrastructure and operations.
  • Deploy, monitor, and maintain IT systems for HPC, in particular high-memory, high-CPU HPC cluster nodes, and an OpenStack cloud for on-demand spinning up and hosting of compute VMs.
  • Develop and implement automated at-scale configuration, deployment, and maintenance of operating systems and software installations, including open source and special-purpose software for computational biology and genomics core facilities.
  • Collaborate with software and database developers in GCB's IT and other research teams to ensure cohesive, secure operations that adhere to best practices across informatics development and system administration.
  • Collaborate with IT leadership and GCB's scientific application developers to enable reproducible, reusable, and traceable computational workflows for high-performance computing in genomics.

Qualifications:
The high-performance computing infrastructure of the Center serves a diverse set of scientific computing needs, and is a central part of enabling the pioneering genomics research undertaken in the Center's labs.
To be successful in this position, the incumbent will have all or at least most of the following skills and experience:
  • Extensive experience designing and managing Unix/Linux-based HPC clusters (ideally Red Hat and/or Ubuntu), as well as configuring and managing job schedulers, ideally SLURM.
  • Experience setting up and maintaining compute virtualization clouds, ideally using OpenStack.
  • Strong knowledge of high-performance storage and parallel file system attachments, ideally (E-series) NetApp storage systems, GPFS, and InfiniBand networking for interconnects.
  • Strong knowledge of and experience with reproducible, automated, and scalable operating system deployment, configuration management, and software installation, for example using Puppet or (ideally) Ansible.
  • Strong knowledge of and experience with task automation through shell and other scripting languages.
  • Track record of attention to detail and dedication to professional excellence.
  • Ability to work independently and to communicate effectively with diverse groups of people ranging from technical IT staff to academic researchers and students.
In addition to the above, one or more of the following are a plus:
  • Knowledge of containerization technologies for reproducible isolated software environments, ideally Docker.
  • Experience with academic computing environments and scientific HPC workloads.
  • Experience with DevOps integration between system operations and application development teams.
  • Experience with using object stores, such as AWS S3 or (ideally) OpenStack Swift, for automatic backup of inactive storage objects.
  • Understanding of technologies and best practices for server and information security, and familiarity with HIPAA compliance rules.
  • Knowledge of federated authentication, authorization, and identity technologies such as LDAP, Windows Active Directory, Kerberos, Shibboleth, and OAuth.
  • Knowledge of IT system monitoring solutions, in particular Nagios.


Requisition Number
401197807

Location
Durham

Duke Entity
MEDICAL CENTER

Job Code
2426 ANALYST, IT, SR

Job Family Level
D

Exempt/Non-Exempt
Exempt

Full Time / Part Time
FULL TIME

Regular / Temporary
Regular

Shift
First/Day

Minimum Qualifications
Duke University is an Affirmative Action/Equal Opportunity Employer committed to providing employment opportunity without regard to an individual's age, color, disability, genetic information, gender, gender identity, national origin, race, religion, sexual orientation, or veteran status.

Essential Physical Job Functions: Certain jobs at Duke University and Duke University Health System may include essential job functions that require specific physical and/or mental abilities. Additional information and provision for requests for reasonable accommodation will be provided by each hiring department.

Education: Refer to Job Description


Auto req ID

85239BR

Duke University is an Affirmative Action/Equal Opportunity Employer committed to providing employment opportunity without regard to an individual's age, color, disability, genetic information, gender, gender expression, gender identity, national origin, race, religion, sexual orientation, or veteran status.

PI97510300