HPC Systems Administrator
University of Colorado Denver | Anschutz Medical Campus
Office of Information Technology
Position #734739 - Requisition #09906
* Applications are accepted electronically ONLY at www.cu.edu/cu-careers *
Office of Information Technology has an opening for a full-time University Staff (unclassified) HPC Systems Administrator position.
The Office of Information Technology works to advance the University mission by providing innovative technology solutions and services to the CU Denver and Anschutz Medical Campuses, their constituents and partners.
Through our six core values of Service, Professionalism, Leadership, Innovation, Community, and Excellence (SPLICE), we make a difference.
This position will be responsible for the ongoing development and operational management of a central HPC cluster and service. This environment includes all the necessary HPC components, including but not limited to a scheduler, manager nodes, master nodes, login nodes, compute nodes, storage, and cluster management software. This position will function as a team lead and subject matter expert, interacting closely with both the system owner and HPC users to ensure stable operations and services. Excellent verbal and written communication skills, as well as strong customer service abilities, are necessary to continue maturing the service and supporting end users. This position will serve as a critical resource to the Director of Research and Shared Services.
Examples of work performed:
- Responsible for administration of HPC cluster nodes, operating systems, and applications in a security-focused, HIPAA-compliant environment.
- Responsible for the ongoing security program, including security improvements and scheduled audits.
- Responsible for managing the change control process in the HPC environment.
- Responsible for maintenance of billing infrastructure for HPC systems, including monthly billing runs.
- Assist with the management and maintenance of the storage array utilized by the HPC cluster.
- Collaborate with the system owner to enhance and grow the central HPC service.
- Develop and implement administrative and operational policies and procedures and evaluate their efficiency and effectiveness.
- Perform software installations and upgrades to operating systems and layered software packages, schedule installations and upgrades, and maintain them in accordance with established OIT policies and procedures.
- Provide monitoring and prompt responses to support requests, technical questions, and problems encountered by users.
- Configure the scheduling and queuing of system requests.
- Responsible for coordinating with vendors to resolve hardware and software issues.
- Responsible for documentation of the HPC environment as well as documenting system administration policies and procedures.
- Assist with the development, implementation, testing and maintenance of the system backup and recovery plans and policies.
- Periodic work outside of business hours will be required, though telecommuting is typically feasible.
Salary and Benefits:
The salary range for this position has been established at $90,000 to $105,000 and is commensurate with skills and experience.
The University of Colorado offers a full benefits package. Information on University benefits programs, including eligibility, is located at https://www.cu.edu/employee-services.
Minimum Requirements:
- Bachelor's degree in Computer Science, Computer Information Systems, or a closely related field.
- Minimum of 2 years of professional experience working as a server administrator and performing the installation, configuration and support of a Linux environment.
Substitution: A combination of related work experience in the areas listed above may be substituted on a year-for-year basis for the bachelor's degree. The other minimum requirement listed above can overlap with this experience.
Conditions of Employment:
- Must be willing and able to travel between campuses (Denver Campus and Anschutz Medical Campus)
PLEASE NOTE: Candidates will be responsible for travel expenses related to the interview process and any relocation expenses, if applicable.
Preferred Qualifications:
- Team lead and/or management experience.
- Experience serving as an HPC cluster server administrator.
- Experience installing and configuring job schedulers (e.g., Slurm).
- Experience with InfiniBand and IBM Spectrum Scale (GPFS)
- Experience with automated cluster management tools such as Bright Cluster Manager.
- Experience with parallel computing environments (e.g., Open MPI, MVAPICH, MapReduce).
- Experience with a version control system (e.g. git)
- Experience with Bash scripting or Python scripting.
- Experience with cloud computing and server virtualization.
Knowledge, Skills, and Abilities:
- Advanced knowledge of RHEL 7 and experience supporting and troubleshooting the OS in a complex and security-conscious production environment.
- Experience compiling, installing, and/or developing software in a Linux environment.
- Knowledge of the technologies underlying HPC cluster systems.
- Experience with configuration management systems such as Puppet.
- Working knowledge of at least one programming language.
- Experience configuring and maintaining cluster management software.
- Experience with configuring and managing parallel file systems.
- Experience with server monitoring and alerting and ability to quickly and efficiently respond to issues.
- Strong analytical, conceptual, and problem-solving abilities.
- Strong planning, design, analytic, research and documentation skills.
- Familiar with common network concepts, protocols and tools.
- Strong technical aptitude and ability to research and solve complex issues independently.
- Proven ability to multitask.
- Strong interpersonal communication and customer service skills, including working with non-technical users and stakeholders.
- Must be able to work effectively as part of a project team and foster team cooperation.
- Ability to work with a high degree of independence and latitude.
- Ability to support multiple projects simultaneously while effectively managing time and prioritizing tasks.
Special Instructions to Applicants: Applications are accepted electronically ONLY at www.cu.edu/cu-careers.
Application Deadline: Review of applications will begin immediately and will continue until the position is filled.
The University of Colorado Denver Anschutz Medical Campus is dedicated to ensuring a safe and secure environment for our faculty, staff, students and visitors. To assist in achieving that goal, we conduct background investigations for all prospective employees. The University of Colorado Denver Anschutz Medical Campus is committed to recruiting and supporting a diverse student body, faculty and administrative staff. The university strives to promote a culture of inclusiveness, respect, communication and understanding. We encourage applications from women, ethnic minorities, persons with disabilities and all veterans. The University of Colorado is committed to diversity and equality in education and employment.
Application Materials Required: Cover Letter, Resume/CV, List of References
Application Materials Instructions: To apply, please visit http://www.cu.edu/cu-careers and attach:
1. A letter of application that specifically addresses the job requirements and outlines qualifications
2. A current CV/resume
3. A list of three to five professional references (we will notify you before contacting both on- and off-list references)
Job Category: Information Technology
Primary Location: Aurora
Department: U0001 -- Denver-Anschutz Administration - 60131 - ADM-AVCOIT Administration
Posting Date: Jun 13, 2017
Closing Date: Ongoing
Posting Contact Name: OIT Human Resources
Posting Contact Email: ucd-oit.HumanResources@ucdenver.edu
Position Number: 00734739