HPC Systems Administrator
The University of Colorado Boulder is committed to building a culturally diverse community of faculty, staff, and students dedicated to contributing to an inclusive campus environment. We are an Equal Opportunity employer, including veterans and individuals with disabilities.
Who We Are
In Research Computing at CU Boulder, we offer state-of-the-art computing and data services "beyond the desktop" including:
- large-scale computing resources
- storage of research data
- high-speed data transfer
- data sharing support
- consulting in computational science and data management
We are an integral part of both the work of our research institutes and the Office of Information Technology (OIT). OIT is a dynamic organization, filled with energetic staff and students who aim to serve the campus and contribute to student success while supporting the University’s academic, research and service missions. We’re located in the heart of the beautiful CU Boulder campus. See what OIT is all about by watching This is OIT.
What Your Key Responsibilities Will Be
- Planning, proposing, and implementing new solutions in and improvements to the RC HPC cluster environment. This may include hardware repairs, operating system provisioning and configuration, system software updates, and procedure automation. Responding to end-user queries.
- Proactive daily monitoring and health checks of the RC HPC cluster infrastructure using automated monitoring systems.
- Testing and tuning RC compute and storage systems to increase performance and reliability.
- System administration of RC storage services, including hardware maintenance, file system configuration, storage server updates, and access provisioning and control.
- Maintaining and/or creating documentation in support of the research computing infrastructure for the benefit of the RC user community and members of the internal RC team.
- Providing advice and assistance regarding configuration, maintenance, and monitoring of the RC Science Network.
This position carries a general expectation to respond, within a reasonable time frame, to critical issues and incidents that arise outside of normal business hours, as established by the position’s supervisor. This expectation is consistent with commitments RC has made to customers that many of its services will have “best effort” coverage outside of regular business hours.
We have listed only the minimum requirements ("What We Require") for this position. However, we are hiring for different levels of system administration skills and experience and, at the more senior level, the capacity to independently devise and drive creative solutions. Base salary will be competitive and market-driven, and should always be considered in the context of the tremendous benefits of being a member of the professional staff at CU Boulder. See: Benefits at CU Boulder.
The University of Colorado offers excellent benefits, including medical, dental, retirement, paid time off, tuition benefit and ECO Pass. The University of Colorado Boulder is one of the largest employers in Boulder County and offers an inspiring higher education environment. Learn more about the University of Colorado Boulder.
Be Statements
Be Creative. Be Imaginative. Be Boulder.
What We Require
- Bachelor's degree in science, engineering, or a related field. A combination of education and relevant experience as described below may be substituted for a degree on a year-for-year basis.
- Detailed knowledge of and 2 years of professional experience in a combination of the following:
  - Design, deployment, configuration, and administration of clustered Linux or Unix computer systems.
  - Evaluating, configuring, and maintaining systems software.
  - Monitoring and maintaining server hardware.
- Exceptional ability to work effectively both within a team and independently, as circumstances warrant.
- Demonstrated ability to follow through with assignments and commitments in a timely and professional manner.
- Demonstrated ability to work from a set of requirements to build complex computing systems.
- Demonstrated experience in system and related network administration of complex computer systems, specifically Linux systems and preferably Linux clusters.
- Experience diagnosing and repairing computer hardware.
- Experience with batch queueing systems, preferably Slurm.
- Experience with HPC interconnects, e.g. InfiniBand and/or Omni-Path.
- Familiarity with stateless and/or diskless server provisioning.
- Experience with a parallel or networked file system, e.g. GPFS, Lustre, or NFS.
- Knowledge of or experience in networking systems and software including DNS, LDAP, and TCP/IP.
- Familiarity with configuration management using tools such as Puppet, Foreman, and Git.
- Scripting experience with Bash and/or Python.
- Familiarity with ticket tracking systems such as ServiceNow.
To apply, please submit the following application materials:
- A current resume.
- A cover letter that specifically addresses how your background and experience align with the requirements, qualifications and responsibilities of the position.
Note: Application materials will not be accepted via email. For consideration, applications must be submitted through CU Boulder Jobs.
Posting Contact Information
Posting Contact Name: Boulder Campus Human Resources
Posting Contact Email: [email protected]