The Salk Cloud Initiative represents a new matrix organization within the Institute with a mission to develop new capabilities and resources to facilitate interaction with large and complex data sets across multiple disciplines. This position will design, manage and support the Institute’s framework for cloud data and software engineers to develop robust resources for the management and analysis of data in support of both local and extramural research activities. The cloud initiative serves as a nexus within the Institute for data-centered partnerships with technology companies and academic organizations.
The Institute’s IT department is looking for a Cloud Infrastructure Architect to join our team. Your work will help hundreds of investigators and core facilities achieve their research goals using new platforms, technologies, and approaches to high-performance and big-data computing.
Reporting to the Sr. Director of IT, you will join a small matrix-managed team that designs, deploys and supports research-focused IT cloud platforms for use by the Salk community, including bioinformatics specialists and big data researchers. Our mission is to empower our researchers with innovative data science tools and infrastructure capabilities that accelerate their research.
1) Cloud Architecture Design and Management (20%)
- Architect a cloud-based platform, including network design, security, redundancy, account management and role-based permissions framework for use by cloud research computing specialists to containerize existing workflows and build cloud-native solutions to emergent research needs.
- Work with cloud research teams to ensure that we get maximum computational and storage value by intelligently using cloud options such as AWS Spot Instances, preemptible VMs in GCP, and Amazon S3 Glacier Deep Archive storage.
- Promote the benefits of modern object storage systems and, in concert with cloud research computing specialists, migrate workflows away from POSIX file system dependencies to cloud-native designs built on Docker-based workflows and object storage.
- Perform Site Reliability Engineering tasks using modern tools such as Prometheus and proactive log analysis.
- Develop and maintain tools to track cloud usage patterns, including tools for alerting on and/or interrupting anomalous workloads.
2) Cloud Research and Infrastructure Project Support (25%)
- Assist researchers with building and improving cloud-based computational pipelines using multiple tools.
- Support administrative cloud computing efforts including IaaS and PaaS projects from the IT/IS team and other admin units.
- Recommend and evangelize new cloud-based platform services that increase computational productivity or enable new capabilities.
- Contribute documentation using GitHub or Confluence to a consolidated knowledge base that covers cloud-based research computing and best practices for operational configuration of cloud resources.
3) Cloud Infrastructure Technical Support (20%)
- Provide technical support and troubleshooting to resolve cloud-based computational and storage technical issues.
- Troubleshoot end-to-end data transfer and performance issues (including streaming of on-prem data sources into cloud-based analysis pipelines).
- Develop and maintain cloud-based and on-prem automation to streamline infrastructure management, billing and usage reporting.
4) Cloud Technical Consulting (20%)
- Provide architectural input to a cross-functional cloud architecture team, build relationships with other IT professionals at Salk, and educate them about cloud-based scientific computing.
- Review cloud architectures proposed by other teams for consistency with Institute guidelines on infrastructure design and security.
5) Cloud Account and Billing Management (10%)
- Manage vendor relationships, contractual agreements and billing tools with cloud providers; ensure best value and efficiency in proposals.
6) Other Duties As Assigned (5%)
- A minimum of 10 years administering and working with Linux systems of various flavors (CentOS/Red Hat preferred)
- Deep knowledge of AWS and Google Cloud services and their APIs, including popular storage, networking, security and automated provisioning tools
- Experience with various forms of infrastructure and application virtualization such as VMware, Docker, Singularity, and AWS Lambda (serverless computing)
- 5 years working with Python or another modern scripting language
- Familiarity with infrastructure-as-code concepts and infrastructure automation tools (Terraform, CloudFormation)
- Experience troubleshooting performance issues with Linux-based computing environments, storage platforms and cloud-based networking infrastructure
- Knowledge of application, network and infrastructure security best practices
- Knowledge of tools and platforms to provide cloud application and security monitoring
- Demonstrated experience with cloud-native technologies such as MapReduce, CloudFormation, Container-as-a-Service offerings, and various storage platforms and automated data tiering capabilities
- Experience with high-speed networking and data migration to/from cloud infrastructure with high throughput
- Experience using configuration management and/or automation systems such as Chef, Puppet, Ansible, CFEngine, or Salt
- Experience with GPU-based systems
- Experience writing web-based tools used to manage operations (e.g., Flask)
- Experience with R, MATLAB, and Galaxy is a plus
- Experience working in matrix-based or cross-functional/multidisciplinary teams
- Experience in a research or educational environment
- Bachelor’s degree in computer science, IT, or related scientific discipline, or an equivalent amount of demonstrated experience in lieu of a degree
- Advanced degree in computer science, IT, or related scientific discipline
SKILLS AND ABILITIES
- Strong communication skills, with the ability to communicate effectively with a broad variety of constituencies including executive administration, department heads, principal investigators and senior researchers
- Ability to produce high-quality written materials including training and infrastructure documentation (e.g., maintain a wiki)
- Ability to work independently or as part of a project team to analyze requirements and determine appropriate design approaches
- Ability to work with department leadership on establishing and managing priorities
- Ability to assist in developing budgets and tracking expenses for cloud-based infrastructure projects
- Ability to manage multiple projects with varying priorities at the same time
- Ability to quickly adapt to new technologies and learn independently
- Ability to work with the software development and scientific computing/research communities
- Ability to train peers on cloud technologies and infrastructure management
SPECIAL CONDITIONS OF EMPLOYMENT
- Must be willing to work in an animal-related research environment.
- Satisfactory completion of the Institute’s background investigation.
- Willing to sign a confidentiality agreement.
PHYSICAL REQUIREMENTS/MENTAL ACTIVITIES/ENVIRONMENTAL CONDITIONS
Constantly adjusting focus, grasping, hearing, keying, seeing, sitting, talking, analyzing, calculating, communicating, reading, reasoning, writing and working inside.