Software Developer 2
- Employer: Stanford University
- Location: School of Medicine, Stanford, California, United States
- Employment Type: Full Time
- Institution Type: Four-Year Institution
Job Details
The Department of Veterans Affairs (VA) has commissioned the sequencing of thousands of whole genomes from participants in the Million Veteran Program (MVP) [https://www.mvp.va.gov/]. This data is currently being delivered to the SCGPM’s cloud computing environment and constitutes one of the largest repositories of whole-genome sequencing data in the world. The scale and richness of this data make it an incredible resource for biomedical research. Our goal is to turn this data lake into a data commons: a dynamic computing environment where researchers bring questions and get answers, all without having to go through the ordeal of manually collecting, cleaning, massaging, scrubbing, sorting, transforming, and filtering data.
Position:
In this position, you would be the lead architect and system implementer of Trellis, the cloud-based MVP data management system we created. Trellis keeps track of the petabytes of sequence data that veterans have contributed to the MVP. It also orchestrates the processing of that data into derivative files and records which programs were used to transform the data, maintaining a detailed record of data provenance.
To manage the enormous volumes of biomedical research data that the MVP generates, we built and run Trellis on Google Cloud Platform. The Trellis architecture takes advantage of many serverless cloud services, such as Cloud Functions and Pub/Sub, to create a workflow that responds to the arrival of new data by launching pipeline processes automatically.
A production version of Trellis has already processed the whole-genome sequences of 150,000 veterans, and we plan to process at least as many more in the coming year. You would be in charge of keeping this production system running and optimized, and you would work with the DevOps team that will maintain the system in a FedRAMP-compliant environment.
Now that we have proven that we can process and manage biomedical data at scale, we want to make MVP data more easily accessible to VA-internal researchers and to the scientific community at large. Possible directions include building a visualization front end that lets researchers experiment with the data graphically and providing a cohort selection mechanism so that subpopulations of veterans can be studied. You would continue developing Trellis to integrate new data from the VA and to present Trellis data to the research community through tools and interfaces that are both easy to use and powerful.
To help you achieve these goals, you would direct a small team of excellent, self-starting engineers on tasks such as devising new quality-control pipelines and integrating demographic data with sequence data.
This project has many open-source components, and you would be encouraged to publish details of your systems architecture work or results from processing the genomic data. As an example of a publication from this group, see this reference describing the early design of the Trellis system:
Ross, P.B., Song, J., Tsao, P.S. et al. Trellis for efficient data and task management in the VA Million Veteran Program. Scientific Reports 11, 23229 (2021). https://doi.org/10.1038/s41598-021-02569-5
Our Team:
Our SCGPM bioinformatics team is a multidisciplinary group of about a dozen scientists, engineers, and software developers with complementary backgrounds, each contributing their own expertise in managing and analyzing complex biomedical data [http://med.stanford.edu/gbsc/scgpm-team.html]. Projects supported by this team include the Stanford Genomics Sequencing Center, the VA Million Veteran Program, the NCI Human Tumor Atlas Network, the Human BioMolecular Atlas Program, and the Stanford Metabolic Health Center.
This position can be on-site, fully remote, or hybrid.
Bioinformatics System Architect duties include:
- Collaborating with researchers to design solutions to relevant biological questions and maximize the value of our whole-genome sequencing dataset to the public
- Determining how to implement big data technical solutions to those questions in a cloud environment
- Dockerizing bioinformatics tools and integrating them with our internal data management system to automate workflows
- Implementing population-level genomic analyses (GWAS, PCA) using big data technologies
- Connecting research data to biological knowledge to streamline the process of answering biological questions
- Transforming genomic data from custom file formats into database-native formats
- Developing an ontology to describe the relationships between data objects and resources involved in research data management
- Proposing, conceptualizing, designing, implementing, and developing solutions for difficult and complex applications independently
- Overseeing systems testing, debugging, change control, and documentation
- Supervising professional staff, as necessary, working on all phases of application development projects
- Engaging in long-term strategic planning
- Defining complex application development administration and programming standards
- Overseeing the support, maintenance, operation, and upgrades of applications
- Troubleshooting and resolving complex technical problems
- Working with other technical professionals to develop globally applicable standards and implement best practices
Qualifications:
- Four-year degree in Genetics, Computer Science, Bioinformatics, Computational Physics, or a related field
- Experience modeling biological/biomedical data and metadata
- Experience with biological -omics data formats (e.g., FASTQ, FASTA, BAM, and proteomics and metabolomics formats)
- Comfortable programming in Python
- An ability to independently grasp the objectives of research projects and assemble solutions from a range of technologies, standards, and approaches
- A desire to learn new methods and technologies and to adapt to the demands of fast-paced research
- Excellent verbal and written communication skills
- Experience managing small teams
- Experience managing projects
- Experience with cloud computing, especially Google Cloud
- Experience with databases, especially graph databases
- Experience with big data technologies (e.g., BigQuery, Spark)
- Familiarity with issues in computer data security
- Familiarity with FAIR principles of data management
- Bachelor's degree and five years of relevant experience, or a combination of education and relevant experience.
- Expertise in designing, developing, testing, and deploying applications
- Proficiency with application design and data modeling
- Ability to define and solve logical problems for highly technical applications
- Strong communication skills with both technical and non-technical clients
- Ability to lead activities on structured team development projects
- Ability to select, adapt, and effectively use a variety of programming methods
- Knowledge of application domain
Physical requirements:
- Constantly perform desk-based computer tasks
- Frequently sit and grasp lightly or perform fine manipulation
- Occasionally stand or walk, and write by hand
- Rarely use a telephone or lift, carry, push, or pull objects that weigh up to 10 pounds
Working conditions:
- May work extended hours, evenings, and weekends
- Schedule: Full-time
- Job Code: 4822
- Employee Status: Regular
- Grade: J
- Requisition ID: 105251
- Work Arrangement: Hybrid Eligible
Organization
Change the world. And yourself.
Stanford University has changed the world, over and over again.
We are one of Silicon Valley's largest employers, and also one of the most distinctive. Our mission is to educate future leaders and promote interdisciplinary, world-class research and teaching. This passion makes Stanford an intensely creative, rewarding, and challenging place to work. At the same time, our traditions of respect and collaboration sustain a humane, supportive environment in which to pursue your life and your career.
At Stanford you'll work with bright, diverse, dedicated people. You'll find encouragement to learn and grow. You'll enjoy excellent benefits and an outstanding environment.
Stanford Facts at a Glance
Opened 1891
Student Enrollment
- Undergraduates: 6,980
- Graduates: 8,897
Campus
- 8,180 contiguous acres in six governmental jurisdictions
- Nearly 700 major buildings
- 97% of undergraduates live on campus
Research
- 5,300 externally sponsored projects
- $1.33 billion total budget
Faculty
- 2,043 faculty members
- 21 Nobel laureates are currently members of the Stanford community
- 5:1 student-to-faculty ratio
Stanford University is an Equal Employment Opportunity and Affirmative Action Employer and is committed to recruiting and hiring qualified women, minorities, protected veterans and persons with disabilities.