Lead Bioinformatic and Computational Scientist - PACC
Columbia University
Details
Posted: 01-Sep-23
Location: New York, New York
Type: Full-time
Salary: Open
Internal Number: 538699
Job Type: Officer of Administration
Bargaining Unit: n/a
Regular/Temporary: Regular
End Date if Temporary: n/a
Hours Per Week: 35
Salary Range: $110,000 - $150,000
The salary of the finalist selected for this role will be set based on a variety of factors, including but not limited to departmental budgets, qualifications, experience, education, licenses, specialty, and training. The above hiring range represents the University's good faith and reasonable estimate of the range of possible compensation at the time of posting.
Position Summary
The Columbia Precision Medicine Initiative at Columbia University Irving Medical Center is seeking a highly motivated Bioinformatic and Computational Scientist to lead the innovation, design, and development of the CPMI production genomics compute infrastructure. The incumbent will apply the latest hardware, software, and cloud technologies to innovate solutions in genomics data structuring, compute workflows, and large-scale data manipulation. This role requires interaction with a multi-disciplinary team from LIMS, IT, Clinical, and Compliance operations across CUIMC.
The candidate will work closely with internal and external partners to lead a team in the research, design, and development of systems addressing the data management and analysis of major genetics and genomics projects.
Responsibilities
Contribute to several projects, including genomic sequence analyses, optimization of analysis approaches, and cloud-based and on-premises computation, and serve as a point person in developing CUIMC precision medicine pipelines.
Develop and support multiple research projects by delivering bioinformatics services for biomedical data from large datasets.
Lead a variety of projects and gain exposure to many productive areas of research.
Train and guide a team of bioinformatic and computational scientists.
Support the storage, backup, access, and analysis of large volumes of sequencing data (>100,000 sequenced exomes and genomes).
Work closely with CUIMC central IT to migrate data and analysis to a cloud service provider as well as maintain existing software on-premises.
Develop, implement, and maintain ATAV, the primary discovery tool used for population-scale genomic analyses.
Develop, implement, and maintain a pipeline for analyzing raw genetic sequencing data that performs quality control checks, aligns sample data to the reference genome, and produces variant call format (VCF) files and joint-genotyped VCF files.
Manage databases and conduct statistical and genomic analyses.
Assist staff and collaborators with data access, analysis, and interpretation.
Develop methods of visualizing, exploring, and mining genomics data.
Maintain and oversee software development, code documentation, bug tracking, feature development, and audit trails.
Integrate software designs into larger ecosystems and develop best practices for implementation and test procedures in collaboration with IT.
Assist IT with troubleshooting tasks, as well as with the design and processing of SOPs where needed.
Collaborate and consult with researchers to plan and design scalable software solutions to meet research-related computational needs.
Demonstrate strong programming and statistical proficiency with proven expertise in genomics analyses and advanced knowledge of computational infrastructure.
Analyze data for grant and manuscript submissions.
Assist with publications.
Ensure compliance with data security and IRB rules.
Prepare an Annual Report.
Other related duties as assigned.
Minimum Qualifications
Bachelor's degree or equivalent in education and experience; M.S. or Ph.D. in Bioinformatics, Computer Science, Statistics, or a related discipline strongly preferred.
Minimum of 6 years of software engineering experience, including at least 4 years of developing DNA sequencing statistical analysis tools or architecting/implementing large data-driven software tools used in high-volume operations.
Preferred Qualifications
Proven expertise in building/implementing population genomics tools widely used by the genetics community.
Demonstrated experience with next-generation sequencing analysis.
Extensive knowledge of Java, C++, or other object-oriented programming languages, as well as Perl, Python, R, Bash, or other scripting languages.
Experience designing relational database schemas and query optimization.
Good organizational skills, with a demonstrated ability to work productively as a member of a team in a fast-paced, collaborative environment. Strong verbal and written communication skills are required.
Other Requirements
Successful completion of applicable compliance and systems training requirements.
Equal Opportunity Employer / Disability / Veteran
Columbia University is committed to the hiring of qualified local residents.
Columbia University is one of the world's most important centers of research and at the same time a distinctive and distinguished learning environment for undergraduates and graduate students in many scholarly and professional fields. The University recognizes the importance of its location in New York City and seeks to link its research and teaching to the vast resources of a great metropolis. It seeks to attract a diverse and international faculty and student body, to support research and teaching on global issues, and to create academic relationships with many countries and regions. It expects all areas of the university to advance knowledge and learning at the highest level and to convey the products of its efforts to the world.