Bioinformatics and Next Generation Sequencing

Objectives: bioinformaticians often manage large amounts of data, run calculations on whole genomes, need reproducible results, and maintain workflows and environments that are developed and used by many people.

Why: the main goals are to

  • centralize data reliably and easily,
  • reuse analysis workflows years later and reproduce the same results (or, conversely, run new analyses on old data),
  • run calculations that require large amounts of memory and time.

How: by designing and implementing databases on mature, well-tested systems such as PostgreSQL; by creating Conda-based work environments and using version control systems such as Git and GitLab; and by analyzing the software requirements to decide whether data parallelism or task parallelism is the better strategy.
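To make the difference between the two parallel strategies concrete, here is a minimal Python sketch using the standard-library concurrent.futures module. It is an illustration under stated assumptions, not a prescribed design: the read list and the helper functions gc_content, mean_length and count_kmers are hypothetical toy examples. In data parallelism the same function is applied to different pieces of the data; in task parallelism independent functions run at the same time. ThreadPoolExecutor keeps the sketch portable; for CPU-bound pure-Python code, ProcessPoolExecutor offers the same interface and avoids the global interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy reads; real data would come from FASTA/FASTQ files.
READS = ["ACGTGC", "GGGCCC", "ATATAT", "ACGTAC"]


def gc_content(seq: str) -> float:
    """Hypothetical helper: fraction of G and C bases in one sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def mean_length(reads: list[str]) -> float:
    """Hypothetical helper: average read length."""
    return sum(len(r) for r in reads) / len(reads)


def count_kmers(reads: list[str], k: int) -> int:
    """Hypothetical helper: number of distinct k-mers across all reads."""
    return len({r[i:i + k] for r in reads for i in range(len(r) - k + 1)})


def data_parallel(reads: list[str]) -> list[float]:
    # Data parallelism: the SAME function runs on DIFFERENT pieces of the data.
    # pool.map returns results in input order.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(gc_content, reads))


def task_parallel(reads: list[str]) -> dict[str, float]:
    # Task parallelism: DIFFERENT, independent functions run concurrently.
    with ThreadPoolExecutor(max_workers=2) as pool:
        length_future = pool.submit(mean_length, reads)
        kmer_future = pool.submit(count_kmers, reads, 2)
        return {
            "mean_length": length_future.result(),
            "distinct_2mers": kmer_future.result(),
        }


print(data_parallel(READS))
print(task_parallel(READS))
```

Data parallelism scales naturally with the number of samples or chromosomes, while task parallelism is limited by how many independent steps a workflow contains; analyzing the requirements tells which of the two fits a given pipeline.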