I have been involved in multidisciplinary projects in human genetics for a number of years, since the end of my PhD. As a data scientist and scientific computing expert, I have worked with people who share my expertise and, even more often, with people who have almost no computing skills. For the record, these are people who will happily nest several for loops to crunch datasets of several gigabytes, and who keep demanding new hardware whenever they realize their programs are taking ages to complete.
In almost every scenario I have been involved in, there is The Guy who has a hypothesis, and a bunch of other guys who argue about it, demolish it if needed, and write the code that eventually places that hypothesis in the real world or in dreamland.
A number of steps are usually required during this process:
1) a brief description of the problem (which, if done well, solves half of it), followed by 2) an explanation of the data at hand. The latter usually suggests the initial steps of the analytic pipeline, namely data cleansing, wrangling and normalization.
The real analytic phase follows, starting with a discussion of how to represent the data and which algorithm to use to solve the problem. This discussion can be very specific to the research question at hand, and it drives the choice between clustering, decision trees, neural networks, SVMs or any other algorithm from the machine learning shelf.
One thing common to every group I have worked with, regardless of the problem to solve and the algorithm to use, is The Guy’s feedback.
Ideally, data scientists, data engineers and developers work in a highly connected environment in which they give each other feedback and act on any correction promptly.
In many real cases, however, The Guy is the only one who sets the pace of the analysis and gives feedback to the rest of the team, sometimes with a delay.
This hypothetical (un)structured team is doomed to fail: it produces code that must be reviewed very frequently and that is not engineered, barely readable, riddled with bugs and definitely not well structured.
Some people refer to all this as agile programming, and they seem very happy about it.
I have experienced something more interesting, which I call acrobatic programming. And frankly, I am not happy at all.