Your most stubborn enemy as a Data Scientist.

You are your own most stubborn enemy as a Data Scientist! Your brain, your most powerful tool, plays tricks on you with the help of your own ego!

Data scientists aspire to be objective and let the data speak for itself, but it is very easy to become too attached to the beautiful and intricate model that YOU have created. This subconscious attachment makes you blind to nuances in your results and data. Someone with domain knowledge and expertise can easily raise an eyebrow and point to the pesky little details that disfigure your beautiful model.

You are NOT your model.

Getting attached to your model, or worse, equating yourself with your model or your work, leads you to the fairyland of denial and illusion. It also makes criticism from other team members and stakeholders hurt. You are bound to suffer, because reality is very mechanical and does not care about you or your sophisticated, yet flawed, model. It does what it wants (free will?), and it will burst your bubble. And these bubbles can be very costly in the real world.

"We Don't Have Enough Data" is AS BAD AS Having a Crystal Ball!

Most data scientists suffer from the assumptions that less tech-savvy business leaders or sales teams make about them. Business leaders are, mistakenly, under the impression that a data scientist owns a crystal ball and can clearly see the future. Forecasting is a losing battle, particularly if you don't have enough historical data. So it makes your life a lot easier to make this clear beforehand.

At the same time, with the rise of deep learning algorithms that can benefit from millions and millions of training examples, some data scientists have become too lazy to dig into the meaning of each data point, hence abandoning the "SCIENTIST" part of their title. They often respond to business requests by saying "there is not enough data!". But the question that every data scientist should ask themselves before using the "not enough data" phrase is: have I tried to squeeze all the possible information out of each data point before giving up? They should interrogate every data point as if their lives depended on it!

This is what a physicist does! They try to learn from each experiment while determining the confidence level of their insight.
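
One way to put a confidence level on an insight drawn from a small data set is a percentile bootstrap. The sketch below is only an illustration, not a prescribed method: the function name bootstrap_mean_ci, its parameters, and the weekly_sales numbers are all made up for this example.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_resamples=5000, confidence=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the mean of a small sample.

    Hypothetical helper written for this post; returns (mean, low, high).
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(values)

    # Resample the data with replacement many times and record each mean.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )

    # Pick the percentiles that bracket the requested confidence level.
    alpha = 1.0 - confidence
    lo_idx = max(0, round((alpha / 2) * n_resamples))
    hi_idx = min(n_resamples - 1, round((1 - alpha / 2) * n_resamples) - 1)
    return statistics.fmean(values), means[lo_idx], means[hi_idx]


# Made-up data: only twelve observations, far from "millions of examples".
weekly_sales = [102, 98, 110, 95, 120, 101, 99, 130, 97, 105, 108, 100]

estimate, low, high = bootstrap_mean_ci(weekly_sales)
print(f"mean = {estimate:.1f}, 95% CI = [{low:.1f}, {high:.1f}]")
```

Even with twelve points you get an estimate plus an honest statement of how uncertain it is, which is more useful to the business than "there is not enough data!". The bootstrap assumes the sample is reasonably representative, so it is no substitute for understanding where each data point came from.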