ایک ہی حادثہ تو ہے اور وہ یہ کہ آج تک بات نہیں کہی گئی ، بات نہیں سنی گئی ~ جون ایلیا
The only catastrophe so far has been this: nothing has been said, nothing has been heard ~ Jaun Elia
Imagine giving a musical instrument to someone who knows nothing about music and asking them to produce a melody. Imagine giving a paintbrush to someone who knows nothing about art and asking them to paint a beautiful painting. Why does all this sound absurd? Because paintbrushes and musical instruments are just that: instruments. Music and art come from somewhere else. In the same way, tools and libraries like SPSS, R, Python, MATLAB, TensorFlow, Keras, scikit-learn etc. are instruments. Great applications of ML and Data Science come from somewhere else too. They come from knowledge of statistics.
Images of Mars taken 20 years apart highlight the way the human mind can convolve or over-fit information.
Unfortunately, statistics is not as intuitive as music or art. From a psychological perspective, our brain is designed to think only in terms of discrete, deterministic scenarios. Possibly we inherited this from our caveman days, when rolling dice to work out whether a predator would attack us would not have been a very smart thing to do. I can assure you that the people who tried it would have become an easy dinner for some beast. But then the world changed, and so did the beasts. The beasts of today are chronic diseases, healthcare systems, economic systems etc.: things that require us to think in terms of large numbers, probabilities and distributions, in other words statistics. Statistics is one of the most unintuitive and abstract branches of mathematics, if not the most abstract. It took 4 centuries to build the theory, and even the greatest of mathematicians made mistakes with it.
Artificial Intelligence and Data Science are, by and large, new names for the age-old theory of statistics. Almost none of this is new. What is actually new are the techniques for optimizing functions, for example several of the algorithms used in AI. But which functions do we have to optimize? And what are the underlying assumptions of those functions? All of this comes from statistics. And the huge area of testing, which almost everyone ignores today and which quantifies how the model will fare in the real world, is built entirely on statistics.
The majority of AI and DS practitioners today don’t understand the underlying assumptions of their models. They don’t understand the maxim that the great statistician George E. P. Box put so eloquently: “All models are wrong but some are useful”. This means that models (functions) exist in a mathematical world. They do not automatically approximate the real world. There are sacred rules and rituals that have to be observed to ensure the meaningful transit of models from an abstract mathematical world to the world of us mortals.
Failing to abide by these underlying rules/assumptions has had devastating consequences for us throughout history and, if we are not prudent enough, potentially even more so in the future.
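To make the gap between the mathematical world and the real one a little more concrete, here is a minimal sketch in Python (standard library only). Everything in it is an assumption chosen for illustration: the "real world" is a made-up line with noise, and the two hypothetical models are a straight-line fit and a "memorizer" that simply repeats the nearest training point. It is not a recipe, only a demonstration of why error on fresh data is the number that matters.

```python
import random

def make_data(n, rng):
    """Draw n points from y = 2x + noise (a made-up 'real world')."""
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)
        y = 2.0 * x + rng.gauss(0.0, 0.5)
        data.append((x, y))
    return data

def fit_line(data):
    """Ordinary least squares for y = a + b*x."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return lambda x: a + b * x

def fit_memorizer(data):
    """Predict the y of the nearest training x (memorizes every point)."""
    return lambda x: min(data, key=lambda p: abs(p[0] - x))[1]

def mse(model, data):
    """Mean squared error of a model on a data set."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

rng = random.Random(0)        # fixed seed so the run is repeatable
train = make_data(50, rng)    # what the model gets to see
test = make_data(5000, rng)   # fresh data from the same process

line = fit_line(train)
memorizer = fit_memorizer(train)

results = {
    "line": (mse(line, train), mse(line, test)),
    "memorizer": (mse(memorizer, train), mse(memorizer, test)),
}
for name, (train_err, test_err) in results.items():
    print(f"{name:10s} train MSE = {train_err:.3f}  test MSE = {test_err:.3f}")
```

The memorizer looks perfect on the data it has already seen, because it reproduces the noise along with the signal, and it does worse than the plain line on data it has not seen. Judging a model by its training error alone is exactly the kind of skipped ritual described above.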
John Ioannidis published his famous paper, “Why Most Published Research Findings Are False”, in 2005.
Despite all this, most practitioners of AI and DS today argue that knowing the instruments and having ‘common sense’ is enough for most routine analytics. They are, in a sense, not wrong. But the line between routine work and devastation by over-generalization is exceptionally thin, and invisible to those with little or no knowledge of stats. In keeping with the age-old adage that ‘those who don’t read history are bound to repeat it’, let me give you an example from recent history. We’ve all heard of SPSS. It is a statistical tool designed primarily for social and biomedical scientists. It was built on the principle that you could perform an analysis at the push of a button. And this is what the scientists did: they pushed buttons and produced research. Later, several systematic reviews discovered that most biomedical research is irreproducible or wrong (there are more factors to it; see John Ioannidis and his work on this). This is biomedical research, in other words matters of life and death! The great market crash of 2008 was another example of over-generalization, built on what then seemed a common-sense distribution of risk across tranches in mortgage-backed securities. I could quote many more examples here, but I think you get the picture.
If you want to go deep in AI/DS (by deep I mean more than merely scratching the surface), invest your time in learning statistics. This is the one skill you will thank yourself for having learned, especially considering the direction the future is taking. I taught MATLAB at a university and my father taught FORTRAN. Today, hardly anyone has even heard of FORTRAN. But both my father and I learned Numerical Analysis (the philosophy underlying these 2 great languages) from the same copy of the book my father bought for himself when he was young. So, in a nutshell, tools are just tools. They will keep on changing, you can always learn them, and you will learn them better once you know what their purpose actually is. What is of essence is that you know how to make music.
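One of the factors behind false findings can be shown with a small simulation. The sketch below, in standard-library Python, is only an illustration under stated assumptions: each hypothetical "study" compares two groups on 20 unrelated outcomes, all of the data is pure noise with a known standard deviation of 1, and a two-sided z-test at the 0.05 level plays the part of the button. The study sizes and counts are invented for the example.

```python
import random
from statistics import NormalDist

ALPHA = 0.05        # conventional significance threshold
N_STUDIES = 1000    # hypothetical number of simulated studies
N_OUTCOMES = 20     # hypothetical outcomes tested per study
N_PER_GROUP = 30    # hypothetical participants per group

STANDARD_NORMAL = NormalDist()

def p_value_two_groups(rng, n):
    """Two-sided z-test p-value for two groups of pure noise.

    Both groups come from the same N(0, 1) distribution, so there is
    no real effect and any 'significant' result is a false alarm.
    """
    mean_a = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
    mean_b = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
    # With a known standard deviation of 1, the difference of the two
    # means has standard deviation sqrt(2 / n).
    z = (mean_a - mean_b) / (2.0 / n) ** 0.5
    return 2.0 * (1.0 - STANDARD_NORMAL.cdf(abs(z)))

rng = random.Random(42)  # fixed seed so the run is repeatable
studies_with_finding = 0
for _ in range(N_STUDIES):
    p_values = [p_value_two_groups(rng, N_PER_GROUP) for _ in range(N_OUTCOMES)]
    if min(p_values) < ALPHA:
        studies_with_finding += 1

false_finding_rate = studies_with_finding / N_STUDIES
expected_rate = 1.0 - (1.0 - ALPHA) ** N_OUTCOMES

print(f"studies reporting at least one 'significant' result: {false_finding_rate:.1%}")
print(f"what probability theory predicts for independent tests: {expected_rate:.1%}")
```

With 20 independent tests at the 0.05 level, theory puts the chance of at least one false alarm at roughly 64%, even though nothing real is there to be found. Someone who pushes the button 20 times and reports the one result that "worked" has not used common sense; they have skipped an assumption of the test.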