Data Science: The Past, The Present, and The Future

05 Sep 2020 . category: science . Comments
#data-science #AI #machine-learning #NLP

A glimpse at the future

Charlie’s typical day is not special in the least; it mirrors the life of pretty much every one of his peers. He goes to work virtually in the morning using some sort of advanced VR technology, has his lunch made by specialized kitchen androids, and has a humanoid personal assistant who does almost anything. Needless to say, this personal assistant is indistinguishable from a human, both physically and intellectually.

The paragraph above may seem like an extract from some science fiction book, but it is a possible future and one which I strongly believe in. This future will be made possible by the current surge in research and development in the fields of AI, powered by Deep Learning and other Data Science techniques.

History and Trend

Though these techniques have gained popularity in the last two decades, some of them have been around for much longer. However, the relatively limited computing power available at the time did not allow for serious work and experimentation in the field. A few people, including [Press, Gill], have stated that John W. Tukey, the author of the popular 1977 book Exploratory Data Analysis, was one of the major figures who kick-started work in the field, though the name Data Science itself was only coined in 2001.

Following Moore’s law, computing power has increased tremendously, making way for more practical work in the field. Another major factor that has driven the growth is the exponential increase in the amount of data available. [Desjardins, Jeff] reports that as of 2013, about 4.4 zettabytes (1 ZB ≈ 1,000,000,000 terabytes) of data had accumulated in the digital world, and this figure is expected to reach 44 ZB in 2020, a tenfold increase in seven years. As a data-driven field, Data Science has benefited immensely from this huge amount of data.
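
To put those two figures in perspective, here is a small Python sketch that works out the yearly growth rate they imply. The function name is my own, and the calculation assumes the data grew at a steady rate every year between 2013 and 2020, which is a simplification made only for illustration.

```python
def annual_growth_rate(start, end, years):
    """Compound annual growth rate that takes `start` to `end` in `years` years."""
    return (end / start) ** (1 / years) - 1


# Figures quoted in the text: 4.4 ZB in 2013 and a projected 44 ZB in 2020.
# Assumption: growth is spread evenly over the seven years.
rate = annual_growth_rate(4.4, 44.0, 2020 - 2013)
print(f"Implied growth: about {rate:.0%} per year")
```

Under that assumption, the amount of data grows by roughly 39% a year, which means it doubles a little more often than every two and a half years.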

Developments and Perspectives

I classify the developments in Data Science into three categories: algorithms, hardware and tools. With lots of research in the area, there has been an influx of algorithms and techniques. Among the popular ones are Reinforcement Learning, Generative Adversarial Networks (GANs), Genetic Algorithms, Transfer Learning, Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). These techniques have been applied to different areas such as gaming, Natural Language Processing and pattern recognition. Very good examples of how these techniques have been used to achieve tremendous results include DeepMind’s AlphaGo, GPT-3 and the recent surge in self-driving cars. As these techniques improve further, more useful applications await in the near future.

As already mentioned, the lack of computing power once hindered work in the area, but that is no longer the case. Gradually, CPUs became more powerful, and later on developers and researchers started using GPUs to train their ML models. This was made possible when Nvidia, the GPU manufacturer, released the CUDA computing platform. GPUs, which were traditionally used only for graphics rendering, became platforms for running ML algorithms. Years later, Google released the Tensor Processing Unit (TPU), a chip designed specifically to accelerate neural network workloads. This was a huge achievement, as it made training models much faster and allowed for training with bigger datasets. Currently, many cloud platforms offer special environments for data scientists to train and deploy their models.

Regarding tools, there has been a surge in the number available, and many of them are free, making it easier for people to enter the field. These tools provide a framework for training and deploying ML models without having to implement the underlying algorithms. Popular open source frameworks include TensorFlow, PyTorch, Keras and Microsoft CNTK. Some advocates of free and open AI have also released toolkits that make state-of-the-art techniques more accessible and less likely to be monopolized or even weaponized. A very good example is OpenAI Gym, a toolkit for developing and comparing reinforcement learning algorithms. There are also several free courses currently available, as well as cloud solutions that make deployment easier.

Natural Language Processing

I am especially interested in and excited about Natural Language Processing (NLP) techniques. A language is not just a collection of words and grammar rules; it embodies culture, traditions and lifestyle, and the closer computers get to learning and using natural languages, the closer they get to us humans. Currently, techniques exist that enable computers to understand the syntax, semantics and context of natural languages using existing resources and corpora such as Twitter, WordNet and Reuters News. These techniques are applied to achieve great results in areas such as machine translation, text summarization and sentiment analysis. My bachelor’s thesis was centered on automatic text summarization as applied to a collection of related tweets. Text summarization seeks to find the most important pieces of information in a text so that the summary can represent the entire text.
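
To make the idea of summarizing a collection of related tweets a bit more concrete, here is a minimal sketch of one simple extractive approach: score each tweet by how frequent its words are across the whole collection and keep the top scorers. This is not the method from my thesis; the stopword list, the function names and the sample tweets are all made up for illustration, and real systems use far more sophisticated techniques.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption); real systems use a much fuller one.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "in", "and",
             "on", "at", "for", "it", "this", "that", "with"}


def tokenize(text):
    """Lowercase the text and return its word tokens, minus stopwords."""
    words = re.findall(r"[a-z0-9#@']+", text.lower())
    return [w for w in words if w not in STOPWORDS]


def summarize(texts, k=1):
    """Return the k highest-scoring texts, kept in their original order."""
    # Count how often each word appears across the whole collection.
    freq = Counter(w for t in texts for w in tokenize(t))

    def score(text):
        # Average collection-wide frequency of the words in this text.
        words = tokenize(text)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(texts)), key=lambda i: score(texts[i]), reverse=True)
    return [texts[i] for i in sorted(ranked[:k])]


# Hypothetical sample tweets, invented for this example.
tweets = [
    "Heavy rain floods the city centre",
    "City centre roads closed after heavy rain",
    "I had pasta for lunch",
]
summary = summarize(tweets, k=1)
print(summary)  # ['Heavy rain floods the city centre']
```

The off-topic tweet about lunch shares no words with the others, so it scores lowest, while the tweet whose words are most common across the collection is picked as the summary.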

NLP techniques have gradually developed from rule-based approaches to more data-driven statistical approaches using deep learning algorithms. This has been key to the recent successes of NLP systems. Virtual assistants like Siri, Google Assistant and Cortana have NLP algorithms at their core and have been quite successful. These are the systems that will eventually evolve into the humanoid personal assistant of Charlie’s age. Chatbots have also become very common thanks to developments in this area. The capabilities of GPT-3 are also absolutely mind-blowing!

Generally, NLP tries to improve human-computer interaction through the use of natural languages, which are easier for humans to understand. I think this is really one of the biggest pieces in the puzzle of creating an Artificial General Intelligence (AGI).

Are you also excited about what the future holds for Data Science and AI?

References

[Press, Gill] “A Very Short History Of Data Science” [Online]. Available: https://www.forbes.com/sites/gilpress/2013/05/28/a-very-short-history-of-data-science/#2551679155cf. [Accessed: 5-May-2019].

[Desjardins, Jeff] “How Much Data is Generated Each Day?” [Online]. Available: https://www.visualcapitalist.com/how-much-data-is-generated-each-day/. [Accessed: 5-May-2019].

[Christian, Hans] “The Future of Data Science” [Online]. Available: https://towardsdatascience.com/the-future-of-data-science-14653afb52f5. [Accessed: 5-May-2019].

[Raval, Siraj] “Natural Language Processing” [YouTube]. Available: https://www.youtube.com/watch?v=bDxFvr1gpSU. [Accessed: 5-May-2019].

Kenneth Nwafor is a data scientist and software developer with great experience in the tech industry. He loves to write about tech, science and life in general.