Data in the Age of AI


ChatGPT’s developer, OpenAI, recently launched its latest model, o1 (codenamed Strawberry), which features advanced reasoning capabilities. However, the company’s next flagship model may not improve significantly on its predecessors because there is too little fresh data left for further training.

Data is the new gold. Data is the new oil. You may have heard variations of these expressions. The reality is that the AI revolution wouldn’t be possible without data, and data wouldn’t hold the value it has today without AI. The strength of generative AI (GenAI) depends on data, and a great generative AI solution requires high-quality data. But what exactly is data?

Data can be defined as raw facts, figures, or symbols that represent observations, measurements, or collected information. The field of artificial intelligence has studied data at scale. It has learned how to interpret different data types and how to represent information in various modalities, such as text, images, video, audio, and 3D. Generative AI tools have been developed to transform data from one form or modality to another. For example:

  • Translation: from English-language text to Spanish-language text
  • Image generation: from words (text) to pixels (images)
  • Video generation: from words (text) to video
  • Captioning: from images/videos to words (text)

Structured collections of data, known as datasets, are created to enable analysis, train models, or test hypotheses in data science. Extremely large and complex datasets are referred to as “Big Data”. AI uses Big Data technologies to generate insights because they can capture, store, and process data of high volume, high velocity, and wide variety (the three “V”s).
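To make the three “V”s concrete, here is a minimal Python sketch that profiles a small batch of records. The `Record` type, the `three_vs` function, and the choice of metrics are illustrative assumptions, not a standard definition. Volume is measured as total bytes, velocity as records per second, and variety as the number of distinct modalities. Real Big Data systems measure these properties at far larger scale and in many different ways.

```python
from dataclasses import dataclass


@dataclass
class Record:
    timestamp: float  # seconds since collection started (hypothetical field)
    kind: str         # modality of the record, e.g. "text", "image", "audio"
    size_bytes: int   # storage size of the record


def three_vs(records):
    """Summarize a non-empty batch of records along the three Vs.

    The metrics are illustrative assumptions:
    volume = total bytes, velocity = records per second,
    variety = number of distinct modalities.
    """
    if not records:
        raise ValueError("need at least one record")
    volume = sum(r.size_bytes for r in records)
    span = max(r.timestamp for r in records) - min(r.timestamp for r in records)
    # A single instant (span of zero) means the arrival rate is unbounded.
    velocity = len(records) / span if span > 0 else float("inf")
    variety = len({r.kind for r in records})
    return {"volume_bytes": volume, "velocity_per_s": velocity, "variety": variety}


# A tiny hypothetical batch mixing three modalities over two seconds.
batch = [
    Record(0.0, "text", 2_000),
    Record(0.5, "image", 500_000),
    Record(1.0, "audio", 120_000),
    Record(2.0, "text", 1_500),
]
print(three_vs(batch))
# → {'volume_bytes': 623500, 'velocity_per_s': 2.0, 'variety': 3}
```

Each “V” scales independently. A video archive has huge volume but little variety. A sensor feed may have modest volume but very high velocity.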

Big Data has combined with two other innovations to pave the way for the AI revolution. The first is advanced AI algorithms and models for data analysis. The second is cloud computing, which provides scalable infrastructure to store, process, and access data.

Empirical observations suggest that the more data you use to train an AI model, the better the model performs. Larger models also require more training data. However, foundation models have already been trained on much of the publicly available data. This raises a question: what new strategies are needed to keep AI progress from slowing down? Two strategies are already in motion:

  1. Licensing proprietary content from enterprises and publishers who own “fresh” data, such as movie studios or TV networks.
  2. Using synthetic data generated through simulations, generative models, or data augmentation techniques to create new datasets that mimic real-world scenarios.
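The second strategy can be sketched in a few lines of Python. The idea is to fit a statistical model to a small set of real measurements and then sample new rows from it. This is a minimal sketch under strong assumptions. The dataset is hypothetical, and the “generative model” is just an independent Gaussian per column. It is far simpler than the simulations and generative models used in practice.

```python
import random
import statistics


def fit_gaussian(rows):
    """Estimate a (mean, standard deviation) pair for each column."""
    columns = list(zip(*rows))  # transpose rows into columns
    return [(statistics.mean(c), statistics.stdev(c)) for c in columns]


def sample_synthetic(params, n, seed=0):
    """Draw n synthetic rows, treating each column as an independent Gaussian.

    This toy model is an assumption for illustration only.
    """
    rng = random.Random(seed)  # seeded for reproducible output
    return [[rng.gauss(mu, sigma) for mu, sigma in params] for _ in range(n)]


# Hypothetical "real" measurements: (temperature in °C, humidity in %)
real = [[21.0, 40.0], [23.5, 45.0], [22.0, 42.0], [24.0, 50.0], [20.5, 38.0]]

params = fit_gaussian(real)
synthetic = sample_synthetic(params, n=3)
for row in synthetic:
    print([round(x, 1) for x in row])
```

Because each column is sampled independently, this sketch ignores correlations in the data, such as the link between temperature and humidity. That is one reason production pipelines use more expressive generative models. It is also why they check that synthetic data actually mimics the real-world data it is meant to replace.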

Data science is one of the most fascinating fields of study in the age of AI, and one where humans make a meaningful impact on the AI ecosystem. I’ll be writing more about it in upcoming editions of #EverydAI.

#BetterAI for a #BetterWorld
VxM AI



More articles by Veronika Moroian

  • Programming in the Age of AI

    This week, the computer world lost one of its pioneers: Professor Thomas E. Kurtz, who co-developed the BASIC…

  • Responsible AI

    Responsible AI is one of the most important issues for the future of humanity. I’ve often heard variations of the…

  • Predictions for the Future of AI

    Last week, during the World Summit AI in Amsterdam, experts from the leading companies in the AI field shared their…

  • AI and the Future of Broadcasting

    Artificial intelligence is making an impact in every industry, and this trend will continue to grow. While some…

  • Your Personal Data Privacy in the Age of AI

    When social media apps became popular and were massively adopted, the artificial intelligence field was still in…

  • Math with or without AI?

    Last week, OpenAI released its new AI model, ChatGPT o1, also known as Strawberry. It is available in two versions…

  • The AI Revolution in the Entertainment Industry

    The AI Revolution is already impacting the entertainment industry. While Hollywood content creators and artists have…

  • AI Legislation

    The EU Artificial Intelligence Act officially entered into force on August 1, 2024, and became the first ever…

  • New AI Jobs

    This week, a friend told me “I don’t know what I don’t know about Artificial Intelligence”, which made me think about…

  • AI or not AI, does it really matter?

    We live in a globally connected world, where half of the population can speak more than one language. In this week’s…
