Data in the Age of AI
ChatGPT's developer, OpenAI, recently launched its latest version, codenamed Strawberry, featuring advanced reasoning capabilities. However, its next flagship model may not show significant improvements over its predecessors due to a lack of fresh data for ongoing training.
Data is the new gold. Data is the new oil. You may have heard variations of these expressions... The reality is that the AI revolution wouldn't be possible without data, and data wouldn't hold the value it has today without AI. The strength of GenAI relies on data, and a great generative AI solution requires high-quality data. But what exactly is data?
Data can be defined as raw facts, figures, or symbols that represent observations, measurements, or collected information. The field of artificial intelligence has been studying data at scale, learning the meaning of different data types and how to represent information in various modalities, such as text, image, video, audio, and 3D. Generative AI tools have been developed to transform data from one modality to another. For example:
Structured collections of data, known as datasets, are created to enable analysis, train models, or test hypotheses in data science. Extremely large and complex datasets are referred to as "Big Data". Big Data is used in AI to generate insights because it makes it possible to capture, store, and process data at large volume, high velocity, and wide variety (the three "V"s).
Big Data has combined effectively with two additional innovations: advanced AI algorithms and models for data analysis, and cloud computing (scalable infrastructure to store, process, and access data), paving the way for the AI revolution.
Empirical observations suggest that the more data you have to train an AI model, the better the model's performance. Additionally, the larger the model you want to train, the more data it will require. However, foundational models have already been trained on data available in the public domain, raising the question: what new strategies need to be developed to prevent a slowdown in AI improvement? Two strategies are already in motion:
Data science is one of the most fascinating fields of study in the age of AI, where humans make a meaningful impact on the AI ecosystem. I'll be writing more about this in the upcoming editions of #EverydAI #BetterAI for a #BetterWorld VxM AI