The Evolution of Artificial Intelligence (AI)


Artificial intelligence (AI) has come a long way since its inception, evolving from rule-based systems to sophisticated machine learning and deep learning models. Today, AI, especially in computer vision, wouldn’t exist without the foundational work that started in the 1950s with rule-based computer vision.

In this blog post, we’ll take you on a brief journey through time that covers some major milestones in AI. We’ll start in the 1950s and work our way to the present day, touching on how each era impacted the next.

1950 – 1960 rule-based computer vision

Rule-based computer vision refers to early techniques in the field that relied on manually crafted algorithms and heuristics to interpret and analyse visual data. Rule-based AI has played a crucial role in the history of artificial intelligence and computer vision. From the early days of symbolic reasoning and expert systems in the 1950s through the 1970s to the sophisticated image-processing techniques developed in the following decades, rule-based methods laid the foundation for many modern AI applications. The following features characterise these methods:

  • Explicit Rules and Heuristics: Rule-based systems build on predefined rules created by domain experts. These rules are based on images’ geometric and statistical properties, such as edges, shapes, textures, and colours.

  • Feature Extraction: Key features of the images are extracted using techniques like edge detection (e.g., Sobel, Canny), corner detection (e.g., Harris corner detector), and blob detection. These features are then used to identify and classify objects within the image.

  • Image Processing Techniques: Various image processing operations enhance and manipulate images. These include filtering (e.g., Gaussian blur, median filter), thresholding, morphological operations (e.g., dilation, erosion), and transformations (e.g., Hough transform for detecting lines).

  • Geometric Methods: Techniques such as template matching, contour detection, and Hough transforms are used to detect and recognise shapes and patterns in images.

  • Model-Based Approaches: Predefined models of objects or scenes are used to match and recognise them in images. These models can be simple geometric shapes or more complex representations.

  • Pattern Recognition: Simple statistical methods, such as k-nearest neighbours (k-NN) or decision trees, classify objects based on the extracted features.
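To make the rule-based flavour concrete, here is a hand-rolled Sobel edge detector on a tiny grayscale image, using plain Python lists and no external libraries. The image, kernel values, and threshold are illustrative choices, not taken from any particular historical system:

```python
# 3x3 Sobel kernels for horizontal and vertical intensity gradients
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel_edges(image, threshold=100):
    """Return a binary edge map: 1 where gradient magnitude exceeds threshold."""
    h, w = len(image), len(image[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SOBEL_X[j][i] * image[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * image[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            if (gx * gx + gy * gy) ** 0.5 > threshold:
                edges[y][x] = 1
    return edges

# A 5x5 image with a sharp vertical boundary between dark and bright regions
image = [[0, 0, 0, 255, 255] for _ in range(5)]
edges = sobel_edges(image)
# Edge pixels appear only along the dark-to-bright boundary
```

The "rule" here is entirely hand-specified: fixed kernel weights and a fixed threshold. This is exactly the kind of expert-crafted heuristic that breaks down when lighting or contrast varies, which is the limitation discussed below.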

The development of rules-based computer vision started in the 1950s with researchers such as Alan Turing and John McCarthy laying the groundwork for AI by exploring the potential of machines to simulate human intelligence.

1970 – 1990 expert and early computer vision systems

In the 1950s-1960s, AI research focused on symbolic reasoning and logic. A decade later, in the 1970s, the formalisation of rules-based systems began with the development of expert systems. These systems used a set of rules encoded by human experts to make decisions and solve problems in specific domains. Early computer vision systems also emerged in the 1970s, using basic image processing techniques and heuristics to interpret visual data. Expansion and application followed in the 1980s with knowledge engineering and commercial use.

Knowledge engineering became a field focused on extracting and encoding expert knowledge into rules. Because expert knowledge could be captured in rules, various industries began using expert systems, including medicine, finance, and manufacturing.

Advancements and limitations followed expansion and application in the 1990s. With increased sophistication, rules-based AI systems started incorporating more complex rules and handled uncertainty better. However, despite advancements, it became evident that rules-based systems had difficulty handling the variability and complexity of real-world data.

Advantages of Rule-Based Computer Vision

  • Simplicity and Interpretability: The algorithms are relatively simple to understand and implement, and the results are easy to interpret because the rules are explicitly defined.

  • Deterministic: The system’s behaviour is predictable and repeatable, given the same input.

Limitations of Rule-Based Computer Vision

  • Limited Flexibility: Rule-based systems often struggle with variability in real-world images, such as changes in lighting, occlusion, and distortions.

  • Scalability: Creating rules for complex and diverse image recognition tasks is challenging and time-consuming.

  • Performance: These methods often have lower accuracy and robustness than modern machine learning-based approaches, especially for complex object detection and recognition tasks.

2000 – 2010 machine learning and deep learning

The transition into machine learning began in the 2000s with a shift towards data-driven approaches. These methods, particularly deep learning, offered better performance for many AI tasks, including computer vision. While machine learning dates back to the 1980s with decision trees and backpropagation algorithms, its explosion was fuelled by the growth of digital data (big data) and advances in computational power.

Machine learning focuses on developing algorithms and statistical models that enable computers to learn from and make predictions or decisions based on data. Instead of being explicitly programmed for specific tasks, machine learning systems enhance their performance by identifying patterns and relationships within large datasets. Here’s a detailed overview of what machine learning is, its types, key concepts, and applications:

Supervised Learning: The model is trained on labelled data, meaning each training example is paired with an output label. The goal is to learn a mapping from inputs to outputs.

  • Classification: Predicting discrete labels

  • Regression: Predicting continuous values

Unsupervised Learning: The model is trained on unlabelled data, meaning the system tries to learn patterns and the data structure without explicit instructions on what to predict.

  • Clustering: Grouping similar data points together

  • Dimensionality Reduction: Reducing the number of variables (features) under consideration, retaining only the most informative ones.

Semi-Supervised Learning: A combination of supervised and unsupervised learning. The model is trained on a small amount of labelled data supplemented with a large amount of unlabelled data.

Reinforcement Learning: The model learns by interacting with an environment, receiving feedback through rewards or punishments, and aiming to maximise cumulative rewards.
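As a minimal illustration of supervised classification, here is a k-nearest-neighbours classifier built from scratch. The 2-D points and labels below are made-up toy data used purely to show the mechanics:

```python
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest neighbours."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(p, query)), label)
        for p, label in zip(train_points, train_labels)
    )
    nearest = [label for _, label in dists[:k]]
    return Counter(nearest).most_common(1)[0][0]

# Two labelled clusters: "cat"-like points near (1, 1), "dog"-like near (5, 5)
points = [(1, 1), (1, 2), (2, 1), (5, 5), (5, 6), (6, 5)]
labels = ["cat", "cat", "cat", "dog", "dog", "dog"]

pred_a = knn_predict(points, labels, (1.5, 1.5))  # lands in the "cat" cluster
pred_b = knn_predict(points, labels, (5.5, 5.5))  # lands in the "dog" cluster
```

Note how the decision rule is never written down by a human expert: it emerges entirely from the labelled examples, which is the essential contrast with the rule-based era above.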

With the rise of machine learning, rules-based systems became less dominant. Still, they remained with hybrid approaches combining rules-based methods with machine learning, leveraging the strengths of both paradigms.

Machine learning has revolutionised computer vision, the field that enables computers to interpret and understand visual information through algorithms and techniques for tasks such as image classification, object detection, and image segmentation. Its crucial contributions include the following:

  • Automatic feature extraction: Traditional computer vision methods required manual feature extraction, which was time-consuming and limited by human expertise. Machine learning automates this process by learning the optimal features directly from raw data.

  • Scalability: Machine learning models, particularly deep neural networks, can handle large and complex datasets, allowing for scalability in analysing vast amounts of visual data.

  • Accuracy and performance: Machine learning algorithms, particularly deep learning models like convolutional neural networks (CNNs), have significantly improved the accuracy and performance of computer vision tasks, achieving near-human or even superhuman capabilities in certain areas.

Computer vision machine learning (CVML) represents the intersection of computer vision and machine learning, leveraging the strengths of both fields to advance visual data analysis capabilities through a series of techniques, one of which is convolutional neural networks (CNNs).

CNNs are a class of deep learning algorithms known for their effectiveness in image and video recognition, classification, and segmentation tasks. They automatically learn spatial hierarchies of features.

2012 – 2024 CNNs, multimodal foundation models, and visual question-answering

In 2012, CNNs gained significant popularity, primarily due to AlexNet’s breakthrough performance in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). This breakthrough, combined with the availability of large datasets and advances in computational power (particularly GPUs), catalysed a wave of research and industry adoption. Subsequent innovations in network architectures and the proven effectiveness of CNNs in various applications cemented their place as a cornerstone of modern computer vision and of later advancements, including the widely popular multimodal foundation models and visual question answering (VQA).

Foundation models, as of 2024, are primarily based on pre-trained transformer architectures (“transformers”). Transformers first gained widespread popularity and use in the language domain (natural language processing and understanding: chatbots, document summarisation, etc.). However, they were soon adapted to address problems that combine language with images or video. Their key appeal lies in the ability to be trained on, and effectively learn from, vast volumes of general-knowledge data; hence “pre-trained”.

Multimodal foundation models are advanced AI systems that simultaneously process and understand multiple data types, such as text, images, audio, and video. Unlike traditional models specialising in a single data type, these models integrate various data sources to provide more comprehensive and nuanced outputs. Processing different data types simultaneously enhances performance on tasks that require cross-modal understanding.

Detection is a computer vision task involving finding objects in an image or video. A detector network may find and mark objects of one, several, or many classes and specify the class and the location of every object found in an image or video. CNN-based detector networks are usually pre-trained to detect a specific class or set of classes and do not have language capabilities. On the other hand, transformer-based detector networks are language-capable and thus can be prompted (instructed) to detect various object classes. The transformer networks thus offer much higher flexibility at the expense of a much higher computational cost.

Few-shot learning is the capability of a model to learn from a few examples without the need to collect and curate an extensive data set. Transformer-based models have an advantage over CNNs in this domain.

VQA (visual question answering) involves answering questions about images, requiring an understanding of visual and semantic content. Many (but not all) transformer-based models have this capability.

The synergy between CNNs and newer architectures leads to powerful, unified models capable of complex, cross-modal understanding and reasoning, paving the way for generative AI and more capable AI models.

Model distillation (or, more generally, knowledge distillation) is one way to leverage this synergy by “distilling” the foundation model knowledge on a specific task into a smaller task-specific model, such as a CNN, which can perform this task at a much lower computational cost.
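A rough sketch of the distillation idea follows: the student is trained to match the teacher's temperature-softened output distribution. The logit values and temperature below are illustrative only, and the standard formulation also mixes in a hard-label loss, which is omitted here for brevity:

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a temperature (higher T = softer)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=4.0):
    """Cross-entropy between the teacher's and student's softened distributions."""
    p = softmax(teacher_logits, temperature)  # teacher "soft targets"
    q = softmax(student_logits, temperature)  # student predictions
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

teacher = [8.0, 2.0, 0.5]   # confident foundation-model outputs
matched = [8.0, 2.0, 0.5]   # a student that mimics the teacher exactly
off = [0.5, 2.0, 8.0]       # a student that disagrees

# Training minimises this loss, pulling the student towards the teacher
loss_good = distillation_loss(teacher, matched)
loss_bad = distillation_loss(teacher, off)
```

The high temperature exposes the teacher's relative confidence across all classes, not just its top answer, which is what lets a small CNN inherit behaviour from a much larger foundation model.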

Generative AI, particularly in the context of multimodal foundation models and VQA, leverages the advancements in CNNs, multimodal models, and transformers to create, understand, and interact with various data types. CNNs provide the necessary feature extraction capabilities, multimodal models integrate diverse data types for comprehensive understanding and generation, and VQA models leverage these technologies to create interactive and intelligent systems. This integration enables generative AI to produce realistic and contextually accurate synthetic data and content across various applications, driving innovation in creative arts, healthcare, and many other fields.

Conclusion

From the beginnings of rule-based computer vision in the mid-20th century to the groundbreaking developments of machine learning and deep learning, AI has seen exponential growth. Today, we stand on the shoulders of these early innovations, leveraging advanced techniques like CNNs and transformer-based models to achieve feats once thought impossible. As we continue to push the boundaries of what AI can do, the synergy between traditional methods and cutting-edge technologies promises even greater advancements.

At Aicadium, we focus on finding viable end-to-end solutions for our clients. This encompasses exploring all tech options, new and old, to provide the right combination that suits specific needs. We understand AI’s rich history and constantly match traditional methods with today’s advancements, contributing to a future that forges intelligent systems that enhance every aspect of our lives, driving innovation and solving complex challenges across various domains.
