10 terms for efficient AI use

Since generative AI surged into mainstream popularity at the end of 2022, most people have gained a basic understanding of how to interact with computers using generative AI and natural language. Terms like “prompt” and “machine learning” have become casual conversation topics shared over coffee with friends. (If you’re still unfamiliar with those, consider reading the introductory article “10 AI Terms Everyone Should Know.”) However, as AI continues to evolve, so does the terminology. Do you know the difference between a Large Language Model (LLM) and a Small Language Model (SLM)? Do you know what the “GPT” in ChatGPT stands for? Or how RAG helps models sort fact from fiction? I’ll help you understand the latest AI terms.

Reasoning and Planning

Computers using AI can now learn from information, solve problems, and perform tasks by applying patterns found in historical data. The most advanced systems can plan a series of actions to achieve a goal, which lets them tackle increasingly complex problems. For instance, if you ask for help planning a theme park trip, an AI system can break the day into steps that meet your goals, such as riding six different attractions and doing water activities during the hottest part of the day. Using reasoning, it avoids doubling back unnecessarily and schedules the water slides between noon and 3 PM.
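As a rough illustration, the constraint-aware planning described above can be sketched as a toy scheduler. This is a simplification with hard-coded rules and made-up ride names; a real AI system reasons over learned patterns rather than fixed logic:

```python
# Toy itinerary planner: water rides land in the hottest window
# (noon-3 PM); everything else fills the remaining hourly slots.
# Illustrative only; ride names and rules are invented.

HOT_WINDOW = (12, 13, 14)  # noon to 3 PM, in hour slots

def plan_day(attractions, start_hour=10):
    """Greedy planner honoring the 'water rides when hot' constraint."""
    water = [a for a in attractions if a["water"]]
    dry = [a for a in attractions if not a["water"]]
    schedule = {}
    hot_slots = list(HOT_WINDOW)
    other_slots = [h for h in range(start_hour, start_hour + 8)
                   if h not in HOT_WINDOW]
    for ride in water:
        schedule[hot_slots.pop(0)] = ride["name"]
    for ride in dry:
        schedule[other_slots.pop(0)] = ride["name"]
    return dict(sorted(schedule.items()))

rides = [
    {"name": "Roller Coaster", "water": False},
    {"name": "Water Slide", "water": True},
    {"name": "Haunted House", "water": False},
    {"name": "Log Flume", "water": True},
]
itinerary = plan_day(rides)
```

Both water rides end up inside the noon-to-3 window, while the dry rides fill the morning slots.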

Training and Inference

Creating and using AI systems involves two stages: training and inference. Training is akin to educating the AI system, where it is provided with datasets and learns how to perform tasks or make predictions based on that data. For example, it might be given a list of recently sold house prices along with various variables like the number of bedrooms and bathrooms. During training, the system adjusts its internal parameters, which are the values that determine how much weight to give each factor affecting the price. Inference is the stage where the system uses the learned patterns and parameters to predict the price of a new house entering the market.
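The house-price example above can be sketched in a few lines: training repeatedly nudges the parameters to shrink the prediction error, and inference simply reuses the learned parameters on a new house. The data below is invented, and the model is a deliberately tiny linear one:

```python
# Training vs. inference with a toy linear model of house prices.
# Features: (bedrooms, bathrooms) -> price in $1000s; data is made up.
houses = [((3, 2), 350), ((4, 3), 470), ((2, 1), 230), ((5, 3), 530)]

# Training: adjust the parameters (weights) to reduce prediction error.
w_bed, w_bath, bias = 0.0, 0.0, 0.0
lr = 0.01  # learning rate: how big each adjustment is
for _ in range(5000):
    for (beds, baths), price in houses:
        pred = w_bed * beds + w_bath * baths + bias
        err = pred - price
        w_bed -= lr * err * beds
        w_bath -= lr * err * baths
        bias -= lr * err

# Inference: apply the learned parameters to a house not seen in training.
new_house = (3, 3)
estimate = w_bed * new_house[0] + w_bath * new_house[1] + bias
```

After training, the weights encode how much each bedroom and bathroom contributes to the price, and inference is just one multiply-and-add pass with those frozen weights.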

Small Language Model (SLM)

A Small Language Model (SLM) is a compact counterpart to a Large Language Model (LLM). Both use machine learning techniques to recognize patterns and relationships and to generate realistic, natural-sounding language. While LLMs are very large and require significant computational power and memory, SLMs such as Phi-3 are trained on smaller datasets and have fewer parameters, making them compact enough to run without an internet connection. This makes them well suited to answering simple questions on devices like laptops or mobile phones. For example, an SLM can handle basic questions about pet care, but it is not designed for the kind of detailed, multi-step reasoning that a task like planning how to train a guide dog would require.
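One concrete way to see why an SLM fits on a phone while an LLM does not is to estimate the memory needed just to hold the weights. The parameter counts below are approximate public figures (about 3.8 billion for Phi-3-mini and about 175 billion for GPT-3), and 2 bytes per parameter assumes 16-bit weights:

```python
# Back-of-envelope memory comparison between a small and a large
# language model. Counts are approximate public figures; 2 bytes per
# parameter assumes 16-bit floating-point weights.
def weight_memory_gb(num_params, bytes_per_param=2):
    """Approximate memory needed just to store the model weights."""
    return num_params * bytes_per_param / 1e9

slm_gb = weight_memory_gb(3.8e9)   # Phi-3-mini scale: ~7.6 GB
llm_gb = weight_memory_gb(175e9)   # GPT-3 scale: ~350 GB
```

Roughly 7.6 GB fits in a laptop's memory; roughly 350 GB does not, before even counting the extra memory inference itself needs.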

Grounding

Generative AI systems can create stories, poems, and jokes, and answer research questions. However, they sometimes struggle to separate fact from fiction, or their training data is out of date, and they then produce confident but inaccurate responses, a phenomenon known as hallucination. To improve accuracy and keep outputs contextually appropriate and personalized, developers use a process called “grounding”: anchoring the model to real-world data and concrete sources, which improves the system’s ability to respond accurately about the real world.

Retrieval-Augmented Generation (RAG)

Retrieval-Augmented Generation (RAG) is a method that allows developers to access grounding sources to provide more accurate and up-to-date information to AI systems. RAG patterns enable AI programs to incorporate additional knowledge without retraining, thus saving time and resources. For example, if you’re running a clothing company and want to create a chatbot that can answer questions about your products, you can use RAG patterns to help customers find the green sweater they’re looking for based on your product catalog, without needing to retrain the AI model.
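A minimal sketch of the RAG pattern, assuming a toy catalog and a deliberately naive word-overlap retriever (production systems typically rank with vector embeddings instead). The catalog entries, scoring, and prompt template are all invented for illustration:

```python
# RAG sketch: retrieve relevant catalog entries, then pack them into
# the prompt as grounding context for the model. Word-overlap scoring
# is a stand-in for real embedding-based retrieval.
catalog = [
    "Green wool sweater, sizes S-XL, $49",
    "Blue denim jacket, sizes M-XL, $89",
    "Green cotton t-shirt, sizes XS-L, $19",
]

def retrieve(query, documents, top_k=2):
    """Rank documents by how many words they share with the query."""
    q_words = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:top_k]

def build_prompt(query, context):
    """Ground the model by placing retrieved facts in the prompt."""
    return ("Answer using only this catalog:\n"
            + "\n".join(context)
            + f"\nCustomer question: {query}")

question = "do you have a green sweater"
prompt = build_prompt(question, retrieve(question, catalog))
```

The key point: the model never needs retraining; fresh product knowledge arrives at inference time inside the prompt.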

Orchestration

When processing a user request, an AI program often needs to perform many tasks. The orchestration layer guides all of these tasks through the correct sequence to produce the best response. For instance, if you ask Microsoft Copilot about Ada Lovelace and then follow up by asking when she was born, the orchestrator recognizes that the follow-up question’s “she” refers to Lovelace. The orchestration layer can also apply RAG patterns, searching the internet for new information and adding it to the context to help the model find a better answer. It is akin to a maestro cueing the violins, then the flutes and oboes, all following the score to create the sound the composer had in mind.
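A toy orchestration layer can be sketched as a list of steps, each feeding its result to the next. The `resolve_references` heuristic below is a made-up stand-in for the kind of work a real orchestrator does when it maps “she” back to Ada Lovelace:

```python
# Toy orchestrator: runs each processing step in order. The pronoun
# resolver is a hypothetical, hard-coded stand-in for real reference
# resolution inside an orchestration layer.
history = []  # earlier conversation turns the orchestrator can consult

def resolve_references(question):
    """Swap pronouns for the most recently mentioned entity (toy heuristic)."""
    if not history:
        return question
    entity = history[-1]["entity"]
    return " ".join(
        entity if w.lower().strip("?.,!") in ("she", "he", "it") else w
        for w in question.split()
    )

def orchestrate(question, steps):
    """Run each step in sequence, feeding each result to the next."""
    for step in steps:
        question = step(question)
    return question

history.append({"entity": "Ada Lovelace"})
resolved = orchestrate("When was she born?", [resolve_references])
```

In a fuller pipeline, `steps` would continue with retrieval (the RAG pattern) and answer generation, each receiving the previous step's output, like sections of an orchestra entering on cue.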

Memory

Contemporary AI models technically have no memory. However, AI programs can temporarily store previous questions and answers and fold that context into the current request, letting them appear to “remember” information; they can likewise reuse data fetched through RAG patterns to stay up to date. Developers are experimenting with orchestration layers to determine when it is useful for AI systems to remember things only briefly, like a Post-it note, and when to retain information for longer periods.
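The Post-it-note style of memory can be sketched as an application-side buffer that replays recent turns with every request. The class and method names below are invented for illustration; the point is that the "memory" lives in the application, not the model:

```python
# Sketch of short-term "memory": the model itself is stateless, so the
# application stores recent turns and replays them in each new request.
from collections import deque

class SessionMemory:
    """Keep only the last few turns, like disposable Post-it notes."""
    def __init__(self, max_turns=3):
        # deque with maxlen silently discards the oldest turn when full
        self.turns = deque(maxlen=max_turns)

    def remember(self, question, answer):
        self.turns.append((question, answer))

    def compose_request(self, new_question):
        """Prepend stored turns so the model sees its own past answers."""
        context = "\n".join(f"Q: {q}\nA: {a}" for q, a in self.turns)
        return f"{context}\nQ: {new_question}" if context else f"Q: {new_question}"

memory = SessionMemory(max_turns=2)
memory.remember("Who was Ada Lovelace?", "A 19th-century mathematician.")
request = memory.compose_request("When was she born?")
```

Raising `max_turns` (or persisting turns to storage) is the "longer periods" end of the spectrum the paragraph describes; a small `maxlen` is the Post-it note.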

Transformer Models and Diffusion Models

For decades, researchers have worked to teach AI systems to understand and generate language, but the breakthrough that recently accelerated progress is the Transformer model. Among generative AI models, Transformers excel at understanding context and nuance while processing information rapidly. They examine patterns in data, assess the importance of different inputs, and predict what comes next to generate text. The most prominent example is the “T” in ChatGPT: Generative Pre-trained Transformer, a model designed for text generation. Diffusion models, on the other hand, are primarily used for image generation. They start from random noise and refine it step by step, iterating small changes over time until the requested image emerges.
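The mechanism at the heart of a Transformer, assessing how much importance each input deserves, can be illustrated with a toy dot-product attention calculation. The word vectors below are hand-picked for the example, not learned:

```python
# Toy self-attention weights: compare one word's vector against the
# others to decide how much "attention" each should receive.
# Vectors are tiny and hand-chosen purely for illustration.
import math

def softmax(xs):
    """Turn raw scores into weights that sum to 1."""
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Dot-product similarity between a query and each key, normalized."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    return softmax(scores)

# "she" should attend more to "Ada" than to "the" (illustrative vectors).
words = {"Ada": [1.0, 0.2], "the": [0.1, 0.9], "she": [0.9, 0.3]}
weights = attention_weights(words["she"], [words["Ada"], words["the"]])
```

Real Transformers do this across every pair of tokens, in parallel, with learned vectors of hundreds or thousands of dimensions, which is exactly why they weigh context and nuance so well.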

Frontier Model

Frontier models are large-scale systems that push the boundaries of what AI can do and can perform a wide variety of tasks, sometimes demonstrating unexpected capabilities. Several technology companies, including Microsoft, established the Frontier Model Forum to share knowledge, set safety standards, and ensure that powerful AI programs are developed safely and responsibly.

GPU

A GPU, short for Graphics Processing Unit, is essentially a turbocharged calculator. Originally designed to render the elaborate graphics of video games smoothly, it now serves as the high-performance engine of modern computing. GPUs tackle mathematical problems in parallel across numerous small cores. Since AI must perform massive amounts of computation to communicate in human language and recognize images or sounds, GPUs are essential for both training and inference in AI tools. Today’s most advanced models are trained on large clusters of thousands of GPUs, and the data centers behind clouds like Microsoft Azure house some of the world’s most powerful computers.
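A back-of-envelope calculation shows why training calls for a GPU cluster. It uses the common approximation of roughly 6 floating-point operations per parameter per training token; the model size, token count, and per-GPU throughput below are illustrative round numbers, not figures for any specific system:

```python
# Why training needs thousands of GPUs: rough compute estimate using
# the common ~6 FLOPs per parameter per training token approximation.
# All numbers below are illustrative round figures.
params = 175e9            # model parameters
tokens = 300e9            # training tokens
flops_needed = 6 * params * tokens

gpu_flops_per_sec = 1e14  # ~100 teraFLOPs sustained per GPU (assumed)

def training_days(num_gpus):
    """Days to finish training if compute scaled perfectly."""
    seconds = flops_needed / (num_gpus * gpu_flops_per_sec)
    return seconds / 86400

one_gpu_years = training_days(1) / 365   # a single GPU: about a century
cluster_days = training_days(1000)       # 1,000 GPUs: about a month
```

Even with perfect scaling, a single GPU would need roughly a hundred years for this workload, while a thousand-GPU cluster brings it down to weeks, which is why frontier-scale training happens in data centers.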

If you need consulting on cloud-based Data & AI, please contact Cloocus.